Hybrid-Domain Adaptative Representation Learning for Gaze Estimation

Qida Tan; Hongyu Yang; Wenchao Du

doi:10.1609/aaai.v40i11.37891

Authors

Qida Tan Sichuan University, Chengdu, China
Hongyu Yang Sichuan University, Chengdu, China
Wenchao Du Sichuan University, Chengdu, China

DOI:

https://doi.org/10.1609/aaai.v40i11.37891

Abstract

Appearance-based gaze estimation, which aims to predict an accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significant performance degradation in cross-domain evaluation due to interference from gaze-irrelevant factors such as expressions, wearables, and image quality. To alleviate this problem, we present a novel Hybrid-domain Adaptative Representation Learning (HARL for short) framework that exploits multi-source hybrid datasets to learn a robust gaze representation. More specifically, we propose to disentangle the gaze-relevant representation from low-quality facial images by aligning it with features extracted from high-quality near-eye images in an unsupervised domain-adaptation manner, which adds almost no computational or inference cost. Additionally, we analyze the effect of head pose and design a simple yet efficient sparse graph fusion module to explore the geometric constraint between gaze direction and head pose, leading to a dense and robust gaze representation. Extensive experiments on the EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art accuracy of 5.02, 3.36, and 9.26 degrees, respectively, and presents competitive performance in cross-dataset evaluation.
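
The abstract reports results in degrees but does not define the metric. Assuming it is the angular error commonly used in gaze estimation (the angle between the predicted and ground-truth 3D gaze vectors, averaged over test samples), a minimal sketch of the per-sample computation follows. The function name `angular_error_deg` is hypothetical and is not taken from the paper.

```python
import math

def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumption: this is the conventional gaze-estimation metric; the
    abstract itself does not spell out how its numbers are computed.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))
```

A dataset-level score under this assumption would be the mean of this value over all test samples.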
