Hybrid-Domain Adaptative Representation Learning for Gaze Estimation
DOI: https://doi.org/10.1609/aaai.v40i11.37891
Abstract
Appearance-based gaze estimation, which aims to predict an accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significant performance degradation in cross-domain evaluation because of interference from gaze-irrelevant factors such as expressions, wearables, and image quality. To alleviate this problem, we present a novel Hybrid-domain Adaptative Representation Learning (HARL) framework that exploits multi-source hybrid datasets to learn a robust gaze representation. More specifically, we propose to disentangle the gaze-relevant representation from low-quality facial images by aligning it with features extracted from high-quality near-eye images in an unsupervised domain-adaptation manner, which adds almost no computational or inference cost. Additionally, we analyze the effect of head pose and design a simple yet efficient sparse graph fusion module that explores the geometric constraint between gaze direction and head pose, leading to a dense and robust gaze representation. Extensive experiments on the EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art angular errors of 5.02, 3.36, and 9.26 degrees, respectively, and competitive performance in cross-dataset evaluation.
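The degree figures reported above are angular errors between predicted and ground-truth 3D gaze directions, so lower is better. The following is a minimal sketch of how such a figure can be computed. It assumes gaze labels are given as (pitch, yaw) in radians and uses the axis convention common in MPIIGaze-style preprocessing code. Both of these are assumptions, the function names are hypothetical, and this is not the HARL authors' evaluation code.

```python
import math


def pitchyaw_to_vector(pitch: float, yaw: float) -> tuple[float, float, float]:
    """Convert (pitch, yaw) in radians to a unit 3D gaze vector.

    Assumed convention (common in MPIIGaze-style code, not taken from the paper):
    x = -cos(pitch) * sin(yaw), y = -sin(pitch), z = -cos(pitch) * cos(yaw).
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(pred, gt) -> float:
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(g * g for g in gt))
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_sim))


def mean_angular_error_deg(preds, gts) -> float:
    """Mean angular error over paired predictions and labels, in degrees."""
    errors = [angular_error_deg(p, g) for p, g in zip(preds, gts)]
    return sum(errors) / len(errors)
```

For example, a prediction that differs from the label only by a 5-degree yaw offset, `angular_error_deg(pitchyaw_to_vector(0.0, math.radians(5)), pitchyaw_to_vector(0.0, 0.0))`, gives approximately 5.0. Computing the error on 3D vectors rather than averaging pitch and yaw differences keeps the metric independent of the angular parameterization.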
Published: 2026-03-14
How to Cite
Tan, Q., Yang, H., & Du, W. (2026). Hybrid-Domain Adaptative Representation Learning for Gaze Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9323–9331. https://doi.org/10.1609/aaai.v40i11.37891
Issue: Vol. 40 No. 11
Section: AAAI Technical Track on Computer Vision VIII