Hybrid-Domain Adaptative Representation Learning for Gaze Estimation

Authors

  • Qida Tan, Sichuan University, Chengdu, China
  • Hongyu Yang, Sichuan University, Chengdu, China
  • Wenchao Du, Sichuan University, Chengdu, China

DOI:

https://doi.org/10.1609/aaai.v40i11.37891

Abstract

Appearance-based gaze estimation, which aims to predict accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significant performance degradation in cross-domain evaluation due to interference from gaze-irrelevant factors such as expressions, wearables, and image quality. To alleviate this problem, we present a novel Hybrid-domain Adaptative Representation Learning (HARL for short) framework that exploits multi-source hybrid datasets to learn robust gaze representations. More specifically, we propose to disentangle gaze-relevant representations from low-quality facial images by aligning them with features extracted from high-quality near-eye images in an unsupervised domain-adaptation manner, which incurs almost no additional computational or inference cost. Additionally, we analyze the effect of head pose and design a simple yet efficient sparse graph fusion module that exploits the geometric constraint between gaze direction and head pose, leading to a dense and robust gaze representation. Extensive experiments on the EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art accuracy, with mean angular errors of 5.02°, 3.36°, and 9.26°, respectively, and delivers competitive performance in cross-dataset evaluation.
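
The abstract outlines two components: unsupervised alignment of facial-image features with near-eye features, and a sparse graph fusion of gaze and head-pose cues. The following is a minimal PyTorch-style sketch of how such pieces could fit together; the encoder shapes, the linear-kernel MMD alignment loss, the two-node graph, and all module names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: every design choice below (encoders, MMD loss,
# two-node graph) is an assumption, not the HARL paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mmd_loss(x, y):
    """Squared maximum mean discrepancy with a linear kernel: one simple
    way to align face-image features with near-eye features, unsupervised."""
    return (x.mean(dim=0) - y.mean(dim=0)).pow(2).sum()


class SparseGraphFusion(nn.Module):
    """Fuses a gaze feature and a head-pose feature over a fixed two-node
    graph (gaze <-> head pose); a guess at what a sparse graph fusion
    module over this geometric constraint could look like."""

    def __init__(self, dim):
        super().__init__()
        # Two nodes with self-loops, row-normalized adjacency.
        self.register_buffer("adj", torch.tensor([[0.5, 0.5], [0.5, 0.5]]))
        self.proj = nn.Linear(dim, dim)

    def forward(self, gaze_feat, head_feat):
        nodes = torch.stack([gaze_feat, head_feat], dim=1)  # (B, 2, dim)
        fused = torch.relu(self.proj(self.adj @ nodes))     # graph propagation
        return fused[:, 0]  # updated gaze node as the final representation


class GazeModel(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Toy linear encoders stand in for real CNN backbones.
        self.face_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))
        self.eye_enc = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, dim))
        self.head_enc = nn.Linear(3, dim)   # head pose as pitch/yaw/roll
        self.fusion = SparseGraphFusion(dim)
        self.out = nn.Linear(dim, 2)        # gaze as pitch/yaw

    def forward(self, face, eye, head_pose):
        f_face = self.face_enc(face)
        f_eye = self.eye_enc(eye)
        gaze = self.out(self.fusion(f_face, self.head_enc(head_pose)))
        # Alignment term pulls facial features toward near-eye features;
        # it is used only at training time, so inference cost is unchanged.
        return gaze, mmd_loss(f_face, f_eye)


# Usage: one dummy training step.
model = GazeModel()
face = torch.randn(4, 3, 64, 64)   # low-quality facial crops
eye = torch.randn(4, 1, 32, 32)    # high-quality near-eye images
pose = torch.randn(4, 3)
gaze, align = model(face, eye, pose)
loss = F.l1_loss(gaze, torch.zeros_like(gaze)) + 0.1 * align
loss.backward()
```

Note how the alignment loss touches only the training objective: at test time the near-eye branch is dropped entirely, which is consistent with the abstract's claim that the alignment adds almost no computational or inference cost.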

Published

2026-03-14

How to Cite

Tan, Q., Yang, H., & Du, W. (2026). Hybrid-Domain Adaptative Representation Learning for Gaze Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9323–9331. https://doi.org/10.1609/aaai.v40i11.37891

Section

AAAI Technical Track on Computer Vision VIII