DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos
DOI:
https://doi.org/10.1609/aaai.v40i5.37411
Abstract
Multi-person eyeblink detection in untrimmed in-the-wild videos is a recently emerged and challenging task. Because eyeblinks are far more fine-grained in space and time than general actions, we empirically find that general action detectors, though effective in general domains, struggle with this task (Blink-AP < 2%). Specialized eyeblink detection methods alleviate this through fine-grained spatio-temporal operations. The state-of-the-art (SOTA) method proposes a unified model that combines instance-aware face localization and eyeblink detection through joint multi-task learning and feature sharing. While effective, it exhibits two critical limitations that may contribute to its unsatisfactory performance (Blink-AP = 10.11%): (1) Face localization and eyeblink detection require distinct spatio-temporal feature granularities, making joint modeling in a unified feature space suboptimal. (2) Training of the eyeblink task can be strongly affected by unstable face-eye feature learning under the joint training paradigm. To address these limitations, we propose DeFB, a decomposed feature learning paradigm with favorable effectiveness and efficiency: (1) We model faces and eyes in granularity-specific feature spaces, which enhances fine-grained perception while reducing computational cost compared to a unified feature space. (2) To mitigate face-eye feature learning instability, we adopt an asynchronous learning mechanism in which eye feature learning refines well-trained coarse face features, with shared queries acting as a bridge between stages to retain the efficient feature sharing of existing unified models. Compared with the SOTA method, DeFB doubles the performance (Blink-AP: 24.65% vs. 10.11%) while boosting efficiency by nearly 35%. DeFB can also be integrated as a plug-in to substantially augment the eyeblink detection capabilities of general action detectors.
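As a quick check of the headline comparison, the two Blink-AP figures quoted in the abstract can be compared directly. The following is a minimal sketch that uses only those two reported values; the function and variable names are illustrative and do not come from the paper.

```python
def compare_scores(new: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, ratio new / baseline)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return new - baseline, new / baseline


# Blink-AP values (in percent) as reported in the abstract.
DEFB_BLINK_AP = 24.65
SOTA_BLINK_AP = 10.11

gain, ratio = compare_scores(DEFB_BLINK_AP, SOTA_BLINK_AP)
print(f"absolute gain: {gain:.2f} points, ratio: {ratio:.2f}x")
# prints "absolute gain: 14.54 points, ratio: 2.44x"
```

The ratio of roughly 2.4 is consistent with the abstract's statement that DeFB doubles the SOTA Blink-AP. The efficiency figure is not recomputed here because the abstract does not state the underlying measurements.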
Published
2026-03-14
How to Cite
Gan, J., Zeng, W., Xiao, Y., Zhang, X., Zheng, C., Zhao, R., Wang, R., Du, M., & Cao, Z. (2026). DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 4076-4084. https://doi.org/10.1609/aaai.v40i5.37411
Section
AAAI Technical Track on Computer Vision II