Dep-MAP: A Multi-level Alignment Framework with Semantic Prototypes for Video-based Automatic Depression Assessment
DOI:
https://doi.org/10.1609/aaai.v40i3.37192
Abstract
Spatiotemporal analysis of facial behavior is a crucial method for evaluating the mental state of patients with depression. In practice, however, depressed patients often display facial behaviors similar to those of healthy individuals because of masking tendencies. In addition, facial expressions vary considerably among depressed patients, which further increases the difficulty of assessment. To address these problems, we propose Dep-MAP, a video-based automatic depression assessment model designed for the complex facial behaviors of patients with depression. Dep-MAP adopts a dual-branch architecture that extracts visual features of facial behavior and captures the corresponding emotional semantic features. Specifically, the extracted deep semantic features are clustered into semantically distinct prototype sets, in which each severity group learns a set of discriminative facial behavior prototype representations, suppressing inter-class semantic confusion. We then propose a semantic prototype-supervised contrastive learning method that aligns latent semantics between shallow and deep features, providing emotional semantic guidance and self-knowledge distillation for the visual feature branch and effectively suppressing intra-class differences. Finally, we integrate key depression cues across multiple spatiotemporal scales via a multi-scale weighted fusion strategy to achieve automatic depression assessment. Experimental results demonstrate that Dep-MAP effectively identifies potential key frames in temporal sequences and aggregates key frame representations with semantic consistency, achieving state-of-the-art results on the public AVEC2013 and AVEC2014 datasets.
Published
2026-03-14
How to Cite
Wang, H., Ye, J., & Wang, Q. (2026). Dep-MAP: A Multi-level Alignment Framework with Semantic Prototypes for Video-based Automatic Depression Assessment. Proceedings of the AAAI Conference on Artificial Intelligence, 40(3), 2101–2109. https://doi.org/10.1609/aaai.v40i3.37192
Issue
Section
AAAI Technical Track on Cognitive Modeling & Cognitive Systems
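Illustrative sketch
The abstract describes a prototype-supervised contrastive objective but does not give its form. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes an InfoNCE-style loss with cosine similarity and a temperature `tau`, and it assumes that each severity group owns a small list of prototype vectors. The function names, the `tau` value, and the toy prototypes are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_contrastive_loss(feature, prototypes, label, tau=0.1):
    """Assumed InfoNCE-style loss over class prototypes (not from the paper).

    prototypes: dict mapping a severity label to a list of prototype vectors.
    The loss is -log(sum over own-class prototypes of exp(sim / tau)
                     / sum over all prototypes of exp(sim / tau)).
    """
    if label not in prototypes:
        raise ValueError(f"no prototypes for label {label!r}")
    positive = 0.0
    total = 0.0
    for cls, protos in prototypes.items():
        for proto in protos:
            weight = math.exp(cosine(feature, proto) / tau)
            total += weight
            if cls == label:
                positive += weight
    return -math.log(positive / total)


# Toy 2-D prototypes for two hypothetical severity groups.
toy_prototypes = {
    "mild": [[1.0, 0.0], [0.9, 0.1]],
    "severe": [[0.0, 1.0]],
}
toy_feature = [1.0, 0.05]  # lies close to the "mild" prototypes

loss_if_mild = prototype_contrastive_loss(toy_feature, toy_prototypes, "mild")
loss_if_severe = prototype_contrastive_loss(toy_feature, toy_prototypes, "severe")
```

In this toy setting the loss is small when the feature is labeled with the group whose prototypes it resembles and large otherwise, which is the behavior a prototype-alignment objective of this kind is meant to encourage.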