Explainable Depression Assessment from Face Videos by Weakly Supervised Learning

Authors

  • Rongfan Liao (University of Leicester; HBUG Lab, University of Exeter)
  • Xiangyu Kong (HBUG Lab, University of Exeter)
  • Shiqing Tang (HBUG Lab, University of Exeter; Shanghai University of Science and Technology)
  • Lang He (Xi’an University of Posts and Telecommunications)
  • Changzeng Fu (Northeastern University)
  • Weicheng Xie (Shenzhen University)
  • Xiaofeng Liu (Hohai University)
  • Lu Liu (HBUG Lab, University of Exeter)
  • Siyang Song (HBUG Lab, University of Exeter)

DOI:

https://doi.org/10.1609/aaai.v40i3.37173

Abstract

Existing video-based automatic depression assessment (ADA) approaches frequently produce a video-level assessment by aggregating features or predictions from individual frames or equal-length segments of the given video. Although recent deep learning models have substantially improved their performance, these approaches typically do not explicitly consider that depression-related behavioural cues vary in importance across video segments, i.e., segments within one video may contain behaviours reflecting varying levels of depression. Underestimating these segment-level variations can obscure facial behaviour cues associated with depression, thereby undermining the accuracy and interpretability of video-based depression detection systems. In this paper, we propose a novel video-based ADA approach that specifically identifies and differentiates video segments exhibiting depression-related facial behaviours across varying temporal durations, providing clear insights into how each segment contributes to the video-level depression prediction. To achieve this, a novel weakly supervised strategy is proposed to compare segment-level behaviours with the video-level depression label, enabling the model to assign depression-relevance scores to video segments at multiple temporal scales and attend selectively to those most indicative of depressive states. Extensive experiments on the AVEC 2013 and AVEC 2014 face video depression datasets demonstrate the effectiveness of our approach.
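
The general idea of scoring segments at several temporal scales and combining them into one video-level prediction can be illustrated with a minimal sketch. This is not the authors' architecture: the function names, the non-overlapping windowing, and the softmax-weighted pooling are all assumptions chosen for illustration, and the per-segment scores and relevance logits would in practice come from a learned model.

```python
import math

def multi_scale_segments(num_frames, scales):
    """Hypothetical helper: cut frames [0, num_frames) into non-overlapping
    windows, once for each window length in `scales`."""
    segments = []
    for length in scales:
        for start in range(0, num_frames - length + 1, length):
            segments.append((start, start + length))
    return segments

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def video_level_prediction(segment_scores, relevance_logits):
    """Assumed pooling rule: weight each segment's depression score by its
    softmax-normalised relevance and sum. The weights double as a
    per-segment explanation of the video-level output."""
    weights = softmax(relevance_logits)
    prediction = sum(w * s for w, s in zip(weights, segment_scores))
    return prediction, weights

# Usage: an 8-frame video, segments of length 2 and 4 (illustrative values only).
segments = multi_scale_segments(8, [2, 4])
scores = [10.0, 20.0, 10.0, 20.0, 10.0, 20.0]   # placeholder segment scores
logits = [0.0] * len(segments)                  # placeholder relevance logits
prediction, weights = video_level_prediction(scores, logits)
```

In a weakly supervised setting of this kind, only the pooled video-level prediction would be compared against the video-level label during training; no segment-level labels are needed, and the learned weights indicate which segments drove the result.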

Published

2026-03-14

How to Cite

Liao, R., Kong, X., Tang, S., He, L., Fu, C., Xie, W., … Song, S. (2026). Explainable Depression Assessment from Face Videos by Weakly Supervised Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(3), 1928–1936. https://doi.org/10.1609/aaai.v40i3.37173

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems