OUS: Bridging Scene Context and Facial Features to Overcome the Rigid Cognitive Problem

Authors

  • Xinji Mai Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Haoran Wang Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Zeng Tao Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Junxiong Lin Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Shaoqi Yan School of Electrical and Electronic Engineering, Shanghai Institute of Technology
  • Yan Wang Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Jiawen Yu Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Xuan Tong Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Yating Li Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
  • Wenqiang Zhang Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University; Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Engineering Research Center of AI & Robotics, Ministry of Education, Fudan University

DOI:

https://doi.org/10.1609/aaai.v39i6.32647

Abstract

Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically judge emotion from multiple cues, including environmental cues and body language, whereas existing DFER methods tend to treat the scene as noise to be filtered out and focus solely on facial information. We refer to this as the Rigid Cognitive Problem, which can lead to discrepancies between the cognition of annotators and models on some samples. To align more closely with the human cognitive paradigm of emotions, we propose an Overall Understanding of the Scene DFER method (OUS). OUS integrates scene and facial features, combining scene-specific emotional knowledge for DFER. Extensive experiments on the two largest datasets in the DFER field, DFEW and FERV39k, demonstrate that OUS significantly outperforms existing methods. By analyzing the Rigid Cognitive Problem, OUS captures the complex relationship between scene context and emotional expression, aligning closely with human emotional understanding in real-world scenarios.

Published

2025-04-11

How to Cite

Mai, X., Wang, H., Tao, Z., Lin, J., Yan, S., Wang, Y., … Zhang, W. (2025). OUS: Bridging Scene Context and Facial Features to Overcome the Rigid Cognitive Problem. Proceedings of the AAAI Conference on Artificial Intelligence, 39(6), 6054–6062. https://doi.org/10.1609/aaai.v39i6.32647

Section

AAAI Technical Track on Computer Vision V