OUS: Bridging Scene Context and Facial Features to Overcome the Rigid Cognitive Problem

Xinji Mai; Haoran Wang; Zeng Tao; Junxiong Lin; Shaoqi Yan; Yan Wang; Jiawen Yu; Xuan Tong; Yating Li; Wenqiang Zhang

doi:10.1609/aaai.v39i6.32647

Authors

Xinji Mai Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Haoran Wang Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Zeng Tao Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Junxiong Lin Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Shaoqi Yan School of Electrical and Electronic Engineering, Shanghai Institute of Technology
Yan Wang Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Jiawen Yu Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Xuan Tong Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Yating Li Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University
Wenqiang Zhang Shanghai Engineering Research Center of AI & Robotics, Academy for Engineering & Technology, Fudan University; Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Engineering Research Center of AI & Robotics, Ministry of Education, Fudan University

DOI:

https://doi.org/10.1609/aaai.v39i6.32647

Abstract

Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We identify a significant issue in current DFER tasks: human annotators typically judge emotion from multiple cues, including environmental cues and body language, whereas existing DFER methods tend to treat the scene as noise to be filtered out and focus solely on facial information. We refer to this as the Rigid Cognitive Problem, which can lead to discrepancies between the cognition of annotators and that of models on some samples. To align more closely with the human cognitive paradigm of emotions, we propose an Overall Understanding of the Scene DFER method (OUS). OUS integrates scene and facial features, combining scene-specific emotional knowledge for DFER. Extensive experiments on the two largest datasets in the DFER field, DFEW and FERV39k, demonstrate that OUS significantly outperforms existing methods. By addressing the Rigid Cognitive Problem, OUS captures the complex relationship between scene context and emotional expression, aligning closely with human emotional understanding in real-world scenarios.
