Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition

Authors

  • Wen-Jue He Harbin Institute of Technology, Shenzhen
  • Xiaofeng Zhu Hainan University
  • Zheng Zhang Harbin Institute of Technology, Shenzhen; Shenzhen Loop Area Institute

DOI:

https://doi.org/10.1609/aaai.v40i21.38800

Abstract

Incomplete multi-modal emotion recognition (IMER) aims to understand human intentions and sentiments by comprehensively exploring partially observed multi-source data. Although multi-modal data is expected to provide richer information, the performance gap and the modality under-optimization problem hinder effective multi-modal learning in practice, and both are exacerbated when data is missing. To address this issue, we devise a novel Cross-modal Prompting (ComP) method, which emphasizes coherent information by enhancing modality-specific features and improves overall recognition accuracy by boosting each modality's performance. Specifically, a progressive prompt generation module with a dynamic gradient modulator is proposed to produce concise and consistent modality semantic cues. Meanwhile, cross-modal knowledge propagation uses the delivered prompts to selectively amplify the consistent information in modality features, enhancing the discrimination of the modality-specific outputs. Additionally, a coordinator dynamically re-weights the modality outputs, complementing the balance strategy and improving the model's efficacy. Extensive experiments on 4 datasets against 7 state-of-the-art (SOTA) methods under different missing rates validate the effectiveness of the proposed method.
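
The abstract does not specify how the coordinator computes its weights, so the following is only a minimal sketch of the general idea of dynamically re-weighting modality outputs when some modalities are missing. It assumes, purely for illustration, that each observed modality is weighted by the confidence of its own prediction (its maximum class probability). The names `fuse_modalities`, `logits_by_modality`, and `available` are hypothetical and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(logits_by_modality, available):
    """Fuse per-modality class logits, skipping missing modalities.

    logits_by_modality: dict mapping modality name -> list of class logits
    available: dict mapping modality name -> bool (False means missing)
    Returns (weights, fused), where weights maps each observed modality
    to its fusion weight and fused is the combined class distribution.
    """
    present = [m for m in logits_by_modality if available.get(m, False)]
    if not present:
        raise ValueError("at least one modality must be observed")

    # Per-modality class probabilities for the observed modalities only.
    probs = {m: softmax(logits_by_modality[m]) for m in present}

    # Assumption (not from the paper): confidence = max class probability.
    confidence = [max(probs[m]) for m in present]

    # Normalise confidences into weights that sum to 1 over observed modalities.
    weights = softmax(confidence)

    n_classes = len(probs[present[0]])
    fused = [
        sum(w * probs[m][c] for w, m in zip(weights, present))
        for c in range(n_classes)
    ]
    return dict(zip(present, weights)), fused


# Example: the video modality is missing for this sample.
logits = {
    "text": [2.0, 0.0, 0.0],
    "audio": [0.0, 0.0, 0.0],
    "video": [0.0, 5.0, 0.0],
}
weights, fused = fuse_modalities(
    logits, {"text": True, "audio": True, "video": False}
)
```

In this sketch the missing modality contributes nothing, and the remaining weights are renormalised over the observed modalities. A learned coordinator, as described in the abstract, would replace the hand-written confidence score with a trained component.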

Published

2026-03-14

How to Cite

He, W.-J., Zhu, X., & Zhang, Z. (2026). Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(21), 17463–17471. https://doi.org/10.1609/aaai.v40i21.38800

Issue

Vol. 40 No. 21 (2026)

Section

AAAI Technical Track on Humans and AI