PromptEmo: Learning Emotion with Bilateral Textual Prompts in Multi-Domain Open-set Scenarios

Xinyi Zeng; Yuxiang Yang; Pinxian Zeng; Wenxia Yin; Bo Liu; Xi Wu; Yan Wang

doi:10.1609/aaai.v40i15.38224

Authors

Xinyi Zeng, Sichuan University
Yuxiang Yang, Sichuan University
Pinxian Zeng, Sichuan University
Wenxia Yin, Sichuan University
Bo Liu, The Hong Kong Polytechnic University
Xi Wu, Chengdu University of Information Technology
Yan Wang, Sichuan University

DOI: https://doi.org/10.1609/aaai.v40i15.38224

Abstract

Facial Expression Recognition (FER) is crucial to human-computer interaction. Existing cross-domain FER (CD-FER) methods mainly focus on single-source closed-set scenarios, transferring knowledge from a single source domain to a target domain with an identical class set. However, CD-FER faces two real-world challenges: 1) the need to leverage information from multiple sources, which leads to multi-domain shift, and 2) the need to recognize unseen target classes, which results in class shift. These issues give rise to a novel and challenging task, which we define as Multi-domain Open-set FER (MO-FER). In this paper, we propose PromptEmo, a novel CLIP-based framework that leverages bilateral textual prompts to address both shifts in the MO-FER task. Drawing on the generalizability of large language models (LLMs), PromptEmo constructs trainable positive prompts with LLM-generated emotion descriptions for seen classes, as well as template-derived negative prompts to enhance reasoning about unseen classes. We then introduce a modal-task optimization paradigm organized from two perspectives, textual semantics and visual domains, yielding the Intra-modal Space-specific Optimization (ISO) and Cross-modal Emotion-aware Interaction (CEI) strategies. ISO refines the CLIP-based textual space to ensure semantic separation between the bilateral prompts and improves the latent visual space by promoting inter-domain alignment. Building on ISO, CEI facilitates effective vision-language interactions, resulting in four joint loss terms that improve emotion recognition by shaping a domain-invariant, discriminative feature space. PromptEmo surpasses the current state-of-the-art (SOTA) method by 7.7% AUC on unseen classes across four FER datasets, serving as a strong baseline for the MO-FER task.
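
The abstract does not spell out how positive and negative prompts are combined at inference time. As a rough illustration of the general idea only, the sketch below scores an image embedding against one positive prompt embedding per seen class and a set of negative prompt embeddings, then rejects the image as "unknown" when most of the softmax mass falls on the negative prompts. The function names, the temperature, the threshold, and the rejection rule itself are all assumptions made for this example; they are not taken from PromptEmo, and the toy vectors stand in for real CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def open_set_predict(image_emb, positive_embs, negative_embs,
                     temperature=0.07, threshold=0.5):
    """Hypothetical open-set rule (an assumption, not the paper's method).

    positive_embs: dict mapping seen class name -> prompt embedding
    negative_embs: list of negative prompt embeddings
    Returns (label, unknown_score), where label may be "unknown".
    """
    names = list(positive_embs)
    # Temperature-scaled similarities to positive, then negative prompts.
    logits = [cosine(image_emb, positive_embs[n]) / temperature for n in names]
    logits += [cosine(image_emb, e) / temperature for e in negative_embs]

    # Numerically stable softmax over all prompts.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Probability mass assigned to the negative prompts.
    unknown_score = sum(probs[len(names):])
    if unknown_score >= threshold:
        return "unknown", unknown_score

    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], unknown_score


# Toy 3-d embeddings standing in for CLIP features (illustrative only).
positives = {"happy": [1.0, 0.0, 0.0], "sad": [0.0, 1.0, 0.0]}
negatives = [[0.0, 0.0, 1.0]]

seen_label, seen_score = open_set_predict([0.9, 0.1, 0.0], positives, negatives)
unseen_label, unseen_score = open_set_predict([0.1, 0.0, 0.9], positives, negatives)
```

In this toy setup the first image lies close to the "happy" prompt and is assigned that class, while the second lies close to the negative prompt and is rejected. A threshold-free ranking of `unknown_score` across a test set is what an AUC metric on unseen classes would typically evaluate.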
