PromptEmo: Learning Emotion with Bilateral Textual Prompts in Multi-Domain Open-set Scenarios

Authors

  • Xinyi Zeng, Sichuan University
  • Yuxiang Yang, Sichuan University
  • Pinxian Zeng, Sichuan University
  • Wenxia Yin, Sichuan University
  • Bo Liu, The Hong Kong Polytechnic University
  • Xi Wu, Chengdu University of Information Technology
  • Yan Wang, Sichuan University

DOI:

https://doi.org/10.1609/aaai.v40i15.38224

Abstract

Facial Expression Recognition (FER) is crucial to human-computer interaction. Existing cross-domain FER (CD-FER) methods mainly focus on single-source closed-set scenarios, transferring knowledge from a single source domain to a target domain with an identical class set. However, CD-FER faces two real-world challenges: 1) the need to leverage information from multiple sources, which leads to multi-domain shift, and 2) the need to recognize unseen target classes, which results in class shift. These issues give rise to a novel and challenging task, which we define as Multi-domain Open-set FER (MO-FER). In this paper, we propose PromptEmo, a novel CLIP-based framework that uses bilateral textual prompts to address both shifts in the MO-FER task. Drawing on the generalizability of large language models (LLMs), PromptEmo constructs trainable positive prompts with LLM-generated emotion descriptions for seen classes, as well as template-derived negative prompts to enhance the reasoning for unseen classes. We then introduce a modal-task optimization paradigm organized from two perspectives, textual semantics and visual domains, yielding the Intra-modal Space-specific Optimization (ISO) and Cross-modal Emotion-aware Interaction (CEI) strategies. ISO refines the CLIP-based textual space to ensure semantic separation between bilateral prompts and improves the latent visual space by promoting inter-domain alignment. Building on ISO, CEI facilitates effective vision-language interactions, resulting in four joint loss terms that improve emotion recognition by shaping a domain-invariant, discriminative feature space. PromptEmo surpasses the current SOTA method by 7.7% AUC on unseen classes across four FER datasets, serving as a strong baseline for the MO-FER task.
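The abstract does not spell out how positive and negative prompts are combined at inference time. As a rough illustration of the general idea only, the sketch below assumes that an image embedding is compared by cosine similarity against embeddings of positive prompts (one per seen class) and negative prompts, and that the softmax mass falling on the negative prompts serves as an "unseen" score. The function `bilateral_predict`, the temperature of 0.07, the 0.5 threshold, and the toy three-dimensional embeddings are all hypothetical choices for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(scores, temperature=0.07):
    """Temperature-scaled softmax (temperature value is an assumption)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def bilateral_predict(image_emb, positive_embs, negative_embs, seen_classes,
                      threshold=0.5):
    """Hypothetical open-set decision rule, not the paper's method.

    Returns (label, unseen_score), where unseen_score is the softmax
    probability mass assigned to the negative prompts.
    """
    pos_sims = [cosine(image_emb, p) for p in positive_embs]
    neg_sims = [cosine(image_emb, n) for n in negative_embs]
    probs = softmax(pos_sims + neg_sims)
    k = len(pos_sims)
    unseen_score = sum(probs[k:])
    best_seen = max(range(k), key=lambda i: probs[i])
    label = "unseen" if unseen_score > threshold else seen_classes[best_seen]
    return label, unseen_score


if __name__ == "__main__":
    # Toy embeddings standing in for CLIP text/image features.
    positives = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # "happy", "sad"
    negatives = [[0.0, 0.0, 1.0]]                   # one negative prompt
    classes = ["happy", "sad"]
    print(bilateral_predict([0.9, 0.1, 0.0], positives, negatives, classes))
    print(bilateral_predict([0.0, 0.1, 0.9], positives, negatives, classes))
```

In a real system the embeddings would come from a CLIP text and image encoder, and the unseen score could be evaluated with AUC against ground-truth seen/unseen labels, which is the metric the abstract reports.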

Published

2026-03-14

How to Cite

Zeng, X., Yang, Y., Zeng, P., Yin, W., Liu, B., Wu, X., & Wang, Y. (2026). PromptEmo: Learning Emotion with Bilateral Textual Prompts in Multi-Domain Open-set Scenarios. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12322–12330. https://doi.org/10.1609/aaai.v40i15.38224

Section

AAAI Technical Track on Computer Vision XII