Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition

Authors

  • Haiying Xia Guangxi Normal University
  • Zhongyi Huang Guangxi Normal University
  • Yumei Tan Guangxi Normal University
  • Shuxiang Song Guangxi Normal University

DOI:

https://doi.org/10.1609/aaai.v40i3.37201

Abstract

Symbolic music emotion recognition (SMER) is a key task in symbolic music understanding. Recent approaches have shown promising results by fine-tuning large-scale pre-trained models (e.g., MIDIBERT, a benchmark in symbolic music understanding) to map musical semantics to emotional labels. While these models effectively capture distributional musical semantics, they often overlook tonal structures, particularly musical modes, which play a critical role in emotional perception according to music psychology. In this paper, we investigate the representational capacity of MIDIBERT and identify its limitations in capturing mode-emotion associations. To address this issue, we propose a Mode-Guided Enhancement (MoGE) strategy that incorporates psychological insights on mode into the model. Specifically, we first conduct a mode augmentation analysis, which reveals that MIDIBERT fails to effectively encode emotion-mode correlations. Motivated by this observation, we further identify the MIDIBERT layer that shows the weakest emotion relevance and introduce a Mode-guided Feature-wise linear modulation injection (MoFi) framework to inject explicit mode features, thereby enhancing the model's capability in emotional representation and inference. Extensive experiments on the EMOPIA and VGMIDI datasets demonstrate that our mode injection strategy significantly improves SMER performance, achieving accuracies of 75.2% and 59.1%, respectively. These results validate the effectiveness of mode-guided modeling in symbolic music emotion recognition.
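
The abstract names feature-wise linear modulation (FiLM) as the injection mechanism but gives no implementation details. As a rough illustration only, the generic FiLM operation scales and shifts each hidden feature using parameters conditioned on a side input, here the musical mode. The sketch below is not the authors' MoFi code: the function name, the three-dimensional hidden size, and the hard-coded scale and shift values are all hypothetical, and in a trained model the modulation parameters would presumably be learned rather than fixed.

```python
# Toy FiLM (feature-wise linear modulation) sketch in plain Python.
# Every name and number here is a hypothetical placeholder, not the paper's method.
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical per-mode modulation parameters: (gamma = scale, beta = shift).
# A real model would produce these from a learned mode embedding.
FILM_PARAMS: Dict[str, Tuple[Vector, Vector]] = {
    "major": ([1.5, 1.0, 0.5], [0.1, 0.0, -0.1]),
    "minor": ([0.5, 1.0, 1.5], [-0.1, 0.0, 0.1]),
}


def film_modulate(hidden: List[Vector], mode: str) -> List[Vector]:
    """Apply gamma * h + beta to every token vector, feature by feature."""
    gamma, beta = FILM_PARAMS[mode]
    return [
        [g * h + b for h, g, b in zip(token, gamma, beta)]
        for token in hidden
    ]


# Two token vectors standing in for one layer's hidden states.
hidden_states = [[1.0, 2.0, 3.0], [0.0, -1.0, 2.0]]
major_out = film_modulate(hidden_states, "major")
minor_out = film_modulate(hidden_states, "minor")
```

The same hidden states yield different outputs depending on the mode label, which is the sense in which a conditioning signal is injected into an intermediate layer.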

Published

2026-03-14

How to Cite

Xia, H., Huang, Z., Tan, Y., & Song, S. (2026). Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(3), 2182–2190. https://doi.org/10.1609/aaai.v40i3.37201

Section

AAAI Technical Track on Cognitive Modeling & Cognitive Systems