PLUM-Net: Prototype-Induced Label Structuring for Disentangled Multimodal Representation Network

Authors

  • Kehan Wang, Hunan University
  • Huan Zhao, Hunan University
  • Yong Wei, Hunan University
  • Xupeng Zha, Hunan University
  • Guanghui Ye, Hunan University
  • Cheng Zhu, Hunan University
  • Yiming Liu, Hunan University
  • Zixing Zhang, Hunan University

DOI:

https://doi.org/10.1609/aaai.v40i22.38928

Abstract

Existing multimodal representation learning approaches often rely on simple feature concatenation or unified transformations, which fail to progressively disentangle and exploit the common and private information across modalities. Moreover, they typically lack adaptive modeling tailored to specific task requirements. To address these limitations, we propose PLUM-Net, a Prototype-Induced Label Structuring for Disentangled Multimodal Representation Network. It first employs a multilevel semantic alignment module to synchronize global and local semantics across the audio, visual, and textual streams. On this aligned foundation, a prototype-based single-modal label generation module derives modality-specific hard and soft labels that steer the network toward a cleaner split between shared and private cues. Guided by these labels, a task-conditioned feature bifurcator module routes information through whichever pathway, common or private, is most beneficial for the given task, after which a private refinement module refines and fuses each modality’s idiosyncratic signals. Extensive experiments show that PLUM-Net delivers strong performance on datasets such as CMU-MOSI, CMU-MOSEI, and UR-FUNNY, achieving an ACC-2 of 90.3% on CMU-MOSI, a 2%–4% improvement over previous SOTA models.
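The abstract does not specify how the prototype-based label generation module computes its labels. As a rough illustration only, the sketch below shows one generic way to derive hard and soft labels from class prototypes for a single modality. It assumes prototypes are per-class feature means and soft labels are a softmax over negative squared distances; the function names, the temperature parameter, and the toy numbers are all hypothetical and are not taken from PLUM-Net.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Mean feature vector per class (an assumed, generic prototype definition)."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def prototype_labels(vec, prototypes, temperature=1.0):
    """Return (hard_label, soft_labels) for one single-modality feature vector."""
    classes = sorted(prototypes)
    # Negative squared Euclidean distance serves as the similarity score.
    scores = [
        -sum((v - p) ** 2 for v, p in zip(vec, prototypes[c])) / temperature
        for c in classes
    ]
    # Numerically stable softmax over the scores gives the soft labels.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    soft = {c: e / total for c, e in zip(classes, exps)}
    # The hard label is the class with the highest soft-label probability.
    hard = max(soft, key=soft.get)
    return hard, soft


# Toy audio-only features with binary sentiment labels (made-up numbers).
audio_feats = [[0.9, 0.1], [1.1, -0.1], [-1.0, 0.2], [-0.8, 0.0]]
sentiment = [1, 1, 0, 0]
protos = class_prototypes(audio_feats, sentiment)
hard, soft = prototype_labels([0.7, 0.0], protos)
```

In this toy setup the query vector lies much closer to the class-1 prototype, so its hard label is 1 and most of the soft-label mass falls on class 1. A lower temperature would sharpen the soft labels toward the hard label.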

Published

2026-03-14

How to Cite

Wang, K., Zhao, H., Wei, Y., Zha, X., Ye, G., Zhu, C., … Zhang, Z. (2026). PLUM-Net: Prototype-Induced Label Structuring for Disentangled Multimodal Representation Network. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18611–18619. https://doi.org/10.1609/aaai.v40i22.38928

Section

AAAI Technical Track on Intelligent Robotics