Interest-driven Deep Multi-modal Clustering

Authors

  • Guoliang Zou, Zhengzhou University
  • Tongji Chen, Zhengzhou University
  • Sijia Li, Zhengzhou University
  • Jin Qin, Zhengzhou University
  • Yangdong Ye, Zhengzhou University
  • Shizhe Hu, Zhengzhou University

DOI:

https://doi.org/10.1609/aaai.v40i34.40167

Abstract

Deep multi-modal clustering learns semantically consistent and discriminative cluster representations across multiple modalities without labels. However, existing methods treat all samples equally and ignore differences in sample quality, which limits clustering performance. Inspired by the concept of interest in recommendation systems, we propose a novel interest-driven deep multi-modal clustering (IDMC) framework. It introduces a new paradigm that quantifies the importance of each sample based on the attention it receives from other samples, which we call the interest value. This value jointly captures the local geometric structure, through the Euclidean distance in feature space, and the consistency of pseudo-labels. We then design a novel adaptive Bayesian fusion mechanism that dynamically balances prior features and self-supervisory signals to ensure confidence-based estimation of sample importance. Furthermore, we introduce a median normalization constraint and a label consistency constraint to further refine the construction of the interest value. By embedding the interest value into representation learning and cluster optimization, IDMC focuses on the most informative and semantically stable samples, thereby improving multi-modal representation learning. Extensive experiments verify that IDMC outperforms existing state-of-the-art methods on multiple evaluation metrics.
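
The abstract gives no formulas, so the following is only a rough sketch of how an attention-based sample score of this kind could be computed. It is not the IDMC formulation. The function name `interest_values`, the k-nearest-neighbour rule, the `exp(-distance)` weighting, and the "divide by the median" normalization are all assumptions made for illustration; the adaptive Bayesian fusion step is not modeled.

```python
import numpy as np

def interest_values(features, pseudo_labels, k=3):
    """Toy interest score: how much 'attention' each sample receives.

    Illustrative assumption, not the IDMC formulation: sample j receives
    attention from sample i when j is among i's k nearest neighbours
    (Euclidean distance) and both carry the same pseudo-label.
    """
    features = np.asarray(features, dtype=float)
    pseudo_labels = np.asarray(pseudo_labels)
    n = features.shape[0]

    # Pairwise Euclidean distances, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample does not attend to itself

    # Indices of the k nearest neighbours of every sample, shape (n, k).
    neighbours = np.argsort(dist, axis=1)[:, :k]

    received = np.zeros(n)
    for i in range(n):
        for j in neighbours[i]:
            # Label-consistency check (assumed form): only agreeing pairs count.
            if pseudo_labels[i] == pseudo_labels[j]:
                received[j] += np.exp(-dist[i, j])  # closer neighbours count more

    # Median normalisation (assumed form): rescale so the median score is 1.
    median = np.median(received)
    return received / median if median > 0 else received


# Two tight clusters plus one isolated point that no other sample attends to.
feats = [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11],
         [5, 5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = interest_values(feats, labels, k=3)
```

In this toy setup the isolated point at index 8 is never among another sample's three nearest neighbours, so its score is zero, while the cluster members receive positive scores. A score like this could then be used to weight per-sample terms in a clustering loss, which is the general role the abstract assigns to the interest value.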

Published

2026-03-14

How to Cite

Zou, G., Chen, T., Li, S., Qin, J., Ye, Y., & Hu, S. (2026). Interest-driven Deep Multi-modal Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 29277–29285. https://doi.org/10.1609/aaai.v40i34.40167

Issue

Section

AAAI Technical Track on Machine Learning XI