MLLM Enriched Explainable Multiple Clustering
DOI:
https://doi.org/10.1609/aaai.v40i33.40066

Abstract
Multiple clustering aims to uncover diverse latent structures within data, enabling a more comprehensive understanding of complex datasets. However, existing approaches either rely heavily on user-supplied keywords or disregard the clustering types users are interested in, limiting their ability to discover the full range of explainable clusterings of interest, particularly in high-dimensional settings. Furthermore, existing methods insufficiently leverage rich textual semantics and fall short of fully integrating multi-modal information. To address these challenges, we propose MLLM enriched Multiple Clustering (MLLMMC), a novel framework that leverages a multi-modal large language model (MLLM) to explore explainable, non-redundant clusterings. Specifically, MLLMMC first employs the MLLM to generate sample descriptions, which serve as input for an LLM to perform prompt-driven reasoning and infer latent clustering types; it then merges these with user-interested types to obtain diverse and explainable clustering types. For each selected type, MLLMMC uses the MLLM to generate sample-level textual descriptions and aligns them with the corresponding visual features through a cross-attention fusion module, producing a semantically aligned and enriched representation for the target clustering type. Extensive experiments on six benchmark datasets from diverse domains demonstrate that MLLMMC achieves diverse, explainable, and high-quality clustering outcomes, outperforming state-of-the-art multiple clustering methods by a large margin.
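The cross-attention fusion step mentioned in the abstract can be sketched in generic form: visual features act as queries attending over text-description embeddings, yielding a text-enriched representation per sample. The projection dimensions, weight initialization, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(visual, text, d_k=64, seed=0):
    """Fuse visual features (queries) with text embeddings (keys/values).

    visual: (n_regions, d_v) visual feature vectors for one sample
    text:   (n_tokens, d_t) token embeddings of the MLLM-generated description
    Returns a (n_regions, d_k) text-enriched representation.
    Random projections stand in for learned weights in this sketch.
    """
    rng = np.random.default_rng(seed)
    d_v, d_t = visual.shape[-1], text.shape[-1]
    W_q = rng.standard_normal((d_v, d_k)) / np.sqrt(d_v)
    W_k = rng.standard_normal((d_t, d_k)) / np.sqrt(d_t)
    W_v = rng.standard_normal((d_t, d_k)) / np.sqrt(d_t)
    Q, K, V = visual @ W_q, text @ W_k, text @ W_v
    # Scaled dot-product attention: each visual query attends over text tokens.
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return attn @ V
```

In a trained model the `W_q`, `W_k`, `W_v` projections would be learned jointly with the clustering objective; the sketch only shows the data flow of the fusion module.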
Published
2026-03-14
How to Cite
Zhang, S., Ren, L., Tan, Q., Domeniconi, C., Du, W., Wang, J., & Yu, G. (2026). MLLM Enriched Explainable Multiple Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 28373–28381. https://doi.org/10.1609/aaai.v40i33.40066
Section
AAAI Technical Track on Machine Learning X