MusicRec: Multi-modal Semantic-Enhanced Identifier with Collaborative Signals for Generative Recommendation
DOI: https://doi.org/10.1609/aaai.v40i19.38685
Abstract
Generative recommendation is a new paradigm that is reshaping the development of recommender systems. It aims to assign items identifiers that capture rich semantic and collaborative information, and then to predict item identifiers via autoregressive generation with Large Language Models (LLMs). Existing approaches primarily tokenize item text into semantic IDs drawn from RQ-VAE codebooks, or tokenize the different modality features of items separately. These tokenization methods face two major challenges: (1) learning decoupled multi-modal features limits the quality of the semantic representation, and (2) ignoring collaborative signals from interaction history limits the comprehensiveness of the identifiers. To address these limitations, we propose a multi-modal semantic-enhanced identifier with collaborative signals for generative recommendation, named MusicRec. In MusicRec, we propose a tokenization approach based on shared-specific modal fusion, enabling the generated identifiers to preserve semantic information from all modalities more comprehensively. In addition, we incorporate collaborative signals from user interactions to guide identifier generation, preserving collaborative patterns in the semantic representation space. Extensive experiments on three public datasets demonstrate that MusicRec achieves state-of-the-art performance compared to existing baseline methods.
Published
2026-03-14
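The RQ-VAE tokenization mentioned in the abstract maps a continuous item embedding to a short tuple of discrete codes by repeatedly quantizing the residual against a sequence of codebooks. The following is a minimal sketch of that quantization step only (not the authors' implementation); the function name, codebook sizes, and embedding dimension are illustrative assumptions.

```python
import numpy as np

def rq_tokenize(embedding, codebooks):
    """Illustrative residual-quantization step of an RQ-VAE tokenizer.

    codebooks: list of (K, d) arrays, one per quantization level.
    Returns a tuple of codeword indices (the item's "semantic ID").
    """
    residual = embedding.astype(np.float64)
    codes = []
    for cb in codebooks:
        # Pick the codeword closest to the current residual.
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        # The next level quantizes whatever this level failed to capture.
        residual = residual - cb[idx]
    return tuple(codes)

# Hypothetical setup: 3 levels of 256 codewords over 32-d item embeddings.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 32)) for _ in range(3)]
item_emb = rng.normal(size=32)
semantic_id = rq_tokenize(item_emb, codebooks)  # a 3-token identifier
```

In a generative recommender, such tuples replace atomic item IDs, so an LLM can predict the next item one code token at a time.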
How to Cite
Zhao, Y., Shi, L., Zhong, Y., Kou, F., Zhang, P., Zhang, J., … Liu, Y. (2026). MusicRec: Multi-modal Semantic-Enhanced Identifier with Collaborative Signals for Generative Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16459–16467. https://doi.org/10.1609/aaai.v40i19.38685
Issue
Section
AAAI Technical Track on Data Mining & Knowledge Management III