MusicRec: Multi-modal Semantic-Enhanced Identifier with Collaborative Signals for Generative Recommendation

Authors

  • Yuqiu Zhao Communication University of China
  • Lei Shi Communication University of China
  • Yan Zhong Peking University
  • Feifei Kou Beijing University of Posts and Telecommunications
  • Pengfei Zhang Anhui University of Science & Technology
  • Jiwei Zhang Beijing University of Posts and Telecommunications
  • Mingying Xu North China University of Technology
  • Yanchao Liu Communication University of China

DOI:

https://doi.org/10.1609/aaai.v40i19.38685

Abstract

Generative recommendation is a new paradigm that is influencing the current development of recommender systems. It aims to assign items identifiers that capture rich semantic and collaborative information, and then to predict item identifiers via autoregressive generation using Large Language Models (LLMs). Existing approaches primarily tokenize item text into semantic IDs through RQ-VAE codebooks, or tokenize each modality's features separately. However, these tokenization methods face two major challenges: (1) learning decoupled multi-modal features limits the quality of the semantic representation; (2) ignoring collaborative signals from interaction history limits the comprehensiveness of the identifiers. To address these limitations, we propose MusicRec, a multi-modal semantic-enhanced identifier with collaborative signals for generative recommendation. MusicRec uses a tokenization approach based on shared-specific modal fusion, so that the generated identifiers preserve semantic information from all modalities more comprehensively. In addition, we incorporate collaborative signals from user interactions to guide identifier generation, preserving collaborative patterns in the semantic representation space. Extensive experiments on three public datasets demonstrate that MusicRec achieves state-of-the-art performance compared to existing baseline methods.
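
As a rough illustration of the residual quantization idea behind RQ-VAE-style semantic IDs mentioned in the abstract, the sketch below maps one item embedding to a tuple of codeword indices. It is not the MusicRec tokenizer: the function name residual_quantize, the toy codebooks, and the example embedding are all hypothetical, and in an actual RQ-VAE the codebooks would be learned jointly with an encoder and decoder rather than written by hand.

```python
import numpy as np


def residual_quantize(embedding, codebooks):
    """Map one item embedding to a tuple of code indices (a semantic ID).

    Hypothetical helper for illustration only; it performs the nearest-codeword
    lookup of residual quantization and does no training.
    """
    residual = np.asarray(embedding, dtype=float)
    semantic_id = []
    for codebook in codebooks:
        # Pick the codeword nearest to the current residual.
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        semantic_id.append(index)
        # The next level quantizes what this level failed to explain.
        residual = residual - codebook[index]
    return tuple(semantic_id), residual


# Toy, hand-picked codebooks (assumed values): 2 levels, 2 codewords each, 2 dims.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.25, -0.25]]),
]
semantic_id, leftover = residual_quantize([1.2, 0.8], codebooks)
print(semantic_id)  # (1, 1)
```

Each level refines the previous one, so items with nearby embeddings tend to share leading indices; an autoregressive model can then generate an identifier one index at a time.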

Published

2026-03-14

How to Cite

Zhao, Y., Shi, L., Zhong, Y., Kou, F., Zhang, P., Zhang, J., … Liu, Y. (2026). MusicRec: Multi-modal Semantic-Enhanced Identifier with Collaborative Signals for Generative Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16459–16467. https://doi.org/10.1609/aaai.v40i19.38685

Section

AAAI Technical Track on Data Mining & Knowledge Management III