Curriculum Conditioned Diffusion for Multimodal Recommendation

Authors

  • Yimeng Yang, School of Software, Shandong University, Jinan, China
  • Haokai Ma, School of Software, Shandong University, Jinan, China
  • Lei Meng, School of Software, Shandong University, Jinan, China; Shandong Research Institute of Industrial Technology, Jinan, China
  • Shuo Xu, School of Software, Shandong University, Jinan, China
  • Ruobing Xie, Tencent, China
  • Xiangxu Meng, School of Software, Shandong University, Jinan, China

DOI

https://doi.org/10.1609/aaai.v39i12.33422

Abstract

Multimodal recommendation (MMRec) aims to integrate multimodal information of items to address the inherent data sparsity issue in collaborative-based recommendation. Traditional MMRec methods typically capture structure-level item representations from the observed user behaviors within the multimodal graph, overlooking the potential impact of negative instances on personalized preference understanding. In light of the outstanding generative ability and step-by-step inference characteristic of Diffusion Models (DMs), we propose a Curriculum Conditioned Diffusion framework for Multimodal Recommendation (CCDRec), which captures the modality-aware, distribution-level correlation among modalities and integrates the reverse phase of DMs into negative sampling to highlight the most suitable instances in a curricular manner. Specifically, CCDRec proposes the Diffusion-controlled Multimodal Aligning module (DMA) to align multimodal knowledge with collaborative signals by capturing the fine-grained relationships among modalities in the probabilistic distribution space. Furthermore, CCDRec designs the Negative-sensitive Diffusive Inferring module (NDI) to progressively synthesize a negative sample pool of diverse hardness to support the subsequent knowledge-aware negative sampling. To gradually ramp up the training complexity, CCDRec further introduces a Curricular Negative Sampler (CNS) to align the curriculum learning paradigm with the reverse phase of DMA, thereby adaptively sampling the gold-standard negative instances to enhance optimization. Extensive experiments on three datasets with four diverse backbones demonstrate the effectiveness and robustness of CCDRec. The visualization analyses also clarify the underlying mechanisms of DMA in multimodal representation alignment and of CNS in curricular negative discovery. The code and the corresponding dataset will be uploaded in the Appendix.

Published

2025-04-11

How to Cite

Yang, Y., Ma, H., Meng, L., Xu, S., Xie, R., & Meng, X. (2025). Curriculum Conditioned Diffusion for Multimodal Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(12), 13035–13043. https://doi.org/10.1609/aaai.v39i12.33422

Section

AAAI Technical Track on Data Mining & Knowledge Management II