Seeing Beyond Noise: Joint Graph Structure Evaluation and Denoising for Multimodal Recommendation

Authors

  • Yuxin Qi — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Quan Zhang — Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
  • Xi Lin — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Xiu Su — Big Data Institute, Central South University, Changsha, China
  • Jiani Zhu — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Jingyu Wang — Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China
  • Jianhua Li — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China

DOI:

https://doi.org/10.1609/aaai.v39i12.33358

Abstract

Multimodal Recommendation Systems (MRSs) enhance traditional user-item interaction-based methods by incorporating multimodal information. However, existing methods overlook the inherent noise introduced by (1) noisy semantic priors in multimodal content and (2) noisy user interactions in historical records, which diminishes model performance. To fill this gap, we propose to denoise MRSs by jointly EValuating structure Effectiveness and mitigating Noisy links (EVEN). First, to address semantic prior noise in multimodal content, EVEN builds item homogeneous consistency and denoises it by evaluating behavior-driven confidence. Second, to address noise in user interactions, EVEN updates user feedback by denoising observed interactions based on an implicit contribution evaluation of high-order representations. Third, EVEN performs cross-modal alignment through self-guided structure learning, reinforcing task-specific inter-modal dependency modeling and cross-modal fusion. Extensive experiments on three widely used datasets show that EVEN achieves average improvements of 8.95% and 5.90% in recommendation accuracy over LGMRec and FREEDOM, respectively, without increasing total training time.

Published

2025-04-11

How to Cite

Qi, Y., Zhang, Q., Lin, X., Su, X., Zhu, J., Wang, J., & Li, J. (2025). Seeing Beyond Noise: Joint Graph Structure Evaluation and Denoising for Multimodal Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(12), 12461–12469. https://doi.org/10.1609/aaai.v39i12.33358

Section

AAAI Technical Track on Data Mining & Knowledge Management II