Seeing Beyond Noise: Joint Graph Structure Evaluation and Denoising for Multimodal Recommendation

Authors

  • Yuxin Qi — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Quan Zhang — Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
  • Xi Lin — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Xiu Su — Big Data Institute, Central South University, Changsha, China
  • Jiani Zhu — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
  • Jingyu Wang — Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China
  • Jianhua Li — School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China

DOI:

https://doi.org/10.1609/aaai.v39i12.33358

Abstract

Multimodal Recommendation Systems (MRSs) enhance traditional user-item interaction-based methods by incorporating multimodal information. However, existing methods overlook the inherent noise introduced by (1) noisy semantic priors in multimodal content and (2) noisy user interactions in historical records, which diminishes model performance. To fill this gap, we propose to denoise MRSs by jointly EValuating structure Effectiveness and mitigating Noisy links (EVEN). First, to address semantic prior noise in multimodal content, EVEN builds item homogeneous consistency and denoises it by evaluating behavior-driven confidence. Second, to address noise in user interactions, EVEN updates user feedback by denoising observed interactions based on an implicit contribution evaluation of high-order representations. Third, EVEN performs cross-modal alignment through self-guided structure learning, reinforcing task-specific inter-modal dependency modeling and cross-modal fusion. Extensive experiments on three widely used datasets show that EVEN achieves average improvements of 8.95% and 5.90% in recommendation accuracy over LGMRec and FREEDOM, respectively, without increasing total training time.

Published

2025-04-11

How to Cite

Qi, Y., Zhang, Q., Lin, X., Su, X., Zhu, J., Wang, J., & Li, J. (2025). Seeing Beyond Noise: Joint Graph Structure Evaluation and Denoising for Multimodal Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(12), 12461–12469. https://doi.org/10.1609/aaai.v39i12.33358

Section

AAAI Technical Track on Data Mining & Knowledge Management II