SGP4SR: Separated-Modality Guided User Preference Learning for Multimodal Sequential Recommendation
DOI:
https://doi.org/10.1609/aaai.v40i18.38528
Abstract
With the rapid growth of multimodal data (e.g., images, text) on internet platforms, multimodal sequential recommendation methods continue to emerge. Most existing methods incorporate item modal features as auxiliary information, typically concatenating them to learn unified user representations. However, these methods use modal features directly for representation learning, neglecting the impact of inherent modal noise. We argue that internal-modal noise and cross-modal noise hinder the acquisition of more accurate user representations. To address this problem, we propose SGP4SR (Separated-modality Guided user Preference learning for multimodal Sequential Recommendation). Globally, user preference modeling is carried out from a separated-modality perspective to alleviate cross-modal noise. Locally, for each individual modality, we use item relationship graphs and user interest centers, aggregated with ID embeddings, to replace direct modal features, thereby mitigating internal-modal noise. Finally, user representations from both the separated-modality and multimodal perspectives participate in prediction independently. In experiments conducted on four real-world datasets, our method outperforms state-of-the-art approaches, achieving an average performance improvement of up to 8.84% over the best baseline. Comprehensive experiments further validate the superior noise tolerance and robustness of our method.
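The local idea in the abstract, letting modal features shape an item relationship graph while ID embeddings carry the representation, can be illustrated with a small sketch. This is not the authors' implementation: the cosine k-nearest-neighbor graph, the mean-pooled user representation, the summation of per-modality scores, and every name below (`knn_item_graph`, `modality_scores`, `k`) are assumptions made for illustration. User interest centers and the multimodal perspective are left out.

```python
import numpy as np

def knn_item_graph(modal_feats, k):
    """Row-normalized k-nearest-neighbor adjacency from cosine similarity (assumed graph type)."""
    norms = np.linalg.norm(modal_feats, axis=1, keepdims=True) + 1e-12
    normed = modal_feats / norms
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)            # never pick an item as its own neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar items per row
    adj = np.zeros(sim.shape)
    adj[np.repeat(np.arange(len(sim)), k), nbrs.ravel()] = 1.0
    return adj / k                            # each row has exactly k ones, so rows sum to 1

def modality_scores(id_emb, modal_feats, history, k=2):
    """Score all items for one user within a single modality."""
    adj = knn_item_graph(modal_feats, k)
    # Modal features only decide who the neighbors are; the values aggregated are ID embeddings.
    item_repr = id_emb + adj @ id_emb
    # Mean pooling over the history is a stand-in for a real sequence encoder.
    user_repr = item_repr[history].mean(axis=0)
    return item_repr @ user_repr

rng = np.random.default_rng(0)
n_items, dim = 6, 4
id_emb = rng.normal(size=(n_items, dim))      # toy ID embeddings
image_feats = rng.normal(size=(n_items, 8))   # toy image features
text_feats = rng.normal(size=(n_items, 5))    # toy text features
history = [0, 2, 3]                           # items the user interacted with

# Each modality is scored on its own and the scores are combined afterwards (assumed: plain sum).
image_scores = modality_scores(id_emb, image_feats, history)
text_scores = modality_scores(id_emb, text_feats, history)
final_scores = image_scores + text_scores
```

Because the raw image and text vectors never enter `item_repr` directly, noise in them can only affect which neighbors are selected, which is one plausible reading of how a graph-based substitute might limit internal-modal noise.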
Published
2026-03-14
How to Cite
Li, C., Guo, Z., Li, G., Yang, Z., & Hong, C. (2026). SGP4SR: Separated-Modality Guided User Preference Learning for Multimodal Sequential Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(18), 15054–15062. https://doi.org/10.1609/aaai.v40i18.38528
Section
AAAI Technical Track on Data Mining & Knowledge Management II