Federated Vision-Language-Recommendation with Personalized Fusion
DOI:
https://doi.org/10.1609/aaai.v40i28.39503
Abstract
Applying large pre-trained Vision-Language Models to recommendation is a burgeoning field, a direction we term Vision-Language-Recommendation (VLR). Bringing VLR to user-oriented on-device intelligence within a federated learning framework is a crucial step toward enhancing user privacy and delivering personalized experiences. This paper introduces FedVLR, a federated VLR framework specially designed for user-specific personalized fusion of vision-language representations. At its core is a novel bi-level fusion mechanism: a server-side multi-view fusion module first generates a diverse set of pre-fused multimodal views; each client then employs a user-specific mixture-of-experts mechanism to adaptively integrate these views based on the individual user's interaction history. This lightweight personalized fusion module provides an efficient way to implement a federated VLR system. The effectiveness of FedVLR has been validated on seven benchmark datasets.
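The bi-level fusion described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's implementation: here the server's "views" are simple convex combinations of vision and language vectors, and the client gate scores each view by its dot product with a local user embedding; the actual FedVLR modules are learned.

```python
import math

def server_multiview_fusion(vision_vec, text_vec, view_weights):
    # Server side (sketch): produce K pre-fused multimodal views, each a
    # different convex combination of the vision and language vectors.
    # `view_weights` is a hypothetical stand-in for learned fusion modules.
    return [[a * v + (1 - a) * t for v, t in zip(vision_vec, text_vec)]
            for a in view_weights]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def client_moe_fusion(views, user_vec):
    # Client side (sketch): a user-specific gate scores each pre-fused view
    # against the local user embedding, then mixes the views with softmax
    # weights, mixture-of-experts style. Only the small gate would need to
    # be personalized on-device; the heavy VLM stays on the server.
    scores = [sum(u * x for u, x in zip(user_vec, view)) for view in views]
    gate = softmax(scores)
    dim = len(views[0])
    return [sum(gate[k] * views[k][i] for k in range(len(views)))
            for i in range(dim)]

# Usage: two 2-d modality vectors, two candidate views, one user embedding.
views = server_multiview_fusion([1.0, 0.0], [0.0, 1.0], [0.25, 0.75])
fused = client_moe_fusion(views, user_vec=[1.0, 0.0])
```

Because the gate input is only the user's local embedding and interaction history, personalization stays on-device, which is the property the federated setting requires.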
Published
2026-03-14
How to Cite
Li, Z., Long, G., Jiang, J., Zhang, C., & Yang, Q. (2026). Federated Vision-Language-Recommendation with Personalized Fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23337–23345. https://doi.org/10.1609/aaai.v40i28.39503
Section
AAAI Technical Track on Machine Learning V