Federated Vision-Language-Recommendation with Personalized Fusion

Authors

  • Zhiwei Li, University of Technology Sydney
  • Guodong Long, University of Technology Sydney
  • Jing Jiang, University of Technology Sydney
  • Chengqi Zhang, Hong Kong Polytechnic University
  • Qiang Yang, Hong Kong Polytechnic University

DOI:

https://doi.org/10.1609/aaai.v40i28.39503

Abstract

Applying large pre-trained Vision-Language Models to recommendation is a burgeoning field, a direction we term Vision-Language-Recommendation (VLR). Bringing VLR to user-oriented on-device intelligence within a federated learning framework is a crucial step for enhancing user privacy and delivering personalized experiences. This paper introduces FedVLR, a federated VLR framework designed for user-specific personalized fusion of vision-language representations. At its core is a novel bi-level fusion mechanism: the server-side multi-view fusion module first generates a diverse set of pre-fused multimodal views. Each client then employs a user-specific mixture-of-experts mechanism to adaptively integrate these views based on the individual user's interaction history. This lightweight personalized fusion module provides an efficient way to implement a federated VLR system. The effectiveness of the proposed FedVLR has been validated on seven benchmark datasets.
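The bi-level fusion described above can be sketched at a high level. The snippet below is a minimal illustration, not the paper's implementation: it assumes each server-side view is a learned linear fusion of concatenated vision and language embeddings, and that the client gate is a softmax over a linear projection of a user-history embedding; all names (`server_multiview_fusion`, `client_moe_fusion`, `W_views`, `W_gate`) and the specific fusion forms are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 8, 4  # embedding dim and number of pre-fused views (illustrative sizes)

def server_multiview_fusion(img_emb, txt_emb, W_views):
    """Server side (assumed form): produce K pre-fused multimodal views,
    each a different linear fusion of the concatenated vision and
    language embeddings."""
    z = np.concatenate([img_emb, txt_emb])      # (2d,)
    return np.stack([W @ z for W in W_views])   # (K, d)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def client_moe_fusion(views, user_hist_emb, W_gate):
    """Client side (assumed form): a user-specific mixture-of-experts
    gate computes weights from the user's interaction-history embedding
    and returns the weighted combination of the K server-provided views."""
    gate = softmax(W_gate @ user_hist_emb)      # (K,) gating weights, sum to 1
    return gate @ views, gate                   # fused item repr (d,), weights (K,)

# Toy end-to-end pass with random parameters.
img, txt = rng.normal(size=d), rng.normal(size=d)
W_views = rng.normal(size=(K, d, 2 * d)) * 0.1
views = server_multiview_fusion(img, txt, W_views)

user_hist = rng.normal(size=d)
W_gate = rng.normal(size=(K, d)) * 0.1
item_repr, gate = client_moe_fusion(views, user_hist, W_gate)
```

In a federated setting, only the small gating parameters (`W_gate` here) would need to be personalized on-device, while the heavy vision-language encoding and multi-view fusion stay on the server, which is what makes the client module lightweight.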

Published

2026-03-14

How to Cite

Li, Z., Long, G., Jiang, J., Zhang, C., & Yang, Q. (2026). Federated Vision-Language-Recommendation with Personalized Fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23337–23345. https://doi.org/10.1609/aaai.v40i28.39503

Section

AAAI Technical Track on Machine Learning V