Federated Vision-Language-Recommendation with Personalized Fusion

Authors

  • Zhiwei Li, University of Technology Sydney
  • Guodong Long, University of Technology Sydney
  • Jing Jiang, University of Technology Sydney
  • Chengqi Zhang, Hong Kong Polytechnic University
  • Qiang Yang, Hong Kong Polytechnic University

DOI:

https://doi.org/10.1609/aaai.v40i28.39503

Abstract

Applying large pre-trained Vision-Language Models to recommendation is a burgeoning field, a direction we term Vision-Language-Recommendation (VLR). Bringing VLR to user-oriented on-device intelligence within a federated learning framework is a crucial step for enhancing user privacy and delivering personalized experiences. This paper introduces FedVLR, a federated VLR framework designed for user-specific personalized fusion of vision-language representations. At its core is a novel bi-level fusion mechanism: the server-side multi-view fusion module first generates a diverse set of pre-fused multimodal views. Each client then employs a user-specific mixture-of-experts mechanism to adaptively integrate these views based on the individual user's interaction history. This lightweight personalized fusion module provides an efficient way to implement a federated VLR system. The effectiveness of the proposed FedVLR has been validated on seven benchmark datasets.
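The bi-level fusion described above can be sketched at a high level. The snippet below is a minimal illustration, not the paper's implementation: it assumes each server-side view is a learned linear fusion of concatenated vision and language embeddings, and that the client gate is a softmax over a linear projection of a user-history embedding; all names (`server_multiview_fusion`, `client_moe_fusion`, `W_views`, `W_gate`) and the specific fusion forms are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 8, 4  # embedding dim and number of pre-fused views (illustrative sizes)

def server_multiview_fusion(img_emb, txt_emb, W_views):
    """Server side (assumed form): produce K pre-fused multimodal views,
    each a different linear fusion of the concatenated vision and
    language embeddings."""
    z = np.concatenate([img_emb, txt_emb])      # (2d,)
    return np.stack([W @ z for W in W_views])   # (K, d)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def client_moe_fusion(views, user_hist_emb, W_gate):
    """Client side (assumed form): a user-specific mixture-of-experts
    gate computes weights from the user's interaction-history embedding
    and returns the weighted combination of the K server-provided views."""
    gate = softmax(W_gate @ user_hist_emb)      # (K,) gating weights, sum to 1
    return gate @ views, gate                   # fused item repr (d,), weights (K,)

# Toy end-to-end pass with random parameters.
img, txt = rng.normal(size=d), rng.normal(size=d)
W_views = rng.normal(size=(K, d, 2 * d)) * 0.1
views = server_multiview_fusion(img, txt, W_views)

user_hist = rng.normal(size=d)
W_gate = rng.normal(size=(K, d)) * 0.1
item_repr, gate = client_moe_fusion(views, user_hist, W_gate)
```

In a federated setting, only the small gating parameters (`W_gate` here) would need to be personalized on-device, while the heavy vision-language encoding and multi-view fusion stay on the server, which is what makes the client module lightweight.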

Published

2026-03-14

How to Cite

Li, Z., Long, G., Jiang, J., Zhang, C., & Yang, Q. (2026). Federated Vision-Language-Recommendation with Personalized Fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23337–23345. https://doi.org/10.1609/aaai.v40i28.39503

Section

AAAI Technical Track on Machine Learning V