Overcoming Heterogeneous Data in Federated Medical Vision-Language Pre-training: A Triple-Embedding Model Selector Approach

Authors

  • Aowen Wang Zhejiang University
  • Zhiwang Zhang NingboTech University
  • Dongang Wang University of Sydney
  • Fanyi Wang Zhejiang University
  • Haotian Hu Zhejiang Leapmotor Technology Co., Ltd
  • Jinyang Guo Beihang University
  • Yipeng Zhou Macquarie University
  • Chaoyi Pang NingboTech University
  • Shiting Wen NingboTech University

DOI:

https://doi.org/10.1609/aaai.v39i7.32807

Abstract

The scarcity of medical data motivates collaborative training of medical vision-language pre-training (VLP) models across different clients. Such collaborative training faces two challenges. First, medical data are privacy-sensitive and thus cannot be directly shared across clients. Second, medical data distributions across institutions are typically heterogeneous, which hinders the alignment and representation capabilities of local models. To overcome both challenges simultaneously, we propose a framework called personalized model selector with fused multimodal information (PMS-FM). The contribution of PMS-FM is two-fold: 1) PMS-FM uses embeddings to represent information in different formats, allowing multimodal data to be fused. 2) PMS-FM adapts to personalized data distributions by training multiple models; a model selector then identifies and selects the best-performing model for each individual client. Extensive experiments on multiple real-world medical datasets demonstrate the superior performance of PMS-FM over existing federated learning methods on different zero-shot classification tasks.
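The abstract does not say how the model selector scores candidates (the title indicates it is built on embeddings), so the sketch below is only a simplified stand-in, not the PMS-FM selector. It assumes each client scores every candidate model by CLIP-style zero-shot accuracy on a small local labeled set (predict the class whose text embedding is most similar to the image embedding) and keeps the highest-scoring model. All names (`select_model`, `zero_shot_accuracy`, the toy encoders and embeddings) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical candidate model: (image encoder, one text embedding per class prompt).
Candidate = Tuple[Callable[[Vector], Vector], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def zero_shot_accuracy(candidate: Candidate,
                       samples: List[Tuple[Vector, int]]) -> float:
    """Predict the class whose text embedding is closest to the image embedding."""
    encode_image, class_text_embs = candidate
    correct = 0
    for features, label in samples:
        img = encode_image(features)
        scores = [cosine(img, t) for t in class_text_embs]
        pred = max(range(len(scores)), key=lambda k: scores[k])
        correct += int(pred == label)
    return correct / len(samples)


def select_model(candidates: Dict[str, Candidate],
                 samples: List[Tuple[Vector, int]]) -> str:
    """Assumed selection rule: pick the candidate with the highest local
    zero-shot accuracy (the first name in insertion order wins on ties)."""
    return max(candidates,
               key=lambda name: zero_shot_accuracy(candidates[name], samples))


# Toy usage: two clients with the same features but different label mappings.
text_embs = [[1.0, 0.0], [0.0, 1.0]]  # toy prompt embeddings for classes 0 and 1
candidates = {
    "model_a": (lambda v: list(v), text_embs),
    "model_b": (lambda v: [v[1], v[0]], text_embs),  # swaps the feature axes
}
client_1 = [([0.9, 0.1], 0), ([0.2, 0.8], 1)]
client_2 = [([0.9, 0.1], 1), ([0.2, 0.8], 0)]

print(select_model(candidates, client_1))  # model_a
print(select_model(candidates, client_2))  # model_b
```

The toy clients differ only in how features map to labels, so each ends up with a different model; this mirrors, in miniature, why a per-client choice among several trained models can help under heterogeneous data.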

Published

2025-04-11

How to Cite

Wang, A., Zhang, Z., Wang, D., Wang, F., Hu, H., Guo, J., Zhou, Y., Pang, C., & Wen, S. (2025). Overcoming Heterogeneous Data in Federated Medical Vision-Language Pre-training: A Triple-Embedding Model Selector Approach. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7500-7508. https://doi.org/10.1609/aaai.v39i7.32807

Issue

Vol. 39 No. 7

Section

AAAI Technical Track on Computer Vision VI