Federated Cross-Modal Style-Aware Prompt Generation (Student Abstract)

Authors

  • Suraj Prasad, Indian Institute of Technology Bombay
  • Navyansh Mahla, Indian Institute of Technology Bombay
  • Sunny Gupta, Indian Institute of Technology Bombay
  • Amit Sethi, Indian Institute of Technology Bombay

DOI:

https://doi.org/10.1609/aaai.v40i48.42268

Abstract

Existing federated prompt learning methods for vision-language models like CLIP rely solely on text-based prompts and final-layer visual features, missing crucial multiscale visual details and client-specific style variations. This limits generalization across non-IID distributions and novel classes. We introduce FedCSAP (Federated Cross-Modal Style-Aware Prompt Generation), which harnesses multiscale features from CLIP's vision encoder alongside domain-aware style statistics from client data. By fusing these visual representations with textual context, FedCSAP generates adaptive, context-aware prompts that enhance robustness across seen and unseen classes. Our privacy-preserving approach operates through local training and global aggregation, effectively handling heterogeneous client distributions. Experiments on multiple image classification datasets demonstrate that FedCSAP significantly outperforms existing federated prompt learning methods in both accuracy and generalization.
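
The abstract names two ingredients, style statistics drawn from multiscale visual features and global aggregation of locally trained parameters, without giving their exact form. The sketch below is only an illustration under stated assumptions: it takes "style statistics" to mean the channel-wise mean and standard deviation of each layer's feature map, and it takes "global aggregation" to be a sample-weighted average in the manner of FedAvg. The function names (style_statistics, multiscale_style_vector, fedavg) are hypothetical and are not taken from the paper, and random arrays stand in for CLIP features.

```python
import numpy as np


def style_statistics(feature_map):
    """Channel-wise mean and std of one (tokens, channels) feature map.

    Assumption: this is one common definition of "style statistics";
    the abstract does not specify which statistics FedCSAP uses.
    """
    mu = feature_map.mean(axis=0)
    sigma = feature_map.std(axis=0)
    return np.concatenate([mu, sigma])


def multiscale_style_vector(layer_features):
    """Concatenate style statistics taken from several encoder layers."""
    return np.concatenate([style_statistics(f) for f in layer_features])


def fedavg(client_params, client_sizes):
    """Sample-weighted average of client parameter vectors.

    Assumption: FedAvg-style aggregation; the abstract only says
    "local training and global aggregation".
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return np.tensordot(weights, np.stack(client_params), axes=1)


# Stand-in for intermediate vision-encoder features from two layers:
# 50 tokens with 8 channels each (shapes are illustrative only).
rng = np.random.default_rng(0)
layers = [rng.normal(size=(50, 8)), rng.normal(size=(50, 8))]
print(multiscale_style_vector(layers).shape)  # (32,)
```

In a pipeline of this kind, each client would feed such a style vector, together with textual context, into its local prompt generator, and only the generator's parameters would be passed to fedavg, so raw images would stay on the client.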

Published

2026-03-14

How to Cite

Prasad, S., Mahla, N., Gupta, S., & Sethi, A. (2026). Federated Cross-Modal Style-Aware Prompt Generation (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41357–41358. https://doi.org/10.1609/aaai.v40i48.42268