Federated Cross-Modal Style-Aware Prompt Generation (Student Abstract)
DOI:
https://doi.org/10.1609/aaai.v40i48.42268
Abstract
Existing federated prompt learning methods for vision-language models like CLIP rely solely on text-based prompts and final-layer visual features, missing crucial multiscale visual details and client-specific style variations. This limits generalization across non-IID distributions and novel classes. We introduce FedCSAP (Federated Cross-Modal Style-Aware Prompt Generation), which harnesses multiscale features from CLIP's vision encoder alongside domain-aware style statistics from client data. By fusing these visual representations with textual context, FedCSAP generates adaptive, context-aware prompts that enhance robustness across seen and unseen classes. Our privacy-preserving approach operates through local training and global aggregation, effectively handling heterogeneous client distributions. Experiments on multiple image classification datasets demonstrate that FedCSAP significantly outperforms existing federated prompt learning methods in both accuracy and generalization.
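The abstract does not say how the style statistics are computed or how client updates are aggregated. As an illustration only, the sketch below assumes two common choices: channel-wise mean and standard deviation of intermediate feature maps as the style descriptor, and a sample-size-weighted average of client parameters for global aggregation. The function names (`style_statistics`, `multiscale_style_vector`, `aggregate`) and the data layout are hypothetical and are not taken from FedCSAP.

```python
import math
from typing import Dict, List, Sequence

FeatureMap = Sequence[Sequence[float]]  # assumed layout: [channels][spatial positions]


def style_statistics(feature_map: FeatureMap) -> List[float]:
    """Return [mean, std] per channel, concatenated over channels (assumed style descriptor)."""
    stats: List[float] = []
    for channel in feature_map:
        n = len(channel)
        mean = sum(channel) / n
        var = sum((x - mean) ** 2 for x in channel) / n  # population variance
        stats.extend([mean, math.sqrt(var)])
    return stats


def multiscale_style_vector(feature_maps: Sequence[FeatureMap]) -> List[float]:
    """Concatenate style statistics from several encoder layers (the 'multiscale' part)."""
    vec: List[float] = []
    for fmap in feature_maps:
        vec.extend(style_statistics(fmap))
    return vec


def aggregate(client_params: Sequence[Dict[str, List[float]]],
              client_sizes: Sequence[int]) -> Dict[str, List[float]]:
    """Sample-size-weighted average of client parameters (assumed aggregation rule)."""
    total = sum(client_sizes)
    result: Dict[str, List[float]] = {}
    for key in client_params[0]:
        dim = len(client_params[0][key])
        result[key] = [
            sum(params[key][i] * n for params, n in zip(client_params, client_sizes)) / total
            for i in range(dim)
        ]
    return result
```

In this sketch only the parameters of the locally trained prompt generator would be passed to `aggregate`; raw images and per-client features stay on the client, which is consistent with the local-training and global-aggregation setup the abstract describes.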
Published
2026-03-14
How to Cite
Prasad, S., Mahla, N., Gupta, S., & Sethi, A. (2026). Federated Cross-Modal Style-Aware Prompt Generation (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41357–41358. https://doi.org/10.1609/aaai.v40i48.42268
Section
AAAI Student Abstract and Poster Program