Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language Models

Authors

  • Dexia Chen, Sun Yat-sen University, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
  • Wentao Zhang, Sun Yat-sen University, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
  • Qianjie Zhu, Guangxi University, Nanning, China
  • Ping Hu, Xinjiang University, Urumqi, China
  • Weibing Li, Sun Yat-sen University, Guangzhou, China
  • Tong Zhang, Peng Cheng Laboratory, Shenzhen, China
  • Ruixuan Wang, Sun Yat-sen University, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China; Peng Cheng Laboratory, Shenzhen, China

DOI

https://doi.org/10.1609/aaai.v40i24.39086

Abstract

Vision-language models (VLMs) pre-trained on natural image and language data, such as CLIP, have exhibited significant potential in few-shot image recognition tasks, leading to the development of various efficient transfer learning methods. These methods exploit the knowledge that VLMs acquire during pre-training and have achieved strong performance on standard image datasets. However, their effectiveness is often limited in cross-domain tasks, where the imaging domain differs from natural images. To address this limitation, we propose Consistency-guided Multi-view Collaborative Optimization (CoMuCo), a novel fine-tuning strategy for VLMs. This strategy employs two functionally complementary expert modules to extract multi-view features, while incorporating prior knowledge-based consistency constraints and information geometry-based consensus mechanisms to enhance the robustness of feature learning. Additionally, a new cross-domain few-shot benchmark is established to support comprehensive evaluation of methods on imaging domains distinct from natural images. Extensive empirical evaluations on both existing and newly proposed benchmarks suggest that CoMuCo consistently outperforms current methods.
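
The abstract names the ingredients of the method (two expert views, a consistency constraint tied to prior knowledge, and an information-geometric consensus) but does not give their exact form. The sketch below is only an illustration of how such pieces could fit together, not the authors' implementation. It assumes, hypothetically, that each expert outputs class logits, that the "prior" is the frozen pre-trained model's prediction, that consistency is measured with a KL divergence, and that consensus is the normalized geometric mean of the two expert distributions (the distribution q minimizing the sum of KL(q || p_i)). All function names and the weight `lam` are invented for this example.

```python
import numpy as np

EPS = 1e-12  # numerical floor to keep logarithms finite


def softmax(logits):
    """Row-wise softmax with max-subtraction for stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl(p, q):
    """Row-wise KL(p || q) between two batches of distributions."""
    return np.sum(p * (np.log(p + EPS) - np.log(q + EPS)), axis=-1)


def geometric_consensus(p_a, p_b):
    """Normalized geometric mean of two distributions (assumed consensus rule)."""
    log_mix = 0.5 * (np.log(p_a + EPS) + np.log(p_b + EPS))
    return softmax(log_mix)


def collaborative_loss(logits_a, logits_b, prior_logits, labels, lam=1.0):
    """Cross-entropy on the consensus plus a KL consistency term to the prior.

    logits_a, logits_b: outputs of the two (hypothetical) expert modules.
    prior_logits: predictions of the frozen pre-trained model (assumed prior).
    lam: hypothetical weight of the consistency term.
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    p_prior = softmax(prior_logits)
    consensus = geometric_consensus(p_a, p_b)
    rows = np.arange(len(labels))
    ce = -np.log(consensus[rows, labels] + EPS).mean()
    consistency = (kl(p_prior, p_a) + kl(p_prior, p_b)).mean()
    return ce + lam * consistency


# Toy usage: a 4-sample, 3-class few-shot batch with random logits.
rng = np.random.default_rng(0)
logits_a = rng.normal(size=(4, 3))
logits_b = rng.normal(size=(4, 3))
prior_logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 1])
loss = collaborative_loss(logits_a, logits_b, prior_logits, labels)
```

Under these assumptions, the consistency term vanishes when both experts reproduce the prior exactly, so `lam` controls how far fine-tuning may drift from the pre-trained predictions.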

Published

2026-03-14

How to Cite

Chen, D., Zhang, W., Zhu, Q., Hu, P., Li, W., Zhang, T., & Wang, R. (2026). Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 20014–20022. https://doi.org/10.1609/aaai.v40i24.39086

Section

AAAI Technical Track on Machine Learning I