vMFCoOp: Towards Equilibrium on a Unified Hyperspherical Manifold for Prompting Biomedical VLMs

Authors

  • Minye Shao, Durham University
  • Sihan Guo, Durham University
  • Xinrun Li, Durham University
  • Xingyu Miao, Durham University
  • Haoran Duan, Tsinghua University
  • Yang Long, Durham University

DOI:

https://doi.org/10.1609/aaai.v40i11.37839

Abstract

Recent advances in context optimization (CoOp) guided by large language model (LLM)–distilled medical semantic priors offer a scalable alternative to manual prompt engineering and full fine-tuning for adapting biomedical CLIP-based vision-language models (VLMs). However, prompt learning in this context is challenged by semantic misalignment between LLMs and CLIP variants due to divergent training corpora and model architectures; it further lacks scalability across continuously evolving families of foundation models. More critically, pairwise multimodal alignment via conventional Euclidean-space optimization lacks the capacity to model unified representations or apply localized geometric constraints, which tends to amplify modality gaps in complex biomedical imaging and destabilize few-shot adaptation. To address these challenges, we propose vMFCoOp, a framework that inversely estimates von Mises–Fisher (vMF) distributions on a shared Hyperspherical Manifold, aligning semantic biases between arbitrary LLMs and CLIP backbones via Unified Semantic Anchors to achieve robust biomedical prompting and superior few-shot classification. Grounded in three complementary constraints, vMFCoOp demonstrates consistent improvements across 14 medical datasets, 12 medical imaging modalities, and 13 anatomical regions, outperforming state-of-the-art methods in accuracy, generalization, and clinical applicability.
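The abstract's core geometric idea is modeling unit-normalized embeddings with von Mises–Fisher (vMF) distributions on a hyperspherical manifold. As background for readers unfamiliar with vMF estimation, the sketch below shows the standard closed-form approximation (Banerjee et al., 2005) for recovering a vMF mean direction and concentration from unit vectors; it is an illustrative primer on the distribution the paper builds on, not the authors' inverse-estimation or anchor-alignment procedure.

```python
import numpy as np

def estimate_vmf(X):
    """Estimate vMF parameters from unit vectors X of shape (n, d).

    Returns the mean direction mu (a unit vector) and the
    concentration kappa, using the common approximation
    kappa ~= r_bar * (d - r_bar^2) / (1 - r_bar^2),
    where r_bar is the length of the mean resultant vector.
    """
    n, d = X.shape
    resultant = X.sum(axis=0)           # sum of unit vectors
    R = np.linalg.norm(resultant)       # resultant length
    mu = resultant / R                  # mean direction on the sphere
    r_bar = R / n                       # mean resultant length in [0, 1)
    kappa = r_bar * (d - r_bar**2) / (1 - r_bar**2)
    return mu, kappa

# Toy usage: embeddings clustered around the first basis direction,
# renormalized onto the unit hypersphere (as CLIP-style features are).
rng = np.random.default_rng(0)
d = 8
target = np.zeros(d)
target[0] = 1.0
X = target + 0.1 * rng.standard_normal((500, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
mu, kappa = estimate_vmf(X)
```

Higher kappa corresponds to tighter clustering around mu, which is why vMF distributions give localized geometric control on the sphere that a single Euclidean mean does not.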

Published

2026-03-14

How to Cite

Shao, M., Guo, S., Li, X., Miao, X., Duan, H., & Long, Y. (2026). vMFCoOp: Towards Equilibrium on a Unified Hyperspherical Manifold for Prompting Biomedical VLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 8851–8859. https://doi.org/10.1609/aaai.v40i11.37839

Section

AAAI Technical Track on Computer Vision VIII