[1] W. Huang, “LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation”, AAAI, vol. 40, no. 7, pp. 5131–5139, Mar. 2026.