Xiang, Tianhang, Yirui Li, Lizhao Liu, Hongyan Zhi, Chuanshen Chen, Qing Du, and Mingkui Tan. “FAM: Fine-Grained Alignment Matters in Multimodal Embedding Learning With Large Vision-Language Models.” Proceedings of the AAAI Conference on Artificial Intelligence 40, no. 32 (March 14, 2026): 27046–27054. Accessed May 27, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/39918.