[1] Meng, G. et al. 2025. EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence. 39, 6 (Apr. 2025), 6126–6134. DOI:https://doi.org/10.1609/aaai.v39i6.32655.