Qiu, Longtian, Shan Ning, and Xuming He. 2024. “Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training”. Proceedings of the AAAI Conference on Artificial Intelligence 38 (5):4605-13. https://doi.org/10.1609/aaai.v38i5.28260.