(1) Qiu, L.; Ning, S.; He, X. Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training. AAAI 2024, 38, 4605–4613.