[1] Liu, Z. et al. 2024. Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning. Proceedings of the AAAI Conference on Artificial Intelligence. 38, 4 (Mar. 2024), 3864–3872. DOI:https://doi.org/10.1609/aaai.v38i4.28178.