Liu, Z., Liu, J., & Ma, F. (2024). Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3864–3872. https://doi.org/10.1609/aaai.v38i4.28178