Feng, Steven Y., Kevin Lu, Zhuofu Tao, Malihe Alikhani, Teruko Mitamura, Eduard Hovy, and Varun Gangal. 2022. “Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models.” Proceedings of the AAAI Conference on Artificial Intelligence 36 (10): 10618-26. https://doi.org/10.1609/aaai.v36i10.21306.