[1] S. Y. Feng, “Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models”, AAAI, vol. 36, no. 10, pp. 10618–10626, Jun. 2022.