[1] W. Robbins, “Towards Multimodal Vision-Language Models Generating Non-generic Text,” AAAI, vol. 36, no. 11, pp. 13138–13139, Jun. 2022.