[1] A. Gupta, Y. Verma, and C. Jawahar, “Choosing Linguistics over Vision to Describe Images”, AAAI, vol. 26, no. 1, pp. 606-612, Sep. 2021.