ZHU, Yongxin; LIU, Zhen; LIANG, Yukang; LI, Xin; LIU, Hao; BAO, Changcun; XU, Linli. Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 37, n. 9, p. 11479–11487, 2023. DOI: 10.1609/aaai.v37i9.26357. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/26357. Accessed: 13 May 2026.