[1]

Zhu, Y. et al. 2023. Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA. Proceedings of the AAAI Conference on Artificial Intelligence. 37, 9 (Jun. 2023), 11479–11487. DOI:https://doi.org/10.1609/aaai.v37i9.26357.