Zhu Y, Liu Z, Liang Y, Li X, Liu H, Bao C, et al. Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA. AAAI [Internet]. 2023 Jun 26 [cited 2026 May 13];37(9):11479-87. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/26357