(1) Zhu, Y.; Liu, Z.; Liang, Y.; Li, X.; Liu, H.; Bao, C.; Xu, L. Locate Then Generate: Bridging Vision and Language With Bounding Box for Scene-Text VQA. AAAI 2023, 37, 11479-11487.