Wei, J., Zhan, H., Lu, Y., Tu, X., Yin, B., Liu, C., & Pal, U. (2024). Image as a Language: Revisiting Scene Text Recognition via Balanced, Unified and Synchronized Vision-Language Reasoning Network. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5885–5893. https://doi.org/10.1609/aaai.v38i6.28402