Hu, Wenbo, et al. “BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 3, Mar. 2024, pp. 2256-64, doi:10.1609/aaai.v38i3.27999.