Hu, W., Xu, Y., Li, Y., Li, W., Chen, Z., & Tu, Z. (2024). BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2256–2264. https://doi.org/10.1609/aaai.v38i3.27999