[1] Hu, W. et al. 2024. BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions. Proceedings of the AAAI Conference on Artificial Intelligence. 38, 3 (Mar. 2024), 2256–2264. DOI:https://doi.org/10.1609/aaai.v38i3.27999.