LRM-LLaVA: Overcoming the Modality Gap of Multilingual Large Language-Vision Model for Low-Resource Languages
DOI:
https://doi.org/10.1609/aaai.v39i23.34623
Abstract
Multilingual large language-vision models (LVLMs), which understand and generate both text and images across multiple languages, have achieved remarkable performance on English-centric multimodal generation tasks. However, their performance on non-English tasks has been underwhelming. One major challenge for multilingual LVLMs is the modality gap between visual inputs and multilingual textual inputs/outputs, caused by the lack of high-quality multilingual training data. In this paper, we propose LRM-LLaVA, a multilingual large language-vision model designed for low-resource languages to overcome the modality gap. It is composed of four components: a visual encoder, a multilingual large language model, a vision-text representation projector, and a cross-modal regularizer. Both the projector and the regularizer aim to reduce the modality gap and improve multilingual performance. To train LRM-LLaVA, we employ a two-stage training strategy consisting of pre-training and instruction fine-tuning. In addition, we construct a multilingual visual question answering dataset based on English open-source datasets and adopt multiple task instructions. To evaluate the performance of LVLMs across various languages, we construct four multilingual benchmarks covering 10 languages, based on English open-source benchmarks. Experimental results show that LRM-LLaVA achieves competitive performance compared to other multilingual LVLMs with a similar number of parameters.
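The abstract describes a four-component architecture (visual encoder, multilingual LLM, vision-text projector, cross-modal regularizer) without implementation details. The sketch below is not the authors' implementation; it is a minimal illustration of how such components might be wired together, with all class names, dimensions, and the specific form of the regularizer chosen as assumptions for readability.

```python
# Illustrative sketch only: stand-in modules and an assumed cosine-alignment
# regularizer, not the LRM-LLaVA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LVLMSketch(nn.Module):
    def __init__(self, vision_dim=512, text_dim=768):
        super().__init__()
        # 1) Visual encoder (stand-in for a pretrained vision backbone).
        self.visual_encoder = nn.Linear(3 * 224 * 224, vision_dim)
        # 2) Vision-text representation projector: maps image features into the
        #    multilingual LLM's embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, text_dim), nn.GELU(), nn.Linear(text_dim, text_dim)
        )
        # 3) Multilingual LLM (stand-in: a single transformer layer).
        self.llm = nn.TransformerEncoderLayer(d_model=text_dim, nhead=8, batch_first=True)

    def forward(self, image_pixels, text_embeds):
        # Encode the image and project it into the text embedding space.
        vision_feats = self.visual_encoder(image_pixels.flatten(1)).unsqueeze(1)
        vision_tokens = self.projector(vision_feats)
        # Feed projected visual tokens together with multilingual text embeddings.
        hidden = self.llm(torch.cat([vision_tokens, text_embeds], dim=1))
        # 4) Cross-modal regularizer (assumed form): pull projected visual tokens
        #    toward the text representation to shrink the modality gap.
        reg_loss = 1 - F.cosine_similarity(
            vision_tokens.mean(dim=1), text_embeds.mean(dim=1), dim=-1
        ).mean()
        return hidden, reg_loss
```

In such a setup, the regularization term would be added to the standard language-modeling loss during both pre-training and instruction fine-tuning; the exact training objective used by the paper is not specified in this abstract.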
Published
2025-04-11
How to Cite
Li, J., Yang, Q., Jiang, B., Zhu, S., & Sun, Q. (2025). LRM-LLaVA: Overcoming the Modality Gap of Multilingual Large Language-Vision Model for Low-Resource Languages. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24449–24457. https://doi.org/10.1609/aaai.v39i23.34623
Issue
Section
AAAI Technical Track on Natural Language Processing II