ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Authors

  • Junxian Li, Shanghai Artificial Intelligence Laboratory; Shanghai Jiaotong University
  • Di Zhang, Shanghai Artificial Intelligence Laboratory; Fudan University
  • Xunzhi Wang, Shanghai Artificial Intelligence Laboratory; Nankai University
  • Zeying Hao, Shanghai Artificial Intelligence Laboratory; University of Science and Technology of China
  • Jingdi Lei, Shanghai Artificial Intelligence Laboratory
  • Qian Tan, Shanghai Artificial Intelligence Laboratory; University of Science and Technology of China
  • Cai Zhou, Shanghai Artificial Intelligence Laboratory
  • Wei Liu, Shanghai Artificial Intelligence Laboratory; Shanghai Jiaotong University
  • Yaotian Yang, Shanghai Artificial Intelligence Laboratory
  • Xinrui Xiong, Shanghai Artificial Intelligence Laboratory
  • Weiyun Wang, Shanghai Artificial Intelligence Laboratory
  • Zhe Chen, Shanghai Artificial Intelligence Laboratory
  • Wenhai Wang, Shanghai Artificial Intelligence Laboratory
  • Wei Li, Shanghai Artificial Intelligence Laboratory
  • Mao Su, Shanghai Artificial Intelligence Laboratory
  • Shufei Zhang, Shanghai Artificial Intelligence Laboratory
  • Wanli Ouyang, Shanghai Artificial Intelligence Laboratory
  • Yuqiang Li, Shanghai Artificial Intelligence Laboratory
  • Dongzhan Zhou, Shanghai Artificial Intelligence Laboratory

DOI:

https://doi.org/10.1609/aaai.v39i1.32020

Abstract

Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks.

Published

2025-04-11

How to Cite

Li, J., Zhang, D., Wang, X., Hao, Z., Lei, J., Tan, Q., … Zhou, D. (2025). ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area. Proceedings of the AAAI Conference on Artificial Intelligence, 39(1), 415–423. https://doi.org/10.1609/aaai.v39i1.32020

Section

AAAI Technical Track on Application Domains