Union Is Strength! Unite the Power of LLMs and MLLMs for Chart Question Answering

Authors

  • Jiapeng Liu Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  • Liang Li Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  • Shihao Rao Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  • Xiyan Gao Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  • Weixin Guan Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  • Bing Li Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  • Can Ma Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i5.32584

Abstract

Chart Question Answering (CQA) requires models to perform both chart perception and reasoning. Recent studies driven by Large Language Models (LLMs) have dominated CQA. These include employing more cognitively capable LLMs to reason indirectly over transformed charts, i.e., tables, and directly perceiving charts with Multimodal Large Language Models (MLLMs), which have a wider perceptual range. Yet, these approaches often hit bottlenecks due to the limited receptive field of LLMs and the fragility of complex reasoning in some MLLMs. To unite the strengths of LLMs and MLLMs so that each compensates for the other's limitations, we propose Synergy, a framework that unites the power of both models for CQA. Synergy first unites the chart with a table as an augmented perceptual signal. Next, it unites LLMs and MLLMs, scheduling the former to decompose a question into sub-questions and the latter to answer them by perceiving the chart. Lastly, it employs LLMs to summarize the sub-question-answer pairs to refine the final answer. Extensive experimental results on the popular ChartQA and PlotQA benchmarks reveal that, with the power of union, Synergy outperforms strong competitors and achieves superior boosts over naive MLLMs by uniting them with a smaller LLM.
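The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: the function names, prompt strings, and model interfaces below are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hedged sketch of the Synergy three-stage pipeline from the abstract.
# All interfaces here (llm, mllm callables, prompt formats) are assumptions
# for illustration; the paper's real prompts and APIs may differ.

def synergy_cqa(question, chart_image, table, llm, mllm):
    """Answer a chart question by uniting an LLM (reasoning) with an MLLM (perception)."""
    # Stage 1: unite the chart with its table as an augmented perceptual signal.
    perceptual_signal = {"chart": chart_image, "table": table}

    # Stage 2a: the LLM decomposes the question into simpler sub-questions.
    sub_questions = llm(f"Decompose into sub-questions: {question}\nTable: {table}")

    # Stage 2b: the MLLM answers each sub-question by perceiving the chart.
    qa_pairs = [(sq, mllm(sq, perceptual_signal)) for sq in sub_questions]

    # Stage 3: the LLM summarizes the sub-question/answer pairs into a final answer.
    return llm(f"Question: {question}\nEvidence: {qa_pairs}\nFinal answer:")


# Toy model stand-ins, purely for demonstrating control flow.
def toy_llm(prompt):
    if prompt.startswith("Decompose"):
        return ["What is the 2020 value?", "What is the 2021 value?"]
    return "42"

def toy_mllm(sub_question, signal):
    return "21"

answer = synergy_cqa("What is the total over 2020-2021?",
                     "chart.png", [["year", "value"]], toy_llm, toy_mllm)
print(answer)  # the toy LLM returns "42" for the summarization prompt
```

The key design point conveyed by the abstract is the division of labor: the LLM handles decomposition and summarization (reasoning), while the MLLM handles only narrow perceptual sub-questions, sidestepping its fragility on complex multi-step reasoning.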

Published

2025-04-11

How to Cite

Liu, J., Li, L., Rao, S., Gao, X., Guan, W., Li, B., & Ma, C. (2025). Union Is Strength! Unite the Power of LLMs and MLLMs for Chart Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 5487-5495. https://doi.org/10.1609/aaai.v39i5.32584

Section

AAAI Technical Track on Computer Vision IV