T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering

Authors

  • Lei Wang, Beijing Forestry University, China; Singapore Management University, Singapore
  • Yi Hu, University of Electronic Science and Technology of China, China
  • Jiabang He, University of Electronic Science and Technology of China, China
  • Xing Xu, University of Electronic Science and Technology of China, China
  • Ning Liu, Beijing Forestry University, China
  • Hui Liu, Beijing Rongda Technology Co., Ltd., China
  • Heng Tao Shen, University of Electronic Science and Technology of China, China

DOI:

https://doi.org/10.1609/aaai.v38i17.29884

Keywords:

NLP: (Large) Language Models, NLP: Language Grounding & Multi-modal NLP

Abstract

Large Language Models (LLMs) have recently demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly, and the annotated rationales are often inaccurate because they omit essential external information. To address these issues, we propose a novel method, termed T-SciQ, that aims at teaching science question answering with LLM signals. T-SciQ generates high-quality CoT rationales as teaching signals and uses them to train much smaller models to perform CoT reasoning in complex multimodal scenarios. Additionally, we introduce a novel data mixing strategy to produce more effective teaching data samples for both simple and complex science question answering problems. Extensive experimental results show that T-SciQ achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%, outperforming the strongest fine-tuned baseline by 4.5%. The code is publicly available at https://github.com/T-SciQ/T-SciQ.
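
As a rough illustration of the teach-then-finetune pipeline the abstract describes, the Python sketch below shows the two stages at a conceptual level: a large teacher LLM produces CoT rationales (plain CoT for simple problems, planning-based CoT for complex ones), the two kinds of signals are mixed into one training set, and a much smaller student model would then be fine-tuned on that set. All names here (query_llm, generate_teaching_signal, build_mixed_dataset) are hypothetical placeholders for this sketch, not the authors' released code; see the repository linked above for the actual implementation.

    # Hypothetical sketch of a T-SciQ-style teaching pipeline.
    # query_llm is a stub standing in for a call to a large teacher LLM.

    def query_llm(prompt: str) -> str:
        """Stub for a teacher-LLM call; a real system would hit an API here."""
        return "step-by-step rationale ... Answer: (a)"

    def generate_teaching_signal(question: str, use_planning: bool) -> str:
        """Ask the teacher for a CoT rationale; for complex problems,
        first decompose the question into subproblems (planning-based CoT)."""
        if use_planning:
            plan = query_llm(f"Decompose into subproblems: {question}")
            return query_llm(f"Solve each subproblem, then answer: {plan}")
        return query_llm(f"Answer with a step-by-step rationale: {question}")

    def build_mixed_dataset(questions):
        """Mix plain-CoT and planning-CoT signals by problem difficulty."""
        data = []
        for question, is_complex in questions:
            rationale = generate_teaching_signal(question, use_planning=is_complex)
            data.append({"question": question, "rationale": rationale})
        return data

    if __name__ == "__main__":
        toy = [("Which is a conductor: wood or copper?", False),
               ("How does the food web change if hawks disappear?", True)]
        dataset = build_mixed_dataset(toy)
        # A smaller student model would be fine-tuned on `dataset` to emit
        # the rationale before the final answer (CoT fine-tuning).
        print(dataset[0])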

Published

2024-03-24

How to Cite

Wang, L., Hu, Y., He, J., Xu, X., Liu, N., Liu, H., & Shen, H. T. (2024). T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19162-19170. https://doi.org/10.1609/aaai.v38i17.29884

Section

AAAI Technical Track on Natural Language Processing II