T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering
DOI:
https://doi.org/10.1609/aaai.v38i17.29884
Keywords:
NLP: (Large) Language Models, NLP: Language Grounding & Multi-modal NLP
Abstract
Large Language Models (LLMs) have recently demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. They have also shown the ability to perform chain-of-thought (CoT) reasoning to solve complex problems. Recent studies have explored CoT reasoning in complex multimodal scenarios, such as the science question answering task, by fine-tuning multimodal models with high-quality human-annotated CoT rationales. However, collecting high-quality CoT rationales is usually time-consuming and costly. Moreover, annotated rationales are often inaccurate because essential external information is missing. To address these issues, we propose a novel method, termed T-SciQ, that teaches science question answering with LLM signals. The T-SciQ approach generates high-quality CoT rationales as teaching signals, which are then used to train much smaller models to perform CoT reasoning in complex modalities. Additionally, we introduce a novel data mixing strategy to produce more effective teaching data samples for both simple and complex science question answering problems. Extensive experimental results show that our T-SciQ method achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%. Moreover, our approach outperforms the most powerful fine-tuned baseline by 4.5%. The code is publicly available at https://github.com/T-SciQ/T-SciQ.
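To make the teaching-signal idea in the abstract concrete, the sketch below shows the general shape of such a distillation pipeline: a large teacher LLM produces CoT rationales for training questions, and those rationales become supervision targets for a much smaller student model. This is a minimal conceptual sketch only; every name here (query_teacher_llm, Example, build_training_set) is a hypothetical placeholder, not the authors' actual implementation, which is available at the GitHub repository above.

    # Conceptual sketch of distilling LLM-generated CoT teaching signals.
    # All function and class names are hypothetical placeholders.
    from dataclasses import dataclass

    @dataclass
    class Example:
        question: str
        choices: list[str]
        answer: str
        rationale: str = ""  # CoT teaching signal, filled in by the teacher LLM

    def query_teacher_llm(prompt: str) -> str:
        """Placeholder for a call to a large teacher model (e.g. via an API)."""
        raise NotImplementedError("plug in an LLM client here")

    def generate_teaching_signal(ex: Example) -> Example:
        # Ask the teacher LLM for a step-by-step rationale for this question.
        prompt = (
            f"Question: {ex.question}\n"
            f"Options: {', '.join(ex.choices)}\n"
            "Let's think step by step."
        )
        ex.rationale = query_teacher_llm(prompt)
        return ex

    def build_training_set(examples: list[Example]) -> list[dict]:
        # Pair each question with its LLM-generated rationale so the smaller
        # student model learns to produce CoT reasoning before the answer.
        teaching_data = [generate_teaching_signal(ex) for ex in examples]
        return [
            {"input": ex.question,
             "target": f"{ex.rationale}\nAnswer: {ex.answer}"}
            for ex in teaching_data
        ]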
Published
2024-03-24
How to Cite
Wang, L., Hu, Y., He, J., Xu, X., Liu, N., Liu, H., & Shen, H. T. (2024). T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19162-19170. https://doi.org/10.1609/aaai.v38i17.29884
Issue
Vol. 38 No. 17 (2024)
Section
AAAI Technical Track on Natural Language Processing II