KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Authors

  • Debjyoti Mondal, Samsung R&D Institute India - Bangalore
  • Suraj Modi, Samsung R&D Institute India - Bangalore
  • Subhadarshi Panda, Samsung R&D Institute India - Bangalore
  • Rituraj Singh, Samsung R&D Institute India - Bangalore
  • Godawari Sudhakar Rao, Samsung R&D Institute India - Bangalore

DOI:

https://doi.org/10.1609/aaai.v38i17.29844

Keywords:

NLP: Language Grounding & Multi-modal NLP, CV: Language and Vision, CV: Multi-modal Vision, KRR: Common-Sense Reasoning, ML: Graph-based Machine Learning, ML: Multimodal Learning, NLP: Question Answering

Abstract

Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain-of-thought (CoT) reasoning, which enables step-by-step thinking. Extending LLMs with multimodal capabilities is of recent interest, but it incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT, a framework that integrates CoT reasoning, Knowledge Graphs (KGs), and multiple modalities for a comprehensive understanding of multimodal tasks. KAM-CoT adopts a two-stage training process with KG grounding to generate effective rationales and answers. By incorporating external knowledge from KGs during reasoning, the model gains a deeper contextual understanding, reducing hallucinations and enhancing the quality of answers. This knowledge-augmented CoT reasoning empowers the model to handle questions requiring external context, providing more informed answers. Experimental findings show that KAM-CoT outperforms the state-of-the-art methods. On the ScienceQA dataset, we achieve an average accuracy of 93.87%, surpassing GPT-3.5 (75.17%) and GPT-4 (83.99%) by 18.70 and 9.88 percentage points, respectively. Remarkably, KAM-CoT achieves these results with only 280M trainable parameters at a time, demonstrating its cost-efficiency and effectiveness.
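
The margins over GPT-3.5 and GPT-4 quoted in the abstract are differences in accuracy (percentage points), not relative improvements. The short Python sketch below recomputes both kinds of gap from the three accuracy figures given in the abstract. The function names margin_over and relative_gain are illustrative assumptions and do not come from the paper.

```python
# ScienceQA average accuracies (percent), as quoted in the abstract.
accuracy = {"KAM-CoT": 93.87, "GPT-3.5": 75.17, "GPT-4": 83.99}


def margin_over(baseline: str, model: str = "KAM-CoT") -> float:
    """Absolute gap in percentage points (hypothetical helper)."""
    return round(accuracy[model] - accuracy[baseline], 2)


def relative_gain(baseline: str, model: str = "KAM-CoT") -> float:
    """Gap expressed as a percentage of the baseline accuracy (hypothetical helper)."""
    gap = accuracy[model] - accuracy[baseline]
    return round(100 * gap / accuracy[baseline], 2)


if __name__ == "__main__":
    for baseline in ("GPT-3.5", "GPT-4"):
        print(
            f"{baseline}: +{margin_over(baseline)} points, "
            f"+{relative_gain(baseline)}% relative"
        )
```

The absolute gaps come out to 18.7 and 9.88 percentage points, which is how the abstract's comparison should be read; the relative gains are larger because they are measured against the lower baseline accuracy.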

Published

2024-03-24

How to Cite

Mondal, D., Modi, S., Panda, S., Singh, R., & Rao, G. S. (2024). KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18798-18806. https://doi.org/10.1609/aaai.v38i17.29844

Issue

Vol. 38 No. 17 (2024)

Section

AAAI Technical Track on Natural Language Processing II