Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models

Authors

  • Jie Ma Xi'an Jiaotong University
  • Zhitao Gao Xi'an Jiaotong University
  • Qi Chai The Hong Kong University of Science and Technology (Guangzhou)
  • Wangchun Sun Xi'an Jiaotong University
  • Pinghui Wang Xi'an Jiaotong University
  • Hongbin Pei Xi'an Jiaotong University
  • Jing Tao Xi'an Jiaotong University
  • Lingyun Song Northwestern Polytechnical University
  • Jun Liu Xi'an Jiaotong University
  • Chen Zhang Zhejiang Createlink Technology
  • Lizhen Cui Shandong University

DOI:

https://doi.org/10.1609/aaai.v39i23.34658

Abstract

Large Language Models (LLMs) may hallucinate in real-world applications when they lack relevant knowledge. Knowledge graphs, in contrast, are extensive multi-relational structures that store a vast array of symbolic facts. Integrating LLMs with knowledge graphs has therefore been explored extensively, and Knowledge Graph Question Answering (KGQA) serves as a critical touchstone for this integration. The task requires LLMs to answer natural language questions by retrieving relevant triples from a knowledge graph. However, existing methods face two significant challenges: *excessively long reasoning paths that distract from answer generation*, and *false-positive relations that hinder path refinement*. In this paper, we propose an iterative, interactive KGQA framework that leverages the interactive learning capabilities of LLMs to perform reasoning and Debating over Graphs (DoG). Specifically, DoG employs a subgraph-focusing mechanism that lets LLMs attempt an answer after each reasoning step, thereby mitigating the impact of lengthy reasoning paths. In addition, DoG uses a multi-role debate team to gradually simplify complex questions, reducing the influence of false-positive relations; this debate mechanism ensures the reliability of the reasoning process. Experimental results on five public datasets demonstrate the effectiveness and superiority of our architecture. Notably, DoG outperforms the state-of-the-art method ToG by 23.7% and 9.1% in accuracy on WebQuestions and GrailQA, respectively. Furthermore, integration experiments with various LLMs on these datasets highlight the flexibility of DoG.
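The abstract describes DoG's control flow only at a high level: focus on a subgraph, attempt an answer after each reasoning step, and otherwise let a multi-role debate choose how to proceed while simplifying the question. The Python sketch below is a toy illustration of that loop under loud assumptions, not the authors' implementation. The LLM calls are replaced by hypothetical plain callables (`try_answer`, `simplify`, and the role functions), the "debate" is reduced to a majority vote over proposed relations, and the knowledge graph is a small, made-up in-memory list of triples.

```python
from collections import Counter
from typing import Callable, Optional

# A triple is (head entity, relation, tail entity).
Triple = tuple[str, str, str]
# Hypothetical role signature: given the question and the focused subgraph, propose a relation.
Role = Callable[[str, list[Triple]], str]


def focus_subgraph(kg: list[Triple], frontier: set[str]) -> list[Triple]:
    """Keep only the triples leaving the current frontier entities (a one-hop subgraph)."""
    return [t for t in kg if t[0] in frontier]


def debate(roles: list[Role], question: str, subgraph: list[Triple]) -> str:
    """Toy stand-in for the multi-role debate (an assumption, not DoG's protocol):
    each role votes for one relation; the most common proposal wins, and ties go
    to the proposal encountered first."""
    votes = Counter(role(question, subgraph) for role in roles)
    return votes.most_common(1)[0][0]


def reason_on_graph(
    kg: list[Triple],
    question: str,
    topic_entity: str,
    try_answer: Callable[[str, list[Triple]], Optional[str]],
    roles: list[Role],
    simplify: Callable[[str, str, set[str]], str],
    max_hops: int = 3,
) -> Optional[str]:
    frontier = {topic_entity}
    for _ in range(max_hops):
        subgraph = focus_subgraph(kg, frontier)
        # Attempt an answer from the focused subgraph after every reasoning step.
        answer = try_answer(question, subgraph)
        if answer is not None:
            return answer
        if not subgraph:
            return None  # nothing left to explore
        relation = debate(roles, question, subgraph)
        # Follow the agreed relation, then rewrite the question with the new entities.
        frontier = {tail for _, rel, tail in subgraph if rel == relation}
        question = simplify(question, relation, frontier)
    return None


# ---- Hypothetical toy data and callables, for illustration only ----
kg: list[Triple] = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "release_year", "2010"),
    ("Christopher Nolan", "born_in", "London"),
]


def toy_try_answer(q: str, sub: list[Triple]) -> Optional[str]:
    # Only answers once the question no longer mentions the unresolved "director" hop.
    if "director" in q:
        return None
    for _, rel, tail in sub:
        if rel == "born_in":
            return tail
    return None


def toy_simplify(q: str, relation: str, entities: set[str]) -> str:
    # Replace the resolved sub-question with the entities it resolved to.
    if relation == "directed_by":
        return q.replace("the director of Inception", ", ".join(sorted(entities)))
    return q


toy_roles: list[Role] = [
    lambda q, sub: "directed_by",
    lambda q, sub: "directed_by",
    lambda q, sub: "release_year",
]

if __name__ == "__main__":
    q = "Where was the director of Inception born?"
    print(reason_on_graph(kg, q, "Inception", toy_try_answer, toy_roles, toy_simplify))  # prints London
```

In this toy run, the first step cannot answer, the vote selects `directed_by`, and the question is reduced to "Where was Christopher Nolan born?", which the second step answers. In a real system, the behavior that matters would live in the prompted LLM calls that replace these placeholders; the majority vote only mimics the idea of several roles agreeing on the next move.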

Published

2025-04-11

How to Cite

Ma, J., Gao, Z., Chai, Q., Sun, W., Wang, P., Pei, H., … Cui, L. (2025). Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24768–24776. https://doi.org/10.1609/aaai.v39i23.34658

Section

AAAI Technical Track on Natural Language Processing II