CGMIS: Concept-Graph Based Multi-Hop Instructions Synthesis for Enhancing Long-Context Reasoning

Authors

  • Zechen Sun, Soochow University
  • Zecheng Tang, Soochow University
  • Juntao Li, Soochow University
  • Wenpeng Hu, Academy of Military Science
  • Wenliang Chen, Soochow University
  • Zhunchen Luo, Academy of Military Science
  • Qiaoming Zhu, Soochow University

DOI:

https://doi.org/10.1609/aaai.v40i39.40599

Abstract

High-quality multi-hop instruction data is critical for enhancing the reasoning capabilities of large language models (LLMs) in complex long-context scenarios, e.g., long-form reasoning. Nevertheless, such datasets remain scarce within the community, and existing data synthesis approaches typically fail to model intermediate reasoning steps explicitly, resulting in unverifiable and potentially erroneous samples. To mitigate these issues, we design the Concept-Graph based Multi-hop Instructions Synthesis (CGMIS) framework, which constructs long-form reasoning paths via concept graph traversal and automatically generates verifiable multi-hop data. The CGMIS framework not only guarantees the accuracy and verifiability of the synthesized data but also enables the construction of high-quality multi-hop instruction datasets from arbitrary corpora. Experiments show that fine-tuning with CGMIS-generated data achieves state-of-the-art performance across 13 long-context reasoning tasks on various models, using only 10% of the data volume required by existing methods.

Published

2026-03-14

How to Cite

Sun, Z., Tang, Z., Li, J., Hu, W., Chen, W., Luo, Z., & Zhu, Q. (2026). CGMIS: Concept-Graph Based Multi-Hop Instructions Synthesis for Enhancing Long-Context Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33153–33161. https://doi.org/10.1609/aaai.v40i39.40599

Section

AAAI Technical Track on Natural Language Processing IV