OncoCoT: A Temporal-causal Chain-of-Thought Dataset for Oncologic Decision-Making
DOI:
https://doi.org/10.1609/aaai.v40i40.40724
Abstract
Long Chain-of-Thought (CoT) reasoning has shown great promise in complex reasoning tasks, but its application to medical decision-making presents unique challenges. Unlike structured tasks that rely on static verification frameworks, medical decision-making requires dynamic validation through longitudinal clinical outcomes, and its temporal-causal dependencies complicate the verification of reasoning processes. We therefore introduce a novel data construction framework specifically designed for medical decision-making. First, the framework analyzes real-world clinical cases to construct a timeline of medical events and identify critical decision points, including examination, diagnosis, and treatment. Subsequently, it employs a clinical causality-aware strategy to generate decision-making questions at the identified points, along with reasoning traces and corresponding answers. Finally, information drawn from future nodes serves as clinical logic-constrained criteria to re-evaluate and refine the soundness of the generated reasoning and responses. Building on this framework, we present OncoCoT, an oncologic decision-making dataset derived from clinical records over the past four years across eight common cancer types. Furthermore, we distill a subset of OncoCoT into a dedicated benchmark, OncoEval, to facilitate systematic evaluation of clinical reasoning capabilities in LLMs. Evaluation results show that existing state-of-the-art reasoning models, such as DeepSeek-R1 and GPT-o3, exhibit limited capability in addressing clinical problems in OncoEval, highlighting the need for further improvement.
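The timeline-and-decision-point idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `ClinicalEvent` class, the function names, and the sample events are all hypothetical and invented for illustration. Only the three decision categories (examination, diagnosis, treatment) and the idea of holding back future nodes as verification criteria come from the abstract.

```python
from dataclasses import dataclass
from datetime import date

# Decision categories named in the abstract; everything else is hypothetical.
DECISION_KINDS = {"examination", "diagnosis", "treatment"}


@dataclass(frozen=True)
class ClinicalEvent:
    """One node on a patient timeline (hypothetical structure)."""
    when: date
    kind: str
    note: str


def build_timeline(events):
    """Order events chronologically to form the timeline."""
    return sorted(events, key=lambda e: e.when)


def decision_points(timeline):
    """Return indices of events that fall in a decision category."""
    return [i for i, e in enumerate(timeline) if e.kind in DECISION_KINDS]


def split_at(timeline, i):
    """Split the timeline around decision point i.

    context: events before the decision (what a question may reveal)
    decision: the event whose choice is being asked about
    future: later events, held back to check the generated reasoning
    """
    return timeline[:i], timeline[i], timeline[i + 1:]


# Invented sample record, deliberately out of chronological order.
events = [
    ClinicalEvent(date(2023, 3, 1), "treatment", "start chemotherapy"),
    ClinicalEvent(date(2023, 1, 5), "symptom", "persistent cough"),
    ClinicalEvent(date(2023, 1, 20), "examination", "chest CT"),
    ClinicalEvent(date(2023, 2, 2), "diagnosis", "biopsy-confirmed lung cancer"),
    ClinicalEvent(date(2023, 6, 1), "outcome", "partial response on follow-up CT"),
]

timeline = build_timeline(events)
points = decision_points(timeline)
context, decision, future = split_at(timeline, points[1])
```

Keeping `future` separate from `context` is the point of the sketch: a question generated at a decision point sees only earlier events, while later events remain available to judge whether the reasoning was sound.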
Published
2026-03-14
How to Cite
Yang, P., Li, Y., Wang, S., Liu, X., Gan, H., Li, X., … Huang, Y. (2026). OncoCoT: A Temporal-causal Chain-of-Thought Dataset for Oncologic Decision-Making. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 34277–34285. https://doi.org/10.1609/aaai.v40i40.40724
Section
AAAI Technical Track on Natural Language Processing V