RCAFlow: A Workflow-Informed Hierarchical Planning Multi-Agent System for Root Cause Analysis
DOI:
https://doi.org/10.1609/aaai.v40i1.36991
Abstract
As microservice architectures become increasingly complex and system events become more frequent, Root Cause Analysis (RCA) has emerged as a critical task for ensuring system reliability. However, existing deep learning-based methods often struggle with limited flexibility and a lack of interpretability when addressing complex system failures. Recent efforts to integrate large language models (LLMs) have shown promise in enhancing diagnostic transparency and reasoning capability, yet expansive search spaces, intricate workflows, and entangled constraints hinder practical adoption. We propose RCAFlow, a multi-agent framework that integrates structured workflow knowledge with hierarchical planning to address these challenges. RCAFlow transforms semi-structured documents into behavior tree-style workflows to support interpretable plan generation, employs a Git-inspired branching mechanism for modular and hierarchical task execution with path isolation, and leverages state-aware task execution with semantic analysis to improve result understanding and feedback. We evaluate RCAFlow on three benchmark datasets provided by OpenRCA. Experimental results demonstrate that RCAFlow consistently outperforms existing methods across all datasets. Further ablation studies confirm the effectiveness of each core module, highlighting the reliability, extensibility, and interpretability of RCAFlow in supporting complex RCA tasks within intelligent IT operations.
Published
2026-03-14
How to Cite
Gao, Y., Cai, Z., & Yang, B. (2026). RCAFlow: A Workflow-Informed Hierarchical Planning Multi-Agent System for Root Cause Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 40(1), 300-308. https://doi.org/10.1609/aaai.v40i1.36991
Issue
Section
AAAI Technical Track on Application Domains I