T4NMTD: Transition-Centric Reinforcement Learning for Non-Markovian Task Decomposition
DOI:
https://doi.org/10.1609/aaai.v40i29.39620

Abstract
Non-Markovian Tasks (NMTs) are distinguished by their dependence on long-term memory and history-dependent dynamics, setting them apart from the traditional Markovian models typically employed in Reinforcement Learning (RL). NMTs not only suffer from reward sparsity but also rely on historical information, making them considerably more challenging to solve. In this paper, we propose a novel RL framework, T4NMTD (Transition-centric framework for NMT Decomposition), designed specifically for learning NMTs that are specified by temporal logic. The core of T4NMTD is a task decomposition mechanism combined with a parallel training approach for NMTs. An NMT is first decomposed into basic units based on the transitions of the automata derived from the temporal logic formulae. These units are then modularized into sub-tasks according to their semantic similarity under logical interpretation. The training strategy of T4NMTD adopts a dual-level structure: the high level learns to shape the boundaries and coordinate the arrangement of the sub-tasks from a global perspective, while the low level learns those sub-tasks in parallel. In addition, we introduce a dynamic policy intervention scheme to mitigate the policy myopia issue during parallel training. A comprehensive evaluation is conducted on benchmark problems with respect to various metrics. The experimental results demonstrate that T4NMTD effectively addresses NMTs, achieving significant performance improvements over related studies.

Published
2026-03-14
How to Cite
Miao, R., Lu, X., Tian, C., Yu, B., & Duan, Z. (2026). T4NMTD: Transition-Centric Reinforcement Learning for Non-Markovian Task Decomposition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24388-24395. https://doi.org/10.1609/aaai.v40i29.39620
Section
AAAI Technical Track on Machine Learning VI