Enhancing Exploration and Exploitation in Hierarchical Reinforcement Learning with Subgoal Graph Learning
DOI:
https://doi.org/10.1609/aaai.v40i34.40076
Abstract
Goal-conditioned hierarchical reinforcement learning has demonstrated effectiveness in addressing complicated decision-making tasks by providing ''temporal abstraction'', which decomposes tasks into smaller and more manageable ''subgoals''. This enables agents to plan over a longer time scale. However, achieving optimal exploration and exploitation remains a challenge, especially in long-horizon or sparse-reward scenarios. In this paper, we introduce Active exploration and hierarchical Self-Imitation (ASI), an effective scheme to enhance exploration and exploitation based on subgoal representation learning. The key idea of ASI is to utilize temporal adjacency information in the representation space. We construct and dynamically update an adjacency graph that captures the relationships between subgoals. Based on the adjacency information provided by the graph, we design two mechanisms: active ''frontier-reaching'' exploration, which expands the explored area faster by targeting boundary regions, and hierarchical self-imitation learning, which leverages historical experience to facilitate both frontier reaching and policy training. Experimental results show that our method accelerates exploration and outperforms existing baselines on challenging long-horizon continuous control tasks.
Published
2026-03-14
How to Cite
Zhang, Y., & Xing, D. (2026). Enhancing Exploration and Exploitation in Hierarchical Reinforcement Learning with Subgoal Graph Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28465–28473. https://doi.org/10.1609/aaai.v40i34.40076
Issue
Section
AAAI Technical Track on Machine Learning XI
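The abstract above describes an adjacency graph that is dynamically updated to capture temporal-adjacency relationships between subgoals, and a frontier-reaching mechanism that targets boundary regions of the explored area. As a rough illustration only (this is not the paper's algorithm; the class, the visit-count frontier heuristic, and all names are assumptions for the sketch), such a graph might be maintained like this:

```python
from collections import defaultdict

class SubgoalGraph:
    """Illustrative adjacency graph over discretized subgoals.

    An edge links two subgoals visited consecutively in a
    trajectory, i.e. temporal adjacency (assumption for this sketch).
    """

    def __init__(self):
        self.adj = defaultdict(set)     # subgoal -> adjacent subgoals
        self.visits = defaultdict(int)  # subgoal -> visit count

    def update(self, subgoal_sequence):
        """Add temporal-adjacency edges from one trajectory."""
        for a, b in zip(subgoal_sequence, subgoal_sequence[1:]):
            self.adj[a].add(b)
            self.adj[b].add(a)
        for g in subgoal_sequence:
            self.visits[g] += 1

    def frontier(self, k=3):
        """Hypothetical frontier heuristic: the k least-visited
        subgoals, standing in for the boundary of the explored area."""
        return sorted(self.adj, key=lambda g: self.visits[g])[:k]

# Usage: feed discretized subgoal trajectories, then query frontiers.
g = SubgoalGraph()
g.update([(0, 0), (0, 1), (1, 1)])
g.update([(0, 0), (0, 1)])
print(g.frontier(1))  # → [(1, 1)], the least-visited subgoal
```

In the paper, frontier selection is driven by the learned subgoal representation and graph adjacency rather than raw visit counts; the sketch only conveys the data-structure shape.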