Enhancing Exploration and Exploitation in Hierarchical Reinforcement Learning with Subgoal Graph Learning

Authors

  • Yibo Zhang Institute of Automation, Chinese Academy of Sciences School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Dengpeng Xing Institute of Automation, Chinese Academy of Sciences School of Artificial Intelligence, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v40i34.40076

Abstract

Goal-conditioned hierarchical reinforcement learning has demonstrated effectiveness in addressing complicated decision-making tasks by providing ''temporal extraction'', which decomposes tasks into smaller and more manageable ''subgoals''. This enables agents to plan over a longer time scale. However, achieving optimal exploration and exploitation still remains a challenge, especially for long-horizon or sparse-reward scenarios. In this paper, we introduce Active exploraion and hierarchical Self-Imitation (ASI), an effective scheme to enhance exploration and exploitation based on subgoal representation learning. The key point of ASI is to utilize temporal adjacency information in the representation space. We construct and dynamically update an adjacency graph that captures the relationships between subgoals. Based on the adjacency information provided by the graph, we design two mechanisms: active ``frontier-reaching'' exploration that faster expands the explored area by targeting boundary regions, and hierarchical self-imitation learning that leverages historical experience to facilitate both frontier reaching and policy training. Experimental results show that our method accelerates exploration and outperforms existing baselines in challenging long-horizon continuous control tasks.

Downloads

Published

2026-03-14

How to Cite

Zhang, Y., & Xing, D. (2026). Enhancing Exploration and Exploitation in Hierarchical Reinforcement Learning with Subgoal Graph Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28465–28473. https://doi.org/10.1609/aaai.v40i34.40076

Issue

Section

AAAI Technical Track on Machine Learning XI