Enhancing Exploration and Exploitation in Hierarchical Reinforcement Learning with Subgoal Graph Learning
DOI:
https://doi.org/10.1609/aaai.v40i34.40076
Abstract
Goal-conditioned hierarchical reinforcement learning has demonstrated effectiveness in addressing complicated decision-making tasks by providing ''temporal abstraction'', which decomposes tasks into smaller and more manageable ''subgoals''. This enables agents to plan over a longer time scale. However, achieving optimal exploration and exploitation remains a challenge, especially in long-horizon or sparse-reward scenarios. In this paper, we introduce Active exploration and hierarchical Self-Imitation (ASI), an effective scheme to enhance exploration and exploitation based on subgoal representation learning. The key idea of ASI is to utilize temporal adjacency information in the representation space. We construct and dynamically update an adjacency graph that captures the relationships between subgoals. Based on the adjacency information provided by the graph, we design two mechanisms: active ''frontier-reaching'' exploration, which expands the explored area faster by targeting boundary regions, and hierarchical self-imitation learning, which leverages historical experience to facilitate both frontier reaching and policy training. Experimental results show that our method accelerates exploration and outperforms existing baselines on challenging long-horizon continuous control tasks.
Published
2026-03-14
How to Cite
Zhang, Y., & Xing, D. (2026). Enhancing Exploration and Exploitation in Hierarchical Reinforcement Learning with Subgoal Graph Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28465–28473. https://doi.org/10.1609/aaai.v40i34.40076
Issue
Section
AAAI Technical Track on Machine Learning XI
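The abstract above describes an adjacency graph that is dynamically updated to capture temporal-adjacency relationships between subgoals, and a frontier-reaching mechanism that targets boundary regions of the explored area. As a rough illustration only (this is not the paper's algorithm; the class, the visit-count frontier heuristic, and all names are assumptions for the sketch), such a graph might be maintained like this:

```python
from collections import defaultdict

class SubgoalGraph:
    """Illustrative adjacency graph over discretized subgoals.

    An edge links two subgoals visited consecutively in a
    trajectory, i.e. temporal adjacency (assumption for this sketch).
    """

    def __init__(self):
        self.adj = defaultdict(set)     # subgoal -> adjacent subgoals
        self.visits = defaultdict(int)  # subgoal -> visit count

    def update(self, subgoal_sequence):
        """Add temporal-adjacency edges from one trajectory."""
        for a, b in zip(subgoal_sequence, subgoal_sequence[1:]):
            self.adj[a].add(b)
            self.adj[b].add(a)
        for g in subgoal_sequence:
            self.visits[g] += 1

    def frontier(self, k=3):
        """Hypothetical frontier heuristic: the k least-visited
        subgoals, standing in for the boundary of the explored area."""
        return sorted(self.adj, key=lambda g: self.visits[g])[:k]

# Usage: feed discretized subgoal trajectories, then query frontiers.
g = SubgoalGraph()
g.update([(0, 0), (0, 1), (1, 1)])
g.update([(0, 0), (0, 1)])
print(g.frontier(1))  # → [(1, 1)], the least-visited subgoal
```

In the paper, frontier selection is driven by the learned subgoal representation and graph adjacency rather than raw visit counts; the sketch only conveys the data-structure shape.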