Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning

Jinxin Liu; Donglin Wang; Qiangxing Tian; Zhengyu Chen

doi:10.1609/aaai.v36i7.20721

Authors

Jinxin Liu (Zhejiang University; Westlake University; Westlake Institute for Advanced Study)
Donglin Wang (Westlake University; Westlake Institute for Advanced Study)
Qiangxing Tian (Zhejiang University; Westlake University; Westlake Institute for Advanced Study)
Zhengyu Chen (Zhejiang University; Westlake University; Westlake Institute for Advanced Study)

DOI:

https://doi.org/10.1609/aaai.v36i7.20721

Keywords:

Machine Learning (ML)

Abstract

It is important for an agent to autonomously explore its environment and learn a general-purpose goal-conditioned policy that can achieve diverse goals, including images and text descriptions. For such perceptually specific goals, one natural approach is to reward the agent with a prior non-parametric distance over the embedding spaces of states and goals. However, this may be infeasible in some situations, either because it is unclear how to choose a suitable distance measure, or because embedding (heterogeneous) goals and states is non-trivial. The key insight of this work is to introduce a latent-conditioned policy that provides both goals and intrinsic rewards for learning the goal-conditioned policy. Rather than directly scoring current states with respect to goals, we obtain rewards by scoring current states against their associated latent variables. We theoretically characterize the connection between our unsupervised objective and the multi-goal setting, and empirically demonstrate that the proposed method substantially outperforms prior techniques on a variety of tasks.
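
The idea of scoring a state against its latent variable rather than against the goal can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so everything below is an assumption: a discrete latent with a uniform prior, a learned discriminator that outputs logits over latents for a given state, and a reward of the form log q(z|s) - log p(z), as is common in mutual-information-based skill discovery. The names `intrinsic_reward`, `discriminator_logits`, and `num_latents` are hypothetical and not taken from the paper.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over a 1-D array of logits.
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def intrinsic_reward(discriminator_logits, z, num_latents):
    """Score a state against the latent z that produced it.

    Assumption (not from the abstract): the reward is
    log q(z | s) - log p(z), with p(z) uniform over num_latents
    discrete latents and q given by a discriminator's logits.
    """
    log_q = log_softmax(np.asarray(discriminator_logits, dtype=float))[z]
    log_p = -np.log(num_latents)  # uniform prior over latents
    return float(log_q - log_p)

# A state the discriminator cannot attribute to any latent earns zero
# reward; a state clearly attributable to its own latent earns a
# positive reward, and the same state scored against another latent
# earns a negative one.
uninformative = intrinsic_reward([0.0, 0.0, 0.0, 0.0], z=2, num_latents=4)
matched = intrinsic_reward([10.0, 0.0, 0.0, 0.0], z=0, num_latents=4)
mismatched = intrinsic_reward([10.0, 0.0, 0.0, 0.0], z=1, num_latents=4)
```

Under this assumed setup, the same scalar could be used as the reward for a goal-conditioned policy whose goal was generated by the latent-conditioned rollout, which avoids defining a distance between heterogeneous goals and states.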
