Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning
Keywords: Machine Learning (ML)
Abstract
It is important for an agent to autonomously explore its environment and learn a widely applicable, general-purpose goal-conditioned policy that can achieve diverse goals, including images and text descriptions. For such perceptually specified goals, one natural approach is to reward the agent with a prior non-parametric distance over the embedding spaces of states and goals. However, this may be infeasible in some situations, either because it is unclear how to choose a suitable distance measure, or because embedding (heterogeneous) goals and states is non-trivial. The key insight of this work is to introduce a latent-conditioned policy that provides goals and intrinsic rewards for learning the goal-conditioned policy. Rather than directly scoring current states against goals, we obtain rewards by scoring current states against their associated latent variables. We theoretically characterize the connection between our unsupervised objective and the multi-goal setting, and empirically demonstrate the effectiveness of the proposed method, which substantially outperforms prior techniques across a variety of tasks.
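To make the idea of "scoring current states against their associated latent variables" concrete, the sketch below shows one common way such an intrinsic reward can be computed: a discriminator q(z|s) estimates which latent code produced the current state, and the reward is log q(z|s) − log p(z). This is only an illustrative toy, not the paper's actual implementation: the linear-softmax discriminator, the discrete latent space, the dimensions, and the uniform prior p(z) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LATENTS = 4   # assumed: a small discrete latent space for illustration
STATE_DIM = 3   # assumed: toy state dimensionality

# Toy discriminator q(z|s): linear logits + softmax, standing in for a
# learned network trained jointly with the policy.
W = rng.normal(size=(N_LATENTS, STATE_DIM))

def discriminator(state: np.ndarray) -> np.ndarray:
    """Return q(z|s), a categorical distribution over latent codes."""
    logits = W @ state
    exps = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exps / exps.sum()

def intrinsic_reward(state: np.ndarray, z: int) -> float:
    """Score the current state against its associated latent code:
    r = log q(z|s) - log p(z), with p(z) uniform over the latent codes."""
    q = discriminator(state)
    return float(np.log(q[z] + 1e-8) - np.log(1.0 / N_LATENTS))

state = rng.normal(size=STATE_DIM)
r = intrinsic_reward(state, z=2)
```

The reward is positive when the discriminator identifies the associated latent code better than chance, which encourages the latent-conditioned policy to visit states that are distinguishable per latent code, and those latents can then serve as goals for the goal-conditioned policy.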
How to Cite
Liu, J., Wang, D., Tian, Q., & Chen, Z. (2022). Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7), 7558-7566. https://doi.org/10.1609/aaai.v36i7.20721
AAAI Technical Track on Machine Learning II