Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning

Authors

  • Weijian Liao, Nanjing University
  • Zongzhang Zhang, Nanjing University
  • Yang Yu, Nanjing University; Peng Cheng Laboratory, Shenzhen

DOI:

https://doi.org/10.1609/aaai.v37i7.26052

Keywords:

ML: Reinforcement Learning Algorithms, ML: Representation Learning, ML: Scalability of ML Systems

Abstract

Behavioral metrics compute the distance between states or state-action pairs from differences in their rewards and transitions. Because they can, in theory, filter out task-irrelevant information, using them to shape a state embedding space has become a new trend in representation learning for deep reinforcement learning (RL), especially when the observation background contains explicit distracting factors. However, due to the tight coupling between the metric and the RL policy, such metric-based methods may produce less informative embedding spaces, which weakens their benefit to the baseline RL algorithm and may even require more samples to learn. We resolve this by proposing a new behavioral metric. Because it is independent of the RL policy, it decouples the learning of the policy from the learning of the metric. We theoretically justify its scalability to continuous state and action spaces and design a practical way to incorporate it into an RL procedure as a representation learning target. We evaluate our approach on DeepMind Control tasks with default and distracting backgrounds. Under statistically reliable evaluation protocols, our experiments demonstrate that our approach is superior to previous metric-based methods in terms of sample efficiency and asymptotic performance in both settings.
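To make the abstract's core idea concrete, the sketch below shows the general shape of metric-based representation learning: an encoder is trained so that distances between state embeddings track a behavioral-metric target built from reward and transition differences. This is a minimal illustration in the style of earlier bisimulation-based methods the abstract contrasts against, not the paper's policy-independent metric; all module names, shapes, and the Gaussian 2-Wasserstein transition term are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Maps raw observations to a low-dimensional embedding (illustrative)."""
    def __init__(self, obs_dim: int, embed_dim: int = 50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def metric_representation_loss(encoder, obs, rewards, next_mean, next_std, gamma=0.99):
    """Shape the embedding space so that embedding distances match a
    behavioral-metric target formed from reward and transition differences.

    obs:       (B, obs_dim) batch of observations
    rewards:   (B,) immediate rewards
    next_mean: (B, embed_dim) mean of the predicted next-state embedding
               (from a learned transition model, assumed Gaussian here)
    next_std:  (B, embed_dim) std of the predicted next-state embedding
    """
    z = encoder(obs)
    # Pair each sample with a random permutation of the batch.
    perm = torch.randperm(obs.size(0))
    embed_dist = torch.norm(z - z[perm], p=1, dim=-1)

    # Behavioral-metric target: reward difference plus a discounted
    # 2-Wasserstein distance between Gaussian next-state distributions.
    r_dist = (rewards - rewards[perm]).abs()
    w2 = torch.sqrt(
        ((next_mean - next_mean[perm]) ** 2).sum(-1)
        + ((next_std - next_std[perm]) ** 2).sum(-1)
    )
    target = r_dist + gamma * w2

    # Regress embedding distances onto the (stop-gradient) metric target.
    return F.mse_loss(embed_dist, target.detach())
```

In policy-dependent variants, the transition model (and hence the target above) is conditioned on actions drawn from the current policy; the paper's contribution is a metric whose target does not depend on the RL policy, so representation learning and policy learning are decoupled.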

Published

2023-06-26

How to Cite

Liao, W., Zhang, Z., & Yu, Y. (2023). Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 8746-8754. https://doi.org/10.1609/aaai.v37i7.26052

Section

AAAI Technical Track on Machine Learning II