Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v37i7.26052
Keywords:
ML: Reinforcement Learning Algorithms, ML: Representation Learning, ML: Scalability of ML Systems
Abstract
Behavioral metrics measure the distance between states or state-action pairs in terms of differences in rewards and transitions. Because they can, in theory, filter out task-irrelevant information, using them to shape a state embedding space has become a growing trend in representation learning for deep reinforcement learning (RL), especially when observations contain explicit distracting factors in their backgrounds. However, because such metrics are tightly coupled to the RL policy, metric-based methods can produce less informative embedding spaces, which weakens their benefit to the baseline RL algorithm and can even require more samples to learn. We resolve this by proposing a new behavioral metric. Being independent of the RL policy, it decouples the learning of the metric from the learning of the policy. We theoretically justify its scalability to continuous state and action spaces and design a practical way to incorporate it into an RL procedure as a representation learning target. We evaluate our approach on DeepMind Control tasks with default and distracting backgrounds. Under statistically reliable evaluation protocols, our experiments demonstrate that our approach is superior to previous metric-based methods in terms of sample efficiency and asymptotic performance in both settings.
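The abstract does not spell out the form of the proposed policy-independent metric, but the general recipe it builds on can be illustrated. Below is a minimal PyTorch sketch of a bisimulation-style behavioral-metric loss in the spirit of prior metric-based methods: the encoder is trained so that L1 distances between latent states track a target metric built from reward differences plus a discounted Wasserstein distance between predicted next-state distributions. All names, shapes, and the diagonal-Gaussian transition assumption are ours for illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Hypothetical state encoder; the architecture is illustrative only."""
    def __init__(self, obs_dim: int, z_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, z_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def behavioral_metric_loss(encoder: Encoder,
                           obs: torch.Tensor,       # (B, obs_dim)
                           reward: torch.Tensor,    # (B, 1)
                           next_mu: torch.Tensor,   # (B, z_dim) predicted latent transition mean
                           next_sigma: torch.Tensor,  # (B, z_dim) predicted latent transition std
                           gamma: float = 0.99) -> torch.Tensor:
    """Shape the embedding so that ||phi(s_i) - phi(s_j)||_1 approximates
    a bisimulation-style metric:
        d(s_i, s_j) ~ |r_i - r_j| + gamma * W2(P(.|s_i), P(.|s_j)).
    `next_mu`/`next_sigma` are assumed to come from a separately trained
    latent dynamics model with diagonal-Gaussian outputs."""
    z = encoder(obs)
    # Pair each sample with a random permutation of the batch.
    perm = torch.randperm(obs.size(0), device=obs.device)
    z_dist = torch.abs(z - z[perm]).sum(dim=-1)
    r_dist = torch.abs(reward - reward[perm]).squeeze(-1)
    # Closed-form 2-Wasserstein distance between diagonal Gaussians.
    w2 = torch.sqrt(
        ((next_mu - next_mu[perm]) ** 2).sum(dim=-1)
        + ((next_sigma - next_sigma[perm]) ** 2).sum(dim=-1)
    )
    target = r_dist + gamma * w2
    # Detach the target so the encoder regresses toward the metric
    # rather than collapsing it.
    return F.mse_loss(z_dist, target.detach())
```

Note that the metric target above depends on rewards and transitions sampled under the behavior policy, which is exactly the policy coupling the paper identifies as problematic; the paper's contribution is a metric whose definition does not depend on the RL policy, so the representation target stays stable while the policy changes.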
Published
2023-06-26
How to Cite
Liao, W., Zhang, Z., & Yu, Y. (2023). Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 8746-8754. https://doi.org/10.1609/aaai.v37i7.26052
Issue
Vol. 37 No. 7 (2023)
Section
AAAI Technical Track on Machine Learning II