Distributional Reinforcement Learning via Moment Matching

Thanh Nguyen-Tang; Sunil Gupta; Svetha Venkatesh

doi:10.1609/aaai.v35i10.17104

Authors

Thanh Nguyen-Tang, Deakin University
Sunil Gupta, Deakin University
Svetha Venkatesh, Deakin University

DOI:

https://doi.org/10.1609/aaai.v35i10.17104

Keywords:

Reinforcement Learning, Kernel Methods, (Deep) Neural Network Algorithms, Representation Learning

Abstract

We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. We formulate a method that learns a finite set of statistics from each return distribution via neural networks, as in the distributional RL literature. Existing distributional RL methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which both restricts the representation and makes the predefined statistics difficult to maintain. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide a finite-sample analysis for the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
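
The MMD objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the biased (V-statistic) estimator, and the names `gaussian_kernel`, `mmd_squared`, and `bellman_target` are all assumptions made for illustration. The sketch treats a return distribution as a small set of pseudo-samples and measures its discrepancy from a Bellman target built from next-state pseudo-samples.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Pairwise Gaussian kernel k(x_i, y_j) for 1-D sample arrays x and y.

    The Gaussian kernel and the bandwidth value are illustrative assumptions.
    """
    diff = x[:, None] - y[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def mmd_squared(particles, targets, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    k_xx = gaussian_kernel(particles, particles, bandwidth).mean()
    k_yy = gaussian_kernel(targets, targets, bandwidth).mean()
    k_xy = gaussian_kernel(particles, targets, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def bellman_target(reward, gamma, next_particles):
    """Shift and scale next-state pseudo-samples: r + gamma * z'."""
    return reward + gamma * next_particles


# Hypothetical pseudo-samples for one state-action pair and its successor.
rng = np.random.default_rng(0)
current_particles = rng.normal(size=8)
next_particles = rng.normal(size=8)

target_particles = bellman_target(reward=1.0, gamma=0.99, next_particles=next_particles)
loss = mmd_squared(current_particles, target_particles)
```

In a learning setting the pseudo-samples would be outputs of a neural network and `loss` would be minimised by backpropagation with the target held fixed; the NumPy version above only evaluates the quantity.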
