A Differential Perspective on Distributional Reinforcement Learning

Authors

  • Juan Sebastian Rojas University of Toronto
  • Chi-Guhn Lee University of Toronto

DOI:

https://doi.org/10.1609/aaai.v40i30.39706

Abstract

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution of an average-reward MDP. We derive provably convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms that have appealing scaling properties. Empirically, we find that these algorithms yield competitive and sometimes superior performance when compared to their non-distributional equivalents, while also capturing rich information about the long-run per-step reward and differential return distributions.
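
As a rough illustration of what a quantile-based estimate of a per-step reward distribution can look like, the sketch below applies the standard quantile-regression stochastic update to a stream of rewards. It is not the algorithm from the paper: the function names, the step size, the number of quantiles, and the use of i.i.d. uniform rewards as a stand-in for the rewards generated by a fixed policy are all assumptions made for this example.

```python
import random


def estimate_reward_quantiles(rewards, num_quantiles=5, step_size=0.005):
    """Track quantiles of a reward stream with a quantile-regression update.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    # Quantile midpoints tau_i = (2i + 1) / (2m), a common choice in
    # quantile-based distributional RL.
    taus = [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]
    thetas = [0.0] * num_quantiles
    for r in rewards:
        for i, tau in enumerate(taus):
            # Move theta_i up with weight tau when the reward is at or above
            # it, and down with weight (1 - tau) when the reward is below it.
            indicator = 1.0 if r < thetas[i] else 0.0
            thetas[i] += step_size * (tau - indicator)
    return taus, thetas


def simulate_rewards(num_steps, seed=0):
    """Assumed stand-in for a policy's reward stream: i.i.d. Uniform(0, 1)."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) for _ in range(num_steps)]


if __name__ == "__main__":
    taus, thetas = estimate_reward_quantiles(simulate_rewards(50_000))
    for tau, theta in zip(taus, thetas):
        print(f"tau={tau:.1f}  estimated quantile={theta:.3f}")
```

For Uniform(0, 1) rewards the true tau-quantile is tau itself, so each estimate should settle near its own tau, up to noise that scales with the step size.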

Published

2026-03-14

How to Cite

Rojas, J. S., & Lee, C.-G. (2026). A Differential Perspective on Distributional Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 25160–25167. https://doi.org/10.1609/aaai.v40i30.39706

Section

AAAI Technical Track on Machine Learning VII