Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)

Authors

  • Raja Farrukh Ali, Kansas State University
  • Kevin Duong, Kansas State University
  • Nasik Muhammad Nafi, Kansas State University
  • William Hsu, Kansas State University

DOI:

https://doi.org/10.1609/aaai.v37i13.26935

Keywords:

Reinforcement Learning, Multi-horizon, Reward Discounting

Abstract

Value estimates at multiple timescales can help create advanced discounting functions and allow agents to form more effective predictive models of their environment. In this work, we investigate learning over multiple horizons concurrently for off-policy reinforcement learning, using an advantage-based action-selection method and introducing architectural improvements. Our proposed agent learns over multiple horizons simultaneously, with either exponential or hyperbolic discounting functions. We implement our approach on top of Rainbow, a value-based off-policy algorithm, and test it on Procgen, a suite of procedurally-generated environments, to demonstrate the effectiveness of this approach, specifically by evaluating the agent's performance in previously unseen scenarios.
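The link between the two discounting functions mentioned in the abstract can be illustrated with a short sketch. A hyperbolic discount 1/(1 + kt) equals the integral over γ ∈ [0, 1] of the exponential discount γ^(kt), so learning value estimates at many exponential horizons lets an agent approximate hyperbolic discounting as a weighted combination. The functions below are illustrative only, not the authors' implementation; the Riemann-sum approximation and all names are assumptions for exposition.

```python
import numpy as np

def exponential_return(rewards, gamma):
    # Standard exponentially discounted return: sum_t gamma^t * r_t.
    t = np.arange(len(rewards))
    return float(np.sum((gamma ** t) * np.asarray(rewards)))

def hyperbolic_return(rewards, k=1.0, n_gammas=100):
    # The hyperbolic discount 1/(1 + k*t) equals the integral over
    # gamma in [0, 1] of gamma^(k*t); approximate that integral with a
    # midpoint Riemann sum, i.e. an average of exponentially discounted
    # returns computed at n_gammas different horizons.
    gammas = (np.arange(n_gammas) + 0.5) / n_gammas  # midpoints of [0, 1]
    t = np.arange(len(rewards))
    r = np.asarray(rewards, dtype=float)
    per_gamma = [np.sum((g ** (k * t)) * r) for g in gammas]
    return float(np.mean(per_gamma))
```

With a constant reward stream, `hyperbolic_return` closely matches the direct sum of r_t / (1 + k·t), which is why a multi-horizon agent that maintains several exponentially discounted value heads can recover hyperbolic behavior by averaging them.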

Published

2023-09-06

How to Cite

Ali, R. F., Duong, K., Nafi, N. M., & Hsu, W. (2023). Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 37(13), 16150-16151. https://doi.org/10.1609/aaai.v37i13.26935