State-Wise Adaptive Discounting from Experience (SADE): A Novel Discounting Scheme for Reinforcement Learning (Student Abstract)

Authors

  • Milan Zinzuvadiya, Secure and Assured Intelligent Learning Lab, University of New Haven
  • Vahid Behzadan, Secure and Assured Intelligent Learning Lab, University of New Haven

DOI:

https://doi.org/10.1609/aaai.v35i18.17973

Keywords:

Reinforcement Learning, Delay Discounting, Adaptive Discounting

Abstract

In Markov Decision Process (MDP) models of sequential decision-making, it is common practice to account for temporal discounting by incorporating a constant discount factor. While the effectiveness of fixed-rate discounting in various Reinforcement Learning (RL) settings is well-established, the efficiency of this scheme has been questioned in recent studies. Another notable shortcoming of fixed-rate discounting stems from abstracting away the experiential information of the agent, which has been shown to be a significant component of delay discounting in human cognition. To address this issue, we propose State-wise Adaptive Discounting from Experience (SADE) as a novel adaptive discounting scheme for RL agents. SADE leverages the experiential observations of state values in episodic trajectories to iteratively adjust state-specific discount rates. We report experimental evaluations of SADE in Q-learning agents, which demonstrate significant improvements in sample complexity and convergence rate compared to fixed-rate discounting.
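The abstract does not spell out SADE's update rule, so the sketch below is only an illustrative reading of the general idea in tabular Q-learning: each state carries its own discount factor, and that factor is nudged after every episode based on the returns actually experienced from that state. The environment interface (Gymnasium-style `reset`/`step`) and the specific adjustment heuristic (`tanh` of the return-minus-value gap) are assumptions for the sake of a runnable example, not the authors' published algorithm.

```python
import numpy as np
from collections import defaultdict

def sade_style_q_learning(env, episodes=500, alpha=0.1, gamma_init=0.9,
                          gamma_lr=0.01, epsilon=0.1):
    """Tabular Q-learning with hypothetical state-wise adaptive discounting."""
    Q = defaultdict(lambda: np.zeros(env.action_space.n))
    gamma = defaultdict(lambda: gamma_init)   # one discount factor per state

    for _ in range(episodes):
        state, _ = env.reset()
        trajectory = []
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated

            # Standard Q-learning update, but using the *state-specific* discount
            target = reward + gamma[state] * np.max(Q[next_state]) * (not done)
            Q[state][action] += alpha * (target - Q[state][action])

            trajectory.append((state, reward))
            state = next_state

        # After each episode, compare the observed return from every visited
        # state against its current value estimate, and adjust that state's
        # discount factor accordingly (illustrative heuristic only).
        G = 0.0
        for s, r in reversed(trajectory):
            G = r + G
            gap = G - np.max(Q[s])
            gamma[s] = float(np.clip(gamma[s] + gamma_lr * np.tanh(gap), 0.0, 1.0))

    return Q, gamma
```

The key departure from fixed-rate discounting is that `gamma` is a per-state table updated from episodic experience rather than a single constant; the exact adjustment rule used in the paper may differ from the heuristic shown here.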

Published

2021-05-18

How to Cite

Zinzuvadiya, M., & Behzadan, V. (2021). State-Wise Adaptive Discounting from Experience (SADE): A Novel Discounting Scheme for Reinforcement Learning (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 35(18), 15953-15954. https://doi.org/10.1609/aaai.v35i18.17973

Section

AAAI Student Abstract and Poster Program