Blind Decision Making: Reinforcement Learning with Delayed Observations

Authors

  • Mridul Agarwal, Purdue University
  • Vaneet Aggarwal, Purdue University

Keywords:

Uncertainty And Stochasticity In Planning And Scheduling, Partially Observable And Unobservable Domains, Classical Planning Techniques And Analysis

Abstract

In Reinforcement Learning (RL), the current state of the environment may not always be available. One way to handle this is to include the actions taken after the last-known state as part of the state information; however, this enlarges the state space, which makes the problem more complex and slows convergence. We propose an approach in which the delay in state knowledge is taken into account, and decisions are made to maximize the expected state-action value function. The proposed algorithm is an alternative in which the state space is not enlarged relative to the delay-free case. Evaluations on basic RL environments illustrate the improved performance of the proposed algorithm.
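The decision rule described in the abstract can be illustrated with a minimal sketch. It assumes a tabular setting with a known transition model `P[s, a, s2]` and an already-learned table `Q[s, a]`; the function names are hypothetical, and this is not the authors' implementation (how the paper estimates the model or learns `Q` is not covered here). Starting from the last observed state, the belief over the current, unobserved state is propagated through the actions taken since that observation, and the agent picks the action with the highest expected Q value under that belief. For comparison, the sketch also counts the states of the augmented-state alternative, which grows as |S|·|A|^d for a delay of d steps.

```python
import numpy as np


def belief_after_actions(P, last_state, pending_actions):
    """Belief over the current state given the last observed state and the
    actions applied since then. Assumes P has shape (S, A, S) with
    P[s, a, s2] = Pr(s2 | s, a) (a hypothetical known model)."""
    belief = np.zeros(P.shape[0])
    belief[last_state] = 1.0  # the last observed state is known exactly
    for a in pending_actions:
        # b'(s2) = sum_s b(s) * P[s, a, s2]
        belief = belief @ P[:, a, :]
    return belief


def expected_q_action(Q, P, last_state, pending_actions):
    """Pick argmax_a sum_s b(s) * Q[s, a]; Q has shape (S, A)."""
    belief = belief_after_actions(P, last_state, pending_actions)
    expected_q = belief @ Q  # expected value of each action under the belief
    return int(np.argmax(expected_q))


def augmented_state_count(n_states, n_actions, delay):
    """Size of the state space if the last `delay` actions are appended
    to the last observed state (the alternative the abstract contrasts with)."""
    return n_states * n_actions ** delay


# Toy example: action 0 keeps the state, action 1 switches it deterministically.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0
Q = np.array([[1.0, 0.0],
              [0.0, 5.0]])
```

With this toy model, `expected_q_action(Q, P, 0, [1])` returns action 1, because one pending switch places the agent in state 1, while `expected_q_action(Q, P, 0, [1, 1])` returns action 0. The decision uses only the original |S| states, whereas `augmented_state_count(10, 4, 3)` shows the augmented alternative would already need 640 states.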

Published

2021-05-17

How to Cite

Agarwal, M., & Aggarwal, V. (2021). Blind Decision Making: Reinforcement Learning with Delayed Observations. Proceedings of the International Conference on Automated Planning and Scheduling, 31(1), 2-6. Retrieved from https://ojs.aaai.org/index.php/ICAPS/article/view/15940