Blind Decision Making: Reinforcement Learning with Delayed Observations
Keywords: Uncertainty and Stochasticity in Planning and Scheduling, Partially Observable and Unobservable Domains, Classical Planning Techniques and Analysis
Abstract
In Reinforcement Learning (RL), the current state of the environment may not always be available to the agent. One way to handle this is to include the actions taken since the last-known state as part of the state information; however, this enlarges the state space, making the problem more complex and slowing convergence. We propose an approach that instead exploits the delay in the knowledge of the state: decisions are made so as to maximize the expected state-action value function. The proposed algorithm is thus an alternative in which the state space is not enlarged relative to the case with no delay in the state update. Evaluations on basic RL environments further illustrate the improved performance of the proposed algorithm.
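The abstract's core idea, choosing the action that maximizes the expected Q-value given only a delayed state observation and the actions taken since, can be sketched roughly as follows for a small tabular MDP. This is a hedged illustration, not the paper's algorithm: the transition model `P`, Q-table `Q`, and all other names here are hypothetical, and the paper itself does not specify this implementation.

```python
import numpy as np

# Illustrative sketch (not from the paper): the agent knows only the state
# observed d steps ago and the actions it has executed since. Instead of
# augmenting the state with those actions, it propagates a belief over the
# current state through an (assumed known) transition model P[a, s, s'],
# then picks the action maximizing the expected Q-value under that belief.

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Toy transition model and pre-learned Q-table, randomly generated.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)        # rows sum to 1: valid distributions
Q = rng.random((n_states, n_actions))

def blind_action(last_state, actions_since):
    """Return argmax_a E[Q(s_t, a)], with the expectation over the belief
    about the current (unobserved) state s_t."""
    belief = np.zeros(n_states)
    belief[last_state] = 1.0
    for a in actions_since:              # push the belief through each action
        belief = belief @ P[a]
    expected_q = belief @ Q              # E[Q(s_t, .)] under the belief
    return int(np.argmax(expected_q))

chosen = blind_action(last_state=0, actions_since=[1, 0])
print(chosen)
```

Note that the belief remains a distribution over the original state space, so the decision rule operates without enlarging the state space, which is the property the abstract emphasizes.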
How to Cite
Agarwal, M., & Aggarwal, V. (2021). Blind Decision Making: Reinforcement Learning with Delayed Observations. Proceedings of the International Conference on Automated Planning and Scheduling, 31(1), 2-6. Retrieved from https://ojs.aaai.org/index.php/ICAPS/article/view/15940