On Shallow Planning Under Partial Observability
DOI: https://doi.org/10.1609/aaai.v39i25.34860

Abstract
Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off, given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon can be beneficial, especially under partial observability.
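The abstract's link between the discount factor and the planning horizon can be illustrated with the standard rule of thumb that a discount factor gamma induces an effective horizon of roughly 1/(1 - gamma). The sketch below (not from the paper; the names and reward stream are illustrative) shows that truncating a discounted return at that horizon already captures a large fraction of its value:

```python
# Illustrative sketch: how the discount factor gamma shapes the
# effective planning horizon of a discounted-return objective.

def discounted_return(rewards, gamma):
    """Discounted cumulative reward: sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def effective_horizon(gamma):
    """Common rule of thumb: horizon ~ 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)

# A constant reward stream, purely for illustration.
rewards = [1.0] * 1000

for gamma in (0.9, 0.99):
    h = int(effective_horizon(gamma))
    full = discounted_return(rewards, gamma)
    truncated = discounted_return(rewards[:h], gamma)
    # Rewards beyond the effective horizon contribute little to the return.
    print(f"gamma={gamma}: horizon~{h}, truncated/full = {truncated / full:.2f}")
```

A smaller gamma thus shortens the horizon the agent effectively plans over, which is the design knob whose bias-variance consequences the paper studies.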
Published
2025-04-11
How to Cite
Lefebvre, R., & Durand, A. (2025). On Shallow Planning Under Partial Observability. Proceedings of the AAAI Conference on Artificial Intelligence, 39(25), 26587–26595. https://doi.org/10.1609/aaai.v39i25.34860
Section
AAAI Technical Track on Planning, Routing, and Scheduling