On Convergence of Gradient Expected Sarsa(λ)
DOI:
https://doi.org/10.1609/aaai.v35i12.17270
Keywords:
Reinforcement Learning
Abstract
We study the convergence of Expected Sarsa(λ) with function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning. Furthermore, based on a convex-concave saddle-point framework, we propose a convergent Gradient Expected Sarsa(λ) (GES(λ)) algorithm. The theoretical analysis shows that the proposed GES(λ) converges to the optimal solution at a linear convergence rate under the true gradient setting. Furthermore, we develop a Lyapunov function technique to investigate how the step-size influences the finite-time performance of GES(λ). Additionally, this Lyapunov function technique can potentially be generalized to other gradient temporal difference algorithms. Finally, our experiments verify the effectiveness of GES(λ). For the details of the proofs, please refer to https://arxiv.org/pdf/2012.07199.pdf.
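The abstract's claim of a linear convergence rate under the true gradient setting can be illustrated on a toy convex-concave saddle-point problem. The sketch below is not the paper's GES(λ) algorithm. It assumes a generic objective L(θ, ω) = (b − Aθ)ᵀω − ½ ωᵀMω, the kind of form commonly used for gradient temporal difference methods, and runs plain gradient descent on θ and gradient ascent on ω. The matrices A and M, the vector b, the step sizes, and the function name primal_dual_solve are all hypothetical choices made for illustration.

```python
import numpy as np


def primal_dual_solve(A, b, M, alpha=0.1, beta=0.1, n_steps=1000):
    """True-gradient descent/ascent on L(theta, omega) = (b - A theta)^T omega - 0.5 omega^T M omega.

    Illustrative only: A, b, M are synthetic stand-ins, not quantities from the paper.
    Returns the final theta and the distance to the saddle point after each step.
    """
    theta = np.zeros(A.shape[1])
    omega = np.zeros(A.shape[0])
    theta_star = np.linalg.solve(A, b)  # saddle point has theta* = A^{-1} b, omega* = 0
    errors = []
    for _ in range(n_steps):
        # grad_theta L = -A^T omega, so a descent step adds alpha * A^T omega.
        new_theta = theta + alpha * (A.T @ omega)
        # grad_omega L = b - A theta - M omega, so an ascent step adds beta times it.
        new_omega = omega + beta * (b - A @ theta - M @ omega)
        theta, omega = new_theta, new_omega  # simultaneous update from the old iterates
        errors.append(np.linalg.norm(theta - theta_star))
    return theta, np.array(errors)


# Hypothetical, well-conditioned problem data (assumptions, not from the paper).
A = np.array([[1.0, 0.2], [-0.1, 0.8]])
b = np.array([1.0, 0.5])
M = np.eye(2)

theta, errors = primal_dual_solve(A, b, M)
```

Plotting `np.log(errors)` against the step index should give a roughly straight, downward-sloping envelope, which is what a linear (geometric) rate looks like. With stochastic gradients in place of the true ones, this clean behaviour is not expected to carry over unchanged.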
Published
2021-05-18
How to Cite
Yang, L., Zheng, G., Zhang, Y., Zheng, Q., Li, P., & Pan, G. (2021). On Convergence of Gradient Expected Sarsa(λ). Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 10621-10629. https://doi.org/10.1609/aaai.v35i12.17270
Section
AAAI Technical Track on Machine Learning V