On Convergence of Gradient Expected Sarsa(λ)

Long Yang; Gang Zheng; Yu Zhang; Qian Zheng; Pengfei Li; Gang Pan

doi:10.1609/aaai.v35i12.17270

Authors

Long Yang Zhejiang University, China
Gang Zheng Zhejiang University, China
Yu Zhang Zhejiang University, China
Qian Zheng Nanyang Technological University, Singapore
Pengfei Li Zhejiang University, China
Gang Pan Zhejiang University, China

DOI:

https://doi.org/10.1609/aaai.v35i12.17270

Keywords:

Reinforcement Learning

Abstract

We study the convergence of Expected Sarsa(λ) with function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning. Furthermore, based on a convex-concave saddle-point framework, we propose a convergent Gradient Expected Sarsa(λ) (GES(λ)) algorithm. The theoretical analysis shows that the proposed GES(λ) converges to the optimal solution at a linear convergence rate under the true gradient setting. Furthermore, we develop a Lyapunov function technique to investigate how the step-size influences the finite-time performance of GES(λ). Additionally, this Lyapunov function technique can potentially be generalized to other gradient temporal difference algorithms. Finally, our experiments verify the effectiveness of GES(λ). For the details of the proofs, please refer to https://arxiv.org/pdf/2012.07199.pdf.
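For readers unfamiliar with the base method, the following is a minimal sketch of the standard one-step Expected Sarsa target with linear function approximation. It is not the GES(λ) algorithm proposed in the paper: it omits the λ-return and eligibility traces, and it uses the plain semi-gradient update, which is the kind of update the abstract notes can be unstable off-policy. All function and variable names (`q_value`, `expected_sarsa_target`, `semi_gradient_step`, `theta`, `phi`) are illustrative assumptions, not taken from the paper.

```python
def q_value(theta, phi):
    """Linear action-value estimate: inner product of weights and features."""
    return sum(t * f for t, f in zip(theta, phi))


def expected_sarsa_target(reward, gamma, theta, next_features, target_probs):
    """One-step Expected Sarsa target.

    next_features: one feature vector per action in the next state.
    target_probs:  target-policy probability of each of those actions.
    Returns r + gamma * sum_a pi(a | s') * q(s', a).
    """
    expected_q = sum(
        p * q_value(theta, phi) for p, phi in zip(target_probs, next_features)
    )
    return reward + gamma * expected_q


def semi_gradient_step(theta, phi, target, alpha):
    """Plain semi-gradient update toward the target (assumed baseline, not GES)."""
    delta = target - q_value(theta, phi)  # temporal-difference error
    return [t + alpha * delta * f for t, f in zip(theta, phi)]


# Hypothetical two-feature, two-action example.
theta = [1.0, 2.0]
target = expected_sarsa_target(
    reward=1.0,
    gamma=0.9,
    theta=theta,
    next_features=[[1.0, 0.0], [0.0, 1.0]],
    target_probs=[0.5, 0.5],
)
new_theta = semi_gradient_step(theta, phi=[1.0, 0.0], target=target, alpha=0.1)
```

Taking the expectation over the target policy's next-action probabilities, rather than sampling a single next action, is what distinguishes Expected Sarsa from Sarsa.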
