High-Order Error Bounds for Markovian LSA with Richardson–Romberg Extrapolation

Authors

  • Ilya Levin, HSE University
  • Alexey Naumov, HSE University
  • Sergey Samsonov, HSE University

DOI:

https://doi.org/10.1609/aaai.v40i43.40994

Abstract

In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the constant step size version of the algorithm and propose a novel decomposition of the bias via a linearization technique. We analyze the structure of the bias and show that its leading-order term is linear in the step size and cannot be eliminated by PR averaging. To address this, we apply the Richardson-Romberg (RR) extrapolation procedure, which cancels the leading bias term. We derive high-order moment bounds for the RR iterates and show that the leading error term aligns with the asymptotically optimal covariance matrix of the vanilla averaged LSA iterates. We validate the applicability of our findings on the temporal difference algorithm in reinforcement learning.
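The procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-state Markov chain P, the state-dependent matrices A and vectors b, the step size, the burn-in length, and all function names. The sketch also assumes the common form of RR extrapolation that runs the same noise sequence with step sizes alpha and 2*alpha and combines the PR averages as 2 * avg(alpha) - avg(2*alpha), so that a bias term linear in the step size cancels.

```python
import numpy as np


def simulate_markov_chain(P, n_steps, rng, z0=0):
    """Sample a path of a finite-state Markov chain with transition matrix P."""
    states = np.empty(n_steps, dtype=int)
    z = z0
    for k in range(n_steps):
        z = rng.choice(P.shape[0], p=P[z])
        states[k] = z
    return states


def pr_averaged_lsa(A, b, states, alpha, burn_in):
    """Constant-step LSA with Polyak-Ruppert averaging after a burn-in.

    Update: theta <- theta - alpha * (A[z] @ theta - b[z]).
    """
    theta = np.zeros(b.shape[1])
    running_sum = np.zeros_like(theta)
    for k, z in enumerate(states):
        theta = theta - alpha * (A[z] @ theta - b[z])
        if k >= burn_in:
            running_sum += theta
    return running_sum / (len(states) - burn_in)


def rr_extrapolation(A, b, states, alpha, burn_in):
    """Combine PR averages at step sizes alpha and 2*alpha (same noise path)."""
    pr_small = pr_averaged_lsa(A, b, states, alpha, burn_in)
    pr_large = pr_averaged_lsa(A, b, states, 2.0 * alpha, burn_in)
    return 2.0 * pr_small - pr_large, pr_small, pr_large


# Hypothetical toy problem (illustrative values only).
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])            # symmetric chain, stationary law (0.5, 0.5)
A = np.array([[[1.5, 0.2], [0.0, 1.0]],
              [[0.5, -0.2], [0.0, 1.0]]])  # A(z) for z = 0, 1
b = np.array([[1.0, 0.0],
              [0.0, 1.0]])            # b(z) for z = 0, 1

pi = np.array([0.5, 0.5])             # stationary distribution of P
A_bar = np.tensordot(pi, A, axes=1)   # stationary mean of A(z)
b_bar = pi @ b                        # stationary mean of b(z)
theta_star = np.linalg.solve(A_bar, b_bar)  # target: A_bar theta = b_bar

rng = np.random.default_rng(0)
states = simulate_markov_chain(P, n_steps=20_000, rng=rng)
rr, pr_small, pr_large = rr_extrapolation(A, b, states, alpha=0.05, burn_in=2_000)

print("PR error (alpha):   ", np.linalg.norm(pr_small - theta_star))
print("PR error (2*alpha): ", np.linalg.norm(pr_large - theta_star))
print("RR error:           ", np.linalg.norm(rr - theta_star))
```

On a single finite run the printed errors mix bias and statistical fluctuation, so comparing the bias of the PR and RR estimates in this toy setting would require averaging the errors over many independent noise paths.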

Published

2026-03-14

How to Cite

Levin, I., Naumov, A., & Samsonov, S. (2026). High-Order Error Bounds for Markovian LSA with Richardson–Romberg Extrapolation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36696–36704. https://doi.org/10.1609/aaai.v40i43.40994

Section

AAAI Technical Track on Reasoning under Uncertainty