Variance Reduction via Resampling and Experience Replay

Authors

  • Jiale Han, University of California, Los Angeles
  • Xiaowu Dai, University of California, Los Angeles
  • Yuhua Zhu, University of California, Los Angeles

DOI:

https://doi.org/10.1609/aaai.v40i26.39304

Abstract

Experience replay is a foundational technique in reinforcement learning that enhances learning stability by storing past experiences in a replay buffer and reusing them during training. Despite its practical success, its theoretical properties remain underexplored. In this paper, we present a theoretical framework that models experience replay using resampled U- and V-statistics, providing rigorous variance reduction guarantees. We apply this framework to policy evaluation tasks using the Least-Squares Temporal Difference (LSTD) algorithm and a Partial Differential Equation (PDE)-based model-free algorithm, demonstrating significant improvements in stability and efficiency, particularly in data-scarce scenarios. Beyond policy evaluation, we extend the framework to kernel ridge regression, showing that the experience replay-based method reduces the computational cost from the traditional cubic time to quadratic time in the sample size, while also reducing variance. Extensive numerical experiments validate our theoretical findings, demonstrating the broad applicability and effectiveness of experience replay in diverse machine learning tasks.
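
As a rough illustration of the resampled U-statistic idea mentioned in the abstract, the sketch below averages a kernel over random subsets drawn from a stored buffer and compares the result with the complete U-statistic. It is not the authors' construction: the variance kernel, the buffer of Gaussian draws, and the names `variance_kernel`, `resampled_u_statistic`, and `replay_buffer` are all assumptions chosen only to keep the example small.

```python
import random
import statistics


def variance_kernel(x1, x2):
    """Order-2 kernel h(x1, x2) = (x1 - x2)^2 / 2, unbiased for the variance."""
    return 0.5 * (x1 - x2) ** 2


def resampled_u_statistic(buffer, kernel, order, num_draws, rng):
    """Average `kernel` over `num_draws` random subsets of size `order`.

    Each subset is drawn without replacement, as in a U-statistic.
    Drawing with replacement instead would give the V-statistic analogue.
    """
    total = 0.0
    for _ in range(num_draws):
        subset = rng.sample(buffer, order)  # one "replayed" mini-batch
        total += kernel(*subset)
    return total / num_draws


rng = random.Random(0)

# Hypothetical replay buffer: 200 stored observations with true variance 4.
replay_buffer = [rng.gauss(0.0, 2.0) for _ in range(200)]

resampled = resampled_u_statistic(
    replay_buffer, variance_kernel, order=2, num_draws=5000, rng=rng
)

# For this kernel, the complete U-statistic over all pairs equals the
# unbiased sample variance, so it serves as a reference value.
complete = statistics.variance(replay_buffer)

print(f"resampled estimate: {resampled:.3f}")
print(f"complete U-statistic: {complete:.3f}")
```

With enough draws, the resampled average approaches the complete U-statistic while evaluating only a fixed number of subsets rather than every pair in the buffer.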

Published

2026-03-14

How to Cite

Han, J., Dai, X., & Zhu, Y. (2026). Variance Reduction via Resampling and Experience Replay. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21558–21566. https://doi.org/10.1609/aaai.v40i26.39304

Issue

Section

AAAI Technical Track on Machine Learning III