Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to Q-Learning

Authors

  • Ankur Naskar Indian Institute of Science, Bengaluru
  • Gugan Thoppe Indian Institute of Science, Bengaluru
  • Vijay Gupta Purdue University

DOI:

https://doi.org/10.1609/aaai.v40i29.39632

Abstract

Algorithms for solving nonlinear fixed-point equations---such as average-reward Q-learning and TD-learning---often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these methods via Polyak–Ruppert averaging has remained elusive, largely due to the non-monotonicity of such semi-norms. We close this gap by (i.) recasting the averaged error as a linear recursion involving a nonlinear perturbation, and (ii.) taming the nonlinearity by coupling the semi-norm's contraction with the monotonicity of a suitably induced norm. Our main result yields the first parameter-free ~O(1/√t) optimal rates for Q-learning in both average-reward and exponentially discounted settings, where t denotes the iteration index. The result applies within a broad framework that accommodates both synchronous and asynchronous updates, single-agent and distributed deployments, and data streams obtained from either simulators or along Markovian trajectories.

Downloads

Published

2026-03-14

How to Cite

Naskar, A., Thoppe, G., & Gupta, V. (2026). Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to Q-Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24495-24503. https://doi.org/10.1609/aaai.v40i29.39632

Issue

Section

AAAI Technical Track on Machine Learning VI