SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning

Authors

  • Jiaheng Feng, EEIS Department, University of Science and Technology of China
  • Mingxiao Feng, EEIS Department, University of Science and Technology of China
  • Haolin Song, EEIS Department, University of Science and Technology of China
  • Wengang Zhou, EEIS Department, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
  • Houqiang Li, EEIS Department, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

DOI:

https://doi.org/10.1609/aaai.v38i11.29083

Keywords:

ML: Reinforcement Learning

Abstract

Offline-to-online reinforcement learning (RL) provides a promising solution to improving suboptimal offline pre-trained policies through online fine-tuning. However, one efficient method, unconstrained fine-tuning, often suffers from severe policy collapse due to excessive distribution shift. To ensure stability, existing methods retain offline constraints and employ additional techniques during fine-tuning, which hurts efficiency. In this work, we introduce a novel perspective: eliminating the policy collapse without imposing constraints. We observe that such policy collapse arises from the mismatch between unconstrained fine-tuning and the conventional RL training framework. To this end, we propose Stabilized Unconstrained Fine-tuning (SUF), a streamlined framework that benefits from the efficiency of unconstrained fine-tuning while ensuring stability by modifying the Update-To-Data ratio. With just a few lines of code adjustments, SUF demonstrates remarkable adaptability to diverse backbones and superior performance over state-of-the-art baselines.
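
For readers unfamiliar with the term, the Update-To-Data (UTD) ratio is commonly understood as the number of gradient updates performed per environment step collected. The abstract does not say how SUF changes this ratio, so the sketch below only illustrates what the quantity controls in a generic online training loop. The function name `updates_per_step` and its parameters are hypothetical and are not taken from the paper or its code.

```python
from fractions import Fraction


def updates_per_step(num_env_steps, utd_ratio):
    """Return how many gradient updates follow each environment step.

    Illustrative only: a generic scheduler for a given UTD ratio, not the
    SUF procedure. A ratio above 1 means several updates per collected
    transition; a ratio below 1 means updates happen only every few steps.
    """
    # Use exact rational arithmetic so ratios such as 0.1 do not drift.
    step_credit = Fraction(utd_ratio).limit_denominator(1000)
    credit = Fraction(0)
    schedule = []
    for _ in range(num_env_steps):
        # One environment step has been collected (placeholder for
        # env.step(...) and a replay-buffer insert in a real agent).
        credit += step_credit
        n_updates = int(credit)  # whole updates that are now due
        credit -= n_updates      # carry the fractional remainder forward
        schedule.append(n_updates)
    return schedule


if __name__ == "__main__":
    print(updates_per_step(4, 1))    # one update after every step
    print(updates_per_step(4, 2))    # two updates after every step
    print(updates_per_step(4, 0.5))  # one update after every second step
```

In a real agent, each scheduled update would sample a batch from the replay buffer and apply one optimizer step. Which networks are updated, and at what ratio, is a design choice that this sketch leaves open.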

Published

2024-03-24

How to Cite

Feng, J., Feng, M., Song, H., Zhou, W., & Li, H. (2024). SUF: Stabilized Unconstrained Fine-Tuning for Offline-to-Online Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 11961-11969. https://doi.org/10.1609/aaai.v38i11.29083

Issue

Section

AAAI Technical Track on Machine Learning II