State Proficiency-Based Adaptive Fine-Tuning for Offline-to-Online Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v40i28.39484
Abstract
In offline-to-online (O2O) reinforcement learning, achieving efficient performance improvement while maintaining training stability remains a critical challenge for effective fine-tuning. Existing O2O methods usually focus on balancing policy improvement against policy constraint during online fine-tuning. However, they often overlook differences between samples, leading to suboptimal performance. We observe that the effectiveness of policy learning varies significantly across states, and therefore introduce the notion of state proficiency to capture the degree of effective learning in a given state. Building on this, we propose State Proficiency-Based Adaptive Fine-Tuning (SPA), a straightforward yet effective method that assigns proficiency-based sample priorities during policy optimization to facilitate effective fine-tuning. Specifically, SPA emphasizes low-proficiency samples during policy improvement to enhance sample efficiency, while emphasizing high-proficiency samples during policy constraint to ensure stable training. Extensive empirical results demonstrate that SPA achieves significant improvements over existing methods, attaining state-of-the-art performance on the D4RL benchmark.
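The abstract's weighting scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the assumption that proficiency is a per-sample score normalized to [0, 1], and the additive combination of the two loss terms are all hypothetical choices for exposition.

```python
import numpy as np

def spa_weighted_loss(improvement_loss, constraint_loss, proficiency):
    """Hypothetical sketch of proficiency-based sample weighting.

    All arguments are per-sample arrays of shape (batch,). The
    `proficiency` score is assumed to lie in [0, 1]; the paper's
    actual proficiency estimator is not specified in the abstract.
    """
    # Low-proficiency states get higher weight in policy improvement,
    # boosting sample efficiency where the policy has learned least ...
    w_improve = 1.0 - proficiency
    # ... while high-proficiency states dominate the policy constraint,
    # anchoring the policy where it already behaves well.
    w_constrain = proficiency
    policy_term = np.mean(w_improve * improvement_loss)
    constraint_term = np.mean(w_constrain * constraint_loss)
    return policy_term + constraint_term
```

Under this sketch, a sample the policy already handles well (proficiency near 1) contributes almost nothing to the improvement term but keeps the constraint active, and vice versa for poorly-learned states.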
Published
2026-03-14
How to Cite
Li, S., Xiao, W., Wu, H., Zhang, X., An, D., & Lü, S. (2026). State Proficiency-Based Adaptive Fine-Tuning for Offline-to-Online Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23169–23176. https://doi.org/10.1609/aaai.v40i28.39484
Section
AAAI Technical Track on Machine Learning V