State Proficiency-Based Adaptive Fine-Tuning for Offline-to-Online Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v40i28.39484
Abstract
In offline-to-online (O2O) reinforcement learning, achieving efficient performance improvement while maintaining training stability remains a critical challenge for effective fine-tuning. Existing O2O methods usually focus on balancing policy improvement against policy constraint during online fine-tuning. However, they often overlook differences between samples, leading to suboptimal performance. We observe that the effectiveness of policy learning varies significantly across states, and therefore introduce the notion of state proficiency to capture the degree of effective learning in a given state. Building on this, we propose State Proficiency-Based Adaptive Fine-Tuning (SPA), a straightforward yet effective method that assigns proficiency-based sample priorities during policy optimization to facilitate effective fine-tuning. Specifically, SPA emphasizes low-proficiency samples during policy improvement to enhance sample efficiency, while emphasizing high-proficiency samples during policy constraint to ensure stable training. Extensive empirical results demonstrate that SPA achieves significant improvements over existing methods, attaining state-of-the-art performance on the D4RL benchmark.
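The abstract's weighting scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the assumption that proficiency is a per-sample score normalized to [0, 1], and the additive combination of the two loss terms are all hypothetical choices for exposition.

```python
import numpy as np

def spa_weighted_loss(improvement_loss, constraint_loss, proficiency):
    """Hypothetical sketch of proficiency-based sample weighting.

    All arguments are per-sample arrays of shape (batch,). The
    `proficiency` score is assumed to lie in [0, 1]; the paper's
    actual proficiency estimator is not specified in the abstract.
    """
    # Low-proficiency states get higher weight in policy improvement,
    # boosting sample efficiency where the policy has learned least ...
    w_improve = 1.0 - proficiency
    # ... while high-proficiency states dominate the policy constraint,
    # anchoring the policy where it already behaves well.
    w_constrain = proficiency
    policy_term = np.mean(w_improve * improvement_loss)
    constraint_term = np.mean(w_constrain * constraint_loss)
    return policy_term + constraint_term
```

Under this sketch, a sample the policy already handles well (proficiency near 1) contributes almost nothing to the improvement term but keeps the constraint active, and vice versa for poorly-learned states.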
Published
2026-03-14
How to Cite
Li, S., Xiao, W., Wu, H., Zhang, X., An, D., & Lü, S. (2026). State Proficiency-Based Adaptive Fine-Tuning for Offline-to-Online Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23169–23176. https://doi.org/10.1609/aaai.v40i28.39484
Section
AAAI Technical Track on Machine Learning V