Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models

Authors

  • Hongyin Zhang, Zhejiang University; Westlake University
  • Shiyuan Zhang, University of California, Los Angeles
  • Junxi Jin, Westlake University
  • Qixin Zeng, Westlake University
  • Yifan Qiao, National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University
  • Hongchao Lu, Westlake University
  • Donglin Wang, Westlake University

DOI:

https://doi.org/10.1609/aaai.v40i22.38944

Abstract

Vision-Language-Action (VLA) models based on flow matching have shown excellent performance in general-purpose robotic manipulation tasks. However, the action accuracy of these models on complex downstream tasks remains unsatisfactory. One important reason is that these models rely solely on an imitation-learning post-training paradigm, which makes it difficult for them to capture the distributional properties of data quality, precisely what Reinforcement Learning (RL) excels at. In this paper, we theoretically derive an offline RL post-training objective for VLA flow models and induce an efficient and feasible offline RL fine-tuning algorithm, Adaptive Reinforced Flow Matching (ARFM). By introducing an adaptively adjusted scaling factor into the VLA flow-model loss, we construct a principled bias-variance trade-off objective that optimally controls the impact of the RL signal on the flow loss. ARFM adaptively balances RL advantage preservation against flow-loss gradient variance control, yielding a more stable and efficient fine-tuning process. Extensive simulation and real-world experiments show that ARFM exhibits excellent generalization, robustness, few-shot learning, and continual-learning performance.
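To make the abstract's core idea concrete, here is a minimal NumPy sketch of an advantage-weighted flow-matching loss with an adaptively chosen scaling factor. This is an illustration only, not the paper's algorithm: the selection rule below (largest scaling factor whose unit-mean exponential weights keep the weight variance under a budget) and the names `adaptive_weights`, `var_budget`, and `alpha_grid` are assumptions standing in for ARFM's derived bias-variance trade-off objective.

```python
import numpy as np


def adaptive_weights(advantages, var_budget=0.5, alpha_grid=None):
    """Illustrative stand-in for ARFM's adaptive scaling factor.

    Picks the largest alpha such that the exponential advantage weights
    exp(alpha * A), normalized to unit mean, keep their variance under
    var_budget. Larger alpha preserves more RL advantage signal; the
    budget caps the resulting gradient-weight variance.
    """
    if alpha_grid is None:
        alpha_grid = np.linspace(0.0, 5.0, 51)
    best_w = np.ones_like(advantages, dtype=float)  # alpha = 0: plain imitation
    for alpha in alpha_grid:
        w = np.exp(alpha * (advantages - advantages.max()))  # shift for stability
        w = w / w.mean()                                     # unit-mean weights
        if w.var() <= var_budget:
            best_w = w
        else:
            break  # weight variance grows with alpha; stop at the budget
    return best_w


def weighted_flow_matching_loss(v_pred, x0, x1, weights):
    """Flow-matching regression target for linear probability paths:
    the predicted velocity should match x1 - x0; each sample's squared
    error is scaled by its advantage-derived weight."""
    target = x1 - x0
    per_sample = ((v_pred - target) ** 2).reshape(len(v_pred), -1).mean(axis=1)
    return float((weights * per_sample).mean())
```

High-advantage samples receive larger weights (more RL signal), while the variance budget prevents a few extreme advantages from dominating the flow-loss gradient, which is the stability failure mode the adaptive factor is meant to avoid.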

Published

2026-03-14

How to Cite

Zhang, H., Zhang, S., Jin, J., Zeng, Q., Qiao, Y., Lu, H., & Wang, D. (2026). Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18755-18763. https://doi.org/10.1609/aaai.v40i22.38944

Section

AAAI Technical Track on Intelligent Robotics