MO-VLA: Preference Adaptation for Vision-Language-Action Models via Multi-Objective Reinforcement Learning

Authors

  • Yan Yang, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Yuquan Wu, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Mingxuan Jing, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/icaps.v36i1.42864

Abstract

Vision-Language-Action (VLA) models trained with large-scale behavior cloning (BC) have made substantial progress in producing diverse and complex robotic behaviors, and have been widely deployed in robot manipulation, human-robot cooperation, and autonomous driving. However, existing VLA models focus on generating complex behaviors from general human instructions, at the expense of capturing the preference and intent information (regarding speed, smoothness, safety, etc.) that users convey. To address this limitation, we propose MO-VLA, a two-stage framework that integrates Multi-Objective Reinforcement Learning (MORL) into VLA training. The framework is designed to preserve the performance of the pre-trained model while accelerating convergence when modeling user preferences. In the first stage, BC on large-scale demonstrations equips the model with general operational skills. In the second stage, MORL-based fine-tuning adapts this policy to user-specific preferences; a Feature-wise Linear Modulation (FiLM) mechanism integrated into the action head explicitly injects preference signals into the policy generation process. Experimental results on the Meta-World benchmark demonstrate that our method achieves superior multi-objective performance while maintaining a high task success rate. These results validate its capability for preference-aware action generation across various robotic tasks.
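
To make the preference-injection idea concrete, the following is a minimal NumPy sketch of FiLM-style conditioning, assuming the user preference is a weight vector over three objectives (for example speed, smoothness, safety) and that it modulates a hidden feature vector in the action head. The function names `film` and `scalarize`, the layer sizes, and the use of linear scalarization for the multi-objective reward are illustrative assumptions; the abstract does not specify these details, and this is not the authors' implementation.

```python
import numpy as np

def film(features, preference, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: scale and shift each feature channel using a conditioning vector.

    gamma and beta are produced from the preference vector by two linear maps
    (hypothetical parameters, not taken from the paper).
    """
    gamma = preference @ w_gamma + b_gamma  # per-channel scale
    beta = preference @ w_beta + b_beta     # per-channel shift
    return gamma * features + beta

def scalarize(reward_vector, preference):
    """Linear scalarization of a vector reward (an assumed choice of MORL objective)."""
    return float(np.dot(preference, reward_vector))

rng = np.random.default_rng(0)
n_objectives, hidden = 3, 8  # assumed sizes for illustration only

# Initialise so that FiLM starts close to the identity map, which leaves the
# pre-trained features almost unchanged at the start of fine-tuning.
w_gamma = rng.normal(scale=0.1, size=(n_objectives, hidden))
b_gamma = np.ones(hidden)
w_beta = rng.normal(scale=0.1, size=(n_objectives, hidden))
b_beta = np.zeros(hidden)

features = rng.normal(size=hidden)       # stand-in for action-head features
preference = np.array([0.6, 0.3, 0.1])   # weights over the three objectives
modulated = film(features, preference, w_gamma, b_gamma, w_beta, b_beta)
```

With zero modulation weights and the bias values above, `film` returns the features unchanged, which is one simple way a conditioning layer can avoid disturbing a pre-trained policy before fine-tuning begins.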

Published

2026-06-08

How to Cite

Yang, Y., Wu, Y., & Jing, M. (2026). MO-VLA: Preference Adaptation for Vision-Language-Action Models via Multi-Objective Reinforcement Learning. Proceedings of the International Conference on Automated Planning and Scheduling, 36(1), 470–479. https://doi.org/10.1609/icaps.v36i1.42864