TORA: Train Once, Realign Anytime for Offline Multi-Objective Reinforcement Learning
DOI: https://doi.org/10.1609/aaai.v40i44.41095

Abstract
Intelligent agents in real-world applications must adapt their behavior to changing contexts and user preferences. For example, planning a road trip requires considering both travel time and cost. Multi-objective reinforcement learning (MORL) provides a principled approach to navigating such trade-offs. However, most existing approaches require predefined preference weights during training and jointly optimize the model for all objectives. In this paper, we introduce TORA (Train Once, Realign Anytime), a novel framework that defers preference integration to inference time, enabling flexible adaptation to user preferences without retraining. TORA independently trains diffusion planning models for each objective and combines them at inference time using user-specified preferences to generate behavior aligned with desired trade-offs. Furthermore, new objectives can be added seamlessly by training additional models without modifying existing ones. Empirical evaluations on standard offline MORL benchmarks demonstrate that TORA achieves competitive and consistent performance compared to methods that require fixed preference weights.

Published
2026-03-14
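The inference-time composition described in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' implementation: it assumes each objective's diffusion planner exposes a noise (score) predictor, and that realignment amounts to a preference-weighted combination of the per-objective predictions at each denoising step. The function and model names below are invented for illustration.

```python
# Hypothetical sketch of inference-time preference realignment:
# independently trained per-objective noise predictors are combined
# with user-specified preference weights (not the paper's exact method).
import numpy as np

def combined_noise_prediction(noise_models, weights, x_t, t):
    """Preference-weighted combination of per-objective noise predictions.

    noise_models: list of callables eps_i(x_t, t) -> np.ndarray,
                  one per objective, each trained independently.
    weights: user preference weights, one per objective.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize preferences to a convex combination
    preds = [model(x_t, t) for model in noise_models]
    return sum(wi * p for wi, p in zip(w, preds))

# Toy usage: two stand-in "objectives" whose predictors pull in
# opposite directions (e.g., travel time vs. cost in the abstract).
eps_time = lambda x, t: -x  # hypothetical predictor for objective 1
eps_cost = lambda x, t: +x  # hypothetical predictor for objective 2
x = np.ones(4)
out = combined_noise_prediction([eps_time, eps_cost], [0.75, 0.25], x, t=10)
# 0.75 * (-x) + 0.25 * (+x) = -0.5 * x
```

Changing the weights at inference time shifts the generated behavior toward the preferred objective without retraining either model, and adding a new objective only requires training one more predictor.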
How to Cite
Li, W., Mustafa, W., Monteiro, M., Wang, P., Kloft, M., & Fellenz, S. (2026). TORA: Train Once, Realign Anytime for Offline Multi-Objective Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37609–37617. https://doi.org/10.1609/aaai.v40i44.41095
Section
AAAI Special Track on AI Alignment