TORA: Train Once, Realign Anytime for Offline Multi-Objective Reinforcement Learning

Authors

  • Weichen Li, RPTU University Kaiserslautern-Landau
  • Waleed Mustafa, RPTU University Kaiserslautern-Landau
  • Marcio Monteiro, RPTU University Kaiserslautern-Landau
  • Puyu Wang, RPTU University Kaiserslautern-Landau
  • Marius Kloft, RPTU University Kaiserslautern-Landau
  • Sophie Fellenz, RPTU University Kaiserslautern-Landau

DOI:

https://doi.org/10.1609/aaai.v40i44.41095

Abstract

Intelligent agents in real-world applications must adapt their behavior to changing contexts and user preferences. For example, planning a road trip requires considering both travel time and cost. Multi-objective reinforcement learning (MORL) provides a principled approach to navigating such trade-offs. However, most existing approaches require predefined preference weights during training and jointly optimize the model for all objectives. In this paper, we introduce TORA (Train Once, Realign Anytime), a novel framework that defers preference integration to inference time, enabling flexible adaptation to user preferences without retraining. TORA independently trains diffusion planning models for each objective and combines them at inference time using user-specified preferences to generate behavior aligned with desired trade-offs. Furthermore, new objectives can be added seamlessly by training additional models without modifying existing ones. Empirical evaluations on standard offline MORL benchmarks demonstrate that TORA achieves competitive and consistent performance compared to methods that require fixed preference weights.
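
The idea of combining independently trained per-objective models at inference time can be illustrated with a toy sketch. The abstract does not specify the combination rule, so everything below is an assumption made for illustration: each "model" is a hypothetical stand-in that returns a direction pulling a plan toward its own objective's preferred plan, the directions are mixed linearly with normalized preference weights, and the plan is refined iteratively. The names make_toy_model, combine, and realign are invented here and are not taken from the paper, and no diffusion model is actually trained.

```python
from typing import Callable, List, Sequence

Plan = List[float]
ScoreFn = Callable[[Plan], Plan]


def make_toy_model(target: Plan) -> ScoreFn:
    """Hypothetical stand-in for one per-objective model.

    Returns a function that maps a plan to a direction pointing
    toward `target`, the plan this objective alone would prefer.
    """
    def score(plan: Plan) -> Plan:
        return [t - p for t, p in zip(target, plan)]
    return score


def combine(models: Sequence[ScoreFn], weights: Sequence[float], plan: Plan) -> Plan:
    """Mix the per-objective directions with normalized preference weights.

    Linear mixing is an assumption of this sketch, not the paper's stated rule.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("preference weights must sum to a positive value")
    mixed = [0.0] * len(plan)
    for weight, model in zip(weights, models):
        for i, g in enumerate(model(plan)):
            mixed[i] += (weight / total) * g
    return mixed


def realign(models: Sequence[ScoreFn], weights: Sequence[float], start: Plan,
            steps: int = 200, step_size: float = 0.1) -> Plan:
    """Refine a plan by repeatedly following the mixed direction."""
    plan = list(start)
    for _ in range(steps):
        direction = combine(models, weights, plan)
        plan = [p + step_size * d for p, d in zip(plan, direction)]
    return plan


# Two objectives from the road-trip example: travel time and cost.
time_model = make_toy_model([1.0, 0.0])
cost_model = make_toy_model([0.0, 1.0])

# The same two models serve different preferences; nothing is retrained.
mostly_time = realign([time_model, cost_model], [3.0, 1.0], [0.0, 0.0])
balanced = realign([time_model, cost_model], [1.0, 1.0], [0.0, 0.0])

# A new objective is added by appending a model; existing ones are untouched.
comfort_model = make_toy_model([1.0, 1.0])
with_comfort = realign([time_model, cost_model, comfort_model],
                       [1.0, 1.0, 2.0], [0.0, 0.0])
```

In this toy setting the refined plan settles near the preference-weighted average of the per-objective targets, so changing the weights moves the result along the trade-off without touching any model. A real diffusion planner would replace the toy direction functions with learned denoising models.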

Published

2026-03-14

How to Cite

Li, W., Mustafa, W., Monteiro, M., Wang, P., Kloft, M., & Fellenz, S. (2026). TORA: Train Once, Realign Anytime for Offline Multi-Objective Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37609–37617. https://doi.org/10.1609/aaai.v40i44.41095

Section

AAAI Special Track on AI Alignment