Beyond Hard Constraints: Budget-Conditioned Reachability for Safe Offline Reinforcement Learning
DOI:
https://doi.org/10.1609/icaps.v36i1.42876
Abstract
Sequential decision-making with Markov decision processes (MDPs) underpins many real-world applications. Both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization against safety constraints, two often-conflicting objectives whose joint treatment can lead to unstable, adversarial min–max optimization. A promising alternative is safety reachability analysis, which precomputes a forward-invariant safe state–action set, ensuring that an agent starting inside this set remains safe indefinitely. Yet most reachability-based methods address only hard safety constraints, and little work extends reachability to cumulative cost constraints. To address this gap, we first define a safety-conditioned reachability set that decouples reward maximization from cumulative safety cost constraints. Second, we show how this set enforces safety constraints without unstable min–max or Lagrangian optimization, yielding a novel offline safe RL algorithm that learns a safe policy from a fixed dataset without environment interaction. Finally, experiments on standard offline safe-RL benchmarks and a real-world maritime navigation task demonstrate that our method matches or outperforms state-of-the-art baselines while maintaining safety.
Published
2026-06-08
How to Cite
Brahmanage, J., & Kumar, A. (2026). Beyond Hard Constraints: Budget-Conditioned Reachability for Safe Offline Reinforcement Learning. Proceedings of the International Conference on Automated Planning and Scheduling, 36(1), 581–590. https://doi.org/10.1609/icaps.v36i1.42876