Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees

Đorđe Žikelić; Mathias Lechner; Thomas A. Henzinger; Krishnendu Chatterjee

doi:10.1609/aaai.v37i10.26407

Authors

Đorđe Žikelić, IST Austria
Mathias Lechner, MIT CSAIL
Thomas A. Henzinger, IST Austria
Krishnendu Chatterjee, IST Austria

DOI:

https://doi.org/10.1609/aaai.v37i10.26407

Keywords:

PEAI: Safety, Robustness & Trustworthiness, ML: Adversarial Learning & Robustness, ML: Calibration & Uncertainty Quantification, ML: Probabilistic Methods, RU: Other Foundations of Reasoning Under Uncertainty

Abstract

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p in [0,1] over the infinite time horizon. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.
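
The abstract does not spell out the conditions that define an RASM, so the following is only a minimal sketch under stated assumptions. It assumes a candidate certificate V must be nonnegative everywhere, at most 1 on initial states, at least 1/(1-p) on unsafe states, and must decrease in expectation by at least some eps > 0 at every non-target state where V is at most 1/(1-p); these conditions should be checked against the full paper. The toy system, the quadratic candidate V, the noise values, and the function name check_rasm_on_samples are all hypothetical. The check runs on a finite grid only, so it is a sanity test and not a formal verification procedure.

```python
from typing import Callable, Dict, Sequence


def check_rasm_on_samples(
    V: Callable[[float], float],
    step: Callable[[float, float], float],
    state_pts: Sequence[float],
    init_pts: Sequence[float],
    unsafe_pts: Sequence[float],
    in_target: Callable[[float], bool],
    noise_vals: Sequence[float],
    p: float,
    eps: float,
) -> Dict[str, bool]:
    """Check the assumed RASM conditions on sampled points only."""
    bound = 1.0 / (1.0 - p)  # assumed level that unsafe states must exceed

    def expected_next(x: float) -> float:
        # Exact expectation for equally likely discrete noise values.
        return sum(V(step(x, w)) for w in noise_vals) / len(noise_vals)

    return {
        "nonnegative": all(V(x) >= 0.0 for x in state_pts),
        "initial": all(V(x) <= 1.0 for x in init_pts),
        "unsafe": all(V(x) >= bound for x in unsafe_pts),
        "expected_decrease": all(
            expected_next(x) <= V(x) - eps
            for x in state_pts
            if not in_target(x) and V(x) <= bound
        ),
    }


# Hypothetical 1D closed-loop system: x' = 0.5 * x + w, with w in {-0.1, 0, 0.1}.
grid = [i / 100 for i in range(-200, 201)]       # sampled state space [-2, 2]
init_pts = [x for x in grid if abs(x) <= 0.5]    # assumed initial set
unsafe_pts = [x for x in grid if abs(x) >= 1.5]  # assumed unsafe set


def candidate_V(x: float) -> float:
    # Hypothetical certificate candidate, standing in for a neural network.
    return 3.0 * x * x


def closed_loop_step(x: float, w: float) -> float:
    # Dynamics with a (hypothetical) stabilizing policy already folded in.
    return 0.5 * x + w


def in_target(x: float) -> bool:
    return abs(x) <= 0.2  # assumed target set


result = check_rasm_on_samples(
    candidate_V, closed_loop_step, grid, init_pts, unsafe_pts,
    in_target, noise_vals=[-0.1, 0.0, 0.1], p=0.8, eps=0.05,
)
print(result)
```

For this toy setup all four sampled checks pass with eps = 0.05, while a stricter eps = 0.1 fails the expected-decrease check near the boundary of the target set. A guarantee over the whole state space would additionally need to bound the behavior of V between grid points.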
