Safe Reinforcement Learning for Trustworthy AI: Theory, Algorithms, and Applications

Authors

  • Honghao Wei, Washington State University

DOI:

https://doi.org/10.1609/aaai.v40i47.41358

Abstract

Safe reinforcement learning (RL) has emerged as a key paradigm for deploying AI in high-stakes domains such as autonomous driving, robotics, healthcare, and recommender systems. By embedding constraints into the learning process, safe RL enables agents to optimize performance while satisfying critical requirements, including collision avoidance, resource limits, and system reliability. Such guarantees are indispensable for real-world AI, where failures can cause physical harm, economic loss, or loss of trust. At the same time, demand for trustworthy AI continues to grow as machine learning is increasingly deployed in human-centered applications. This makes it essential to design RL algorithms that are not only efficient but also reliable, robust, and aligned with societal needs.
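
As an illustration of what "embedding constraints into the learning process" can mean, one common formalization is the constrained Markov decision process (CMDP). The abstract does not say which formulation the work uses, so the sketch below is an assumption, and every symbol in it (policy \pi, reward r, safety cost c, discount factor \gamma, cost budget d, multiplier \lambda) is introduced here for illustration only.

```latex
% Constrained MDP: maximize expected discounted reward
% subject to a budget d on expected discounted safety cost.
\begin{aligned}
\max_{\pi}\quad & J_r(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & J_c(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
\end{aligned}

% A standard Lagrangian relaxation of the same problem,
% with a nonnegative multiplier \lambda pricing constraint violation.
L(\pi, \lambda) = J_r(\pi) - \lambda \bigl( J_c(\pi) - d \bigr), \qquad \lambda \ge 0
```

In this reading, a requirement such as collision avoidance would be encoded through the cost c and the budget d, and a primal-dual method would alternate between improving \pi and adjusting \lambda. Other formulations, such as per-step or chance constraints, are equally plausible.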

Published

2026-03-14

How to Cite

Wei, H. (2026). Safe Reinforcement Learning for Trustworthy AI: Theory, Algorithms, and Applications. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39838–39838. https://doi.org/10.1609/aaai.v40i47.41358