Safe Reinforcement Learning for Trustworthy AI: Theory, Algorithms, and Applications

Honghao Wei

doi:10.1609/aaai.v40i47.41358

Authors

Honghao Wei, Washington State University

DOI:

https://doi.org/10.1609/aaai.v40i47.41358

Abstract

Safe reinforcement learning (RL) has emerged as a key paradigm for deploying AI in high-stakes domains such as autonomous driving, robotics, healthcare, and recommender systems. By embedding constraints into the learning process, safe RL enables agents to optimize performance while satisfying critical requirements, including collision avoidance, resource limits, and system reliability. Such guarantees are indispensable for real-world AI, where failures can cause physical harm, economic loss, or loss of trust. At the same time, demand for trustworthy AI continues to grow as machine learning is increasingly deployed in human-centered applications. This makes it essential to design RL algorithms that are not only efficient but also reliable, robust, and aligned with societal needs.

Published

2026-03-14

How to Cite

Wei, H. (2026). Safe Reinforcement Learning for Trustworthy AI: Theory, Algorithms, and Applications. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39838–39838. https://doi.org/10.1609/aaai.v40i47.41358

Issue

Vol. 40 No. 47: AAAI-26 New Faculty Highlights, Journal Track, IAAI-26 and EAAI-26 Main Track

Section

New Faculty Highlights
