Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. 2022. “Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 36 (8): 8753–60. https://doi.org/10.1609/aaai.v36i8.20855.