Chen, Yiding, Xuezhou Zhang, Qiaomin Xie, and Xiaojin Zhu. “Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (March 24, 2024): 11416-11424. Accessed April 24, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/29022.