Feng, Xuening, Zhaohui Jiang, Timo Kaufmann, Puchen Xu, Eyke Hüllermeier, Paul Weng, and Yifei Zhu. “DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback”. Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 16 (April 11, 2025): 16604–16612. Accessed May 8, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/33824.