Feng, Xuening, Zhaohui Jiang, Timo Kaufmann, Puchen Xu, Eyke Hüllermeier, Paul Weng, and Yifei Zhu. 2025. “DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback.” Proceedings of the AAAI Conference on Artificial Intelligence 39 (16): 16604–12. https://doi.org/10.1609/aaai.v39i16.33824.