Feng, X., Jiang, Z., Kaufmann, T., Xu, P., Hüllermeier, E., Weng, P., & Zhu, Y. (2025). DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback. Proceedings of the AAAI Conference on Artificial Intelligence, 39(16), 16604–16612. https://doi.org/10.1609/aaai.v39i16.33824