Feng, Xuening, et al. “DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 16, Apr. 2025, pp. 16604-12, doi:10.1609/aaai.v39i16.33824.