[1] X. Feng, “DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback,” AAAI, vol. 39, no. 16, pp. 16604–16612, Apr. 2025.