[1] G. Recchia, C. S. Mangat, I. Li, and G. Krishnakumar, “FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research”, AAAI, vol. 40, no. 44, pp. 37867–37876, Mar. 2026.