Recchia, G., Mangat, C. S., Li, I., & Krishnakumar, G. (2026). FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research. Proceedings of the AAAI Conference on Artificial Intelligence, 40(44), 37867–37876. https://doi.org/10.1609/aaai.v40i44.41123