Recchia, Gabriel, et al. “FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 44, Mar. 2026, pp. 37867-76, doi:10.1609/aaai.v40i44.41123.