From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project

Authors

  • Peter Clark Allen Institute for AI
  • Oren Etzioni Allen Institute for AI
  • Tushar Khot Allen Institute for AI
  • Daniel Khashabi Allen Institute for AI
  • Bhavana Dalvi Mishra Allen Institute for AI
  • Kyle Richardson Allen Institute for AI
  • Ashish Sabharwal Allen Institute for AI
  • Carissa Schoenick Allen Institute for AI
  • Oyvind Tafjord Allen Institute for AI
  • Niket Tandon Allen Institute for AI
  • Sumithra Bhakthavatsalam Allen Institute for AI
  • Dirk Groeneveld Allen Institute for AI
  • Michal Guerquin Allen Institute for AI
  • Michael Schmitz Allen Institute for AI

DOI:

https://doi.org/10.1609/aimag.v41i4.5304

Abstract

AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even as recently as 2016, the best AI system could achieve merely 59.3 percent on an 8th grade science exam. This article reports success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90 percent on the exam’s nondiagram, multiple choice (NDMC) questions. In addition, our Aristo system, building upon the success of recent language models, exceeded 83 percent on the corresponding Grade 12 Science Exam NDMC questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern natural language processing methods can result in mastery on this task. While not a full solution to general question-answering (the questions are limited to 8th grade multiple-choice science) it represents a significant milestone for the field.

Published

2020-12-28

How to Cite

Clark, P., Etzioni, O., Khot, T., Khashabi, D., Mishra, B., Richardson, K., Sabharwal, A., Schoenick, C., Tafjord, O., Tandon, N., Bhakthavatsalam, S., Groeneveld, D., Guerquin, M., & Schmitz, M. (2020). From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project. AI Magazine, 41(4), 39-53. https://doi.org/10.1609/aimag.v41i4.5304

Section

Special Topic Articles