Better Peer Grading through Bayesian Inference


  • Hedayat Zarkoob, University of British Columbia
  • Greg d'Eon, University of British Columbia
  • Lena Podina, University of Waterloo & University of British Columbia
  • Kevin Leyton-Brown, University of British Columbia



HAI: Crowdsourcing, APP: Education, GTEP: Applications, GTEP: Mechanism Design


Peer grading systems aggregate noisy reports from multiple students to approximate a "true" grade as closely as possible. Most current systems take either the mean or the median of reported grades; others aim to estimate students' grading accuracy under a probabilistic model. This paper extends the state of the art in the latter approach in three key ways: (1) recognizing that students can behave strategically (e.g., reporting grades close to the class average without doing the work); (2) appropriately handling censored data that arises from discrete-valued grading rubrics; and (3) using mixed integer programming to improve the interpretability of the grades assigned to students. We demonstrate how to make Bayesian inference practical in this model and evaluate our approach on both synthetic and real-world data obtained by using our implemented system in four large classes. These extensive experiments show that grade aggregation using our model accurately estimates true grades, students' likelihood of submitting uninformative grades, and the variation in their inherent grading error; we also characterize our model's robustness.
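To give a flavor of why modeling grader accuracy can beat a plain mean, the sketch below is a minimal toy illustration, not the paper's actual model: it assumes a simple Gaussian noise model with a made-up grader population (three careful graders, three careless ones) and alternates between re-estimating each grader's noise level and forming precision-weighted grade estimates. All grader parameters, counts, and function names here are hypothetical.

```python
import random
import statistics

def simulate_and_aggregate(seed=0, n_submissions=200):
    """Toy illustration (not the paper's model): infer per-grader noise
    levels under a simple Gaussian model and compare precision-weighted
    grade estimates against a plain mean."""
    rng = random.Random(seed)
    # Hypothetical grader pool: three careful (low sigma), three careless.
    true_sigmas = [0.5, 0.5, 0.5, 3.0, 3.0, 3.0]
    n_graders = len(true_sigmas)

    # Simulate true grades and each grader's noisy report of them.
    true_grades = [rng.uniform(50, 100) for _ in range(n_submissions)]
    reports = [[g + rng.gauss(0, s) for s in true_sigmas] for g in true_grades]

    # Baseline: plain mean of each submission's reports.
    mean_est = [statistics.fmean(r) for r in reports]

    # Alternating estimation: start from the mean, then iterate between
    # (a) re-estimating each grader's noise sigma from residuals and
    # (b) re-estimating grades as precision-weighted means.
    est = list(mean_est)
    for _ in range(10):
        sigmas = []
        for j in range(n_graders):
            resid = [reports[i][j] - est[i] for i in range(n_submissions)]
            sigmas.append(max(statistics.pstdev(resid), 1e-3))
        weights = [1.0 / s ** 2 for s in sigmas]
        wsum = sum(weights)
        est = [sum(w * r for w, r in zip(weights, reports[i])) / wsum
               for i in range(n_submissions)]

    def rmse(xs):
        return (sum((x - t) ** 2 for x, t in zip(xs, true_grades))
                / n_submissions) ** 0.5

    return rmse(mean_est), rmse(est)

mean_rmse, model_rmse = simulate_and_aggregate()
print(f"plain mean RMSE: {mean_rmse:.2f}")
print(f"weighted  RMSE:  {model_rmse:.2f}")
```

In this toy setup the precision-weighted estimates down-weight the careless graders and achieve a lower error than the unweighted mean; the paper's contribution is a richer Bayesian model that additionally handles strategic reporting and discretized (censored) rubric scores.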




How to Cite

Zarkoob, H., d’Eon, G., Podina, L., & Leyton-Brown, K. (2023). Better Peer Grading through Bayesian Inference. Proceedings of the AAAI Conference on Artificial Intelligence, 37(5), 6137-6144.



AAAI Technical Track on Humans and AI