LLMs in Automated Essay Evaluation: A Case Study

Authors

  • Milan Kostic, University of Camerino
  • Hans Friedrich Witschel, FHNW University of Applied Sciences and Arts Northwestern Switzerland
  • Knut Hinkelmann, FHNW University of Applied Sciences and Arts Northwestern Switzerland; University of Camerino
  • Maja Spahic-Bogdanovic, FHNW University of Applied Sciences and Arts Northwestern Switzerland; University of Camerino

DOI:

https://doi.org/10.1609/aaaiss.v3i1.31193

Keywords:

Large Language Models, Automatic Essay Evaluation, Assignment Evaluation, Higher Education

Abstract

This study examines the application of large language models (LLMs), such as ChatGPT-4, to the automated evaluation of student essays, focusing on a case study conducted at the Swiss Institute of Business Administration. It explores the effectiveness of LLMs in assessing German-language student transfer assignments and contrasts their performance with traditional evaluations by human lecturers. The primary findings highlight the challenges LLMs face in accurately grading complex texts against predefined categories and in providing detailed feedback. The research illuminates the gap between the capabilities of LLMs and the nuanced requirements of student essay evaluation. The conclusion emphasizes the need for ongoing research and development in LLM technology to improve the accuracy, reliability, and consistency of automated essay assessments in educational contexts.
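For illustration only, the sketch below shows the general kind of rubric-based LLM grading call the abstract describes: an essay is sent to a chat model together with predefined evaluation categories, and the model returns per-category grades with feedback. The rubric categories, prompt wording, grading scale, and model name are assumptions for this example, not the authors' actual setup, which the abstract does not specify.

```python
# Illustrative sketch only: rubric-based essay grading via the OpenAI API.
# Categories, prompt wording, and model choice are assumptions, not the
# study's actual configuration.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical rubric categories for a transfer assignment.
RUBRIC = ["argument quality", "structure", "use of sources", "language"]

def grade_essay(essay_text: str) -> str:
    """Ask the model for a grade per category (1-6, Swiss scale) plus feedback."""
    criteria = "\n".join(f"- {c}" for c in RUBRIC)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a lecturer grading a German-language "
                        "student transfer assignment."},
            {"role": "user",
             "content": "Grade the following essay from 1 to 6 on each "
                        "criterion, with a short justification per "
                        f"criterion:\n{criteria}\n\nEssay:\n{essay_text}"},
        ],
        temperature=0,  # reduce run-to-run variance in the assigned grades
    )
    return response.choices[0].message.content

print(grade_essay("Der digitale Wandel verändert ..."))
```

Pinning the temperature to 0 is a common way to reduce the run-to-run inconsistency the abstract flags, though it does not by itself close the gap to human grading that the study reports.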

Published

2024-05-20

Issue

Vol. 3 No. 1 (2024)

Section

Empowering Machine Learning and Large Language Models with Domain and Commonsense Knowledge