EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing

Fan Gao; Dongyuan Li; Ding Xia; Fei Mi; Yasheng Wang; Lifeng Shang; Baojun Wang

doi:10.1609/aaai.v40i44.41072

Authors

Fan Gao, Huawei Technologies Ltd.; The University of Tokyo
Dongyuan Li, The University of Tokyo
Ding Xia, The University of Tokyo
Fei Mi, Huawei Technologies Ltd.
Yasheng Wang, Huawei Technologies Ltd.
Lifeng Shang, Huawei Technologies Ltd.
Baojun Wang, Huawei Technologies Ltd.

Abstract

Prompt-based essay writing is an effective and common way to assess students' critical thinking skills. Recent work has evaluated the impressive capabilities of Large Language Models (LLMs) on this task, but most studies focus primarily on English. Those that examine LLMs' performance in Chinese often rely on coarse-grained text quality metrics, overlooking the structural and rhetorical complexities of Chinese essays, particularly across diverse genres. We therefore propose EssayBench, a multi-genre benchmark specifically designed for Chinese essay writing, along with a fine-grained, genre-specific scoring framework that hierarchically aggregates scores to better align with human preferences. The dataset comprises 728 real-world prompts across four major genres (Argumentative, Narrative, Descriptive, and Expository) and includes both Open-Ended and Constrained prompt types. Our evaluation protocol is validated through a comprehensive human agreement study. The results show that the protocol aligns well with human judgments, reaching a Spearman's correlation of up to 0.816 and outperforming coarse-grained evaluation methods by an average of 8.6%. Finally, we benchmark 15 large-scale LLMs, analyzing their strengths and limitations across genres and instruction types. We believe EssayBench offers a more reliable framework for evaluating Chinese essay generation and provides valuable insights for improving LLMs in this domain.
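
The human agreement figure quoted in the abstract is a Spearman rank correlation between protocol scores and human scores. As a minimal sketch of how such a coefficient can be computed, the Python below ranks two score lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the score values are illustrative assumptions, not data or code from EssayBench.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-essay scores (NOT from the paper): human vs. automatic.
human_scores = [7.5, 6.0, 8.0, 5.5, 9.0]
protocol_scores = [7.0, 6.5, 8.5, 5.0, 8.0]
rho = spearman(human_scores, protocol_scores)
print(f"Spearman's rho = {rho:.3f}")
```

A value near 1 means the automatic scores order the essays almost exactly as the human raters do, which is the sense in which a correlation such as 0.816 indicates agreement.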
