Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Zhouhong Gu; Xiaoxuan Zhu; Haoning Ye; Lin Zhang; Jianchen Wang; Yixin Zhu; Sihang Jiang; Zhuozhi Xiong; Zihan Li; Weijie Wu; Qianyu He; Rui Xu; Wenhao Huang; Jingping Liu; Zili Wang; Shusen Wang; Weiguo Zheng; Hongwei Feng; Yanghua Xiao

doi:10.1609/aaai.v38i16.29767

Authors

Zhouhong Gu Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Xiaoxuan Zhu Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Haoning Ye Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Lin Zhang Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Jianchen Wang Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Yixin Zhu Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Sihang Jiang Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Zhuozhi Xiong Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Zihan Li Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Weijie Wu Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Qianyu He Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Rui Xu Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Wenhao Huang Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Jingping Liu School of Information Science and Engineering, East China University of Science and Technology
Zili Wang Xiaohongshu Inc
Shusen Wang Xiaohongshu Inc
Weiguo Zheng School of Data Science, Fudan University
Hongwei Feng Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China
Yanghua Xiao Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, China; Fudan-Aishu Cognitive Intelligence Joint Research Center

DOI:

https://doi.org/10.1609/aaai.v38i16.29767

Keywords:

NLP: (Large) Language Models, NLP: Applications

Abstract

New Natural Language Processing (NLP) benchmarks are urgently needed to keep pace with the rapid development of large language models (LLMs). We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises 249,587 multiple-choice questions across 516 diverse disciplines spanning 13 subjects, and is accompanied by Xiezhi-Specialty with 14,041 questions and Xiezhi-Interdiscipline with 10,746 questions. We evaluate 47 cutting-edge LLMs on Xiezhi. Results indicate that LLMs exceed the average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management. All evaluation code and data are open-sourced at https://github.com/MikeGu721/XiezhiBenchmark
