CFEVER: A Chinese Fact Extraction and VERification Dataset

Ying-Jia Lin; Chun-Yi Lin; Chia-Jen Yeh; Yi-Ting Li; Yun-Yu Hu; Chih-Hao Hsu; Mei-Feng Lee; Hung-Yu Kao

doi:10.1609/aaai.v38i17.29825

Authors

Ying-Jia Lin, Department of Computer Science and Information Engineering, National Cheng Kung University
Chun-Yi Lin, Department of Computer Science and Information Engineering, National Cheng Kung University
Chia-Jen Yeh, Department of Computer Science and Information Engineering, National Cheng Kung University
Yi-Ting Li, Department of Computer Science and Information Engineering, National Cheng Kung University
Yun-Yu Hu, Department of Computer Science and Information Engineering, National Cheng Kung University
Chih-Hao Hsu, Department of Computer Science and Information Engineering, National Cheng Kung University
Mei-Feng Lee, Department of Computer Science and Information Engineering, National Cheng Kung University
Hung-Yu Kao, Department of Computer Science and Information Engineering, National Cheng Kung University

Keywords:

NLP: Sentence-level Semantics, Textual Inference, etc.; DMKM: Conversational Systems for Recommendation & Retrieval; NLP: Applications; NLP: (Large) Language Models

Abstract

We present CFEVER, a Chinese dataset designed for Fact Extraction and VERification. CFEVER comprises 30,012 manually created claims based on content in Chinese Wikipedia. Each claim in CFEVER is labeled as “Supports”, “Refutes”, or “Not Enough Info” to depict its degree of factualness. Similar to the FEVER dataset, claims in the “Supports” and “Refutes” categories are also annotated with corresponding evidence sentences sourced from single or multiple pages in Chinese Wikipedia. Our labeled dataset holds a Fleiss’ kappa value of 0.7934 for five-way inter-annotator agreement. In addition, through the experiments with the state-of-the-art approaches developed on the FEVER dataset and a simple baseline for CFEVER, we demonstrate that our dataset is a new rigorous benchmark for factual extraction and verification, which can be further used for developing automated systems to alleviate human fact-checking efforts. CFEVER is available at https://ikmlab.github.io/CFEVER.
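The agreement figure quoted in the abstract is a Fleiss' kappa, which compares the observed agreement among a fixed number of annotators with the agreement expected by chance. The sketch below shows the standard computation under the assumption that "five-way" means five annotators label each claim with one of the three classes. The function name `fleiss_kappa` and the toy counts are illustrative only; they are not taken from the CFEVER data or the authors' code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j.

    Every item must be rated by the same number of annotators (at least 2).
    """
    if not counts:
        raise ValueError("counts must contain at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    n_items = len(counts)
    n_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Per-item agreement: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        # All ratings fall in a single category; kappa is undefined,
        # so report perfect agreement by convention (an assumption here).
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Toy data (invented): 4 claims, 5 annotators each,
    # columns are Supports, Refutes, Not Enough Info.
    toy = [
        [5, 0, 0],
        [0, 4, 1],
        [1, 0, 4],
        [3, 2, 0],
    ]
    print(f"Fleiss' kappa on toy data: {fleiss_kappa(toy):.4f}")
```

A value near 1 indicates near-perfect agreement and a value near 0 indicates agreement no better than chance.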
