CEMA – Cost-Efficient Machine-Assisted Document Annotations

Guowen Yuan; Ben Kao; Tien-Hsuan Wu

doi:10.1609/aaai.v37i9.26308

Authors

Guowen Yuan University of Hong Kong
Ben Kao University of Hong Kong
Tien-Hsuan Wu University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v37i9.26308

Keywords:

ML: Active Learning, SNLP: Syntax -- Tagging, Chunking & Parsing

Abstract

We study the problem of semantically annotating textual documents that are complex in the sense that the documents are long, feature rich, and domain specific. Due to their complexity, such annotation tasks require trained human workers, which are very expensive in both time and money. We propose CEMA, a method for deploying machine learning to assist humans in complex document annotation. CEMA estimates the human cost of annotating each document and selects the set of documents to be annotated that strike the best balance between model accuracy and human cost. We conduct experiments on complex annotation tasks in which we compare CEMA against other document selection and annotation strategies. Our results show that CEMA is the most cost-efficient solution for those tasks.
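The selection idea described in the abstract, weighing expected model benefit against the estimated human cost of each document, can be illustrated with a generic cost-sensitive selection sketch. This is not CEMA's actual algorithm, which the abstract does not specify. The `Candidate` fields, the gain-per-cost ranking rule, the `budget` parameter, and the example document names and numbers are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    est_gain: float  # hypothetical: estimated benefit to the model if annotated
    est_cost: float  # hypothetical: estimated human cost, e.g. minutes (must be > 0)


def select_documents(candidates, budget):
    """Greedily pick documents by estimated gain per unit cost within a budget.

    A simple stand-in heuristic, not the method from the paper.
    """
    ranked = sorted(candidates, key=lambda c: c.est_gain / c.est_cost, reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        # Skip any document that would push total cost over the budget.
        if spent + c.est_cost <= budget:
            chosen.append(c.doc_id)
            spent += c.est_cost
    return chosen, spent


# Made-up example values, for illustration only.
docs = [
    Candidate("doc_a", est_gain=0.9, est_cost=30.0),
    Candidate("doc_b", est_gain=0.5, est_cost=10.0),
    Candidate("doc_c", est_gain=0.4, est_cost=20.0),
]
selected, total_cost = select_documents(docs, budget=40.0)
```

In this sketch a cheap document with moderate estimated gain can outrank an expensive one with higher gain, which is the kind of accuracy-versus-cost trade-off the abstract refers to.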

Published

2023-06-26

How to Cite

Yuan, G., Kao, B., & Wu, T.-H. (2023). CEMA – Cost-Efficient Machine-Assisted Document Annotations. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 11043–11050. https://doi.org/10.1609/aaai.v37i9.26308

Issue

Vol. 37 No. 9: AAAI-23 Technical Tracks 9

Section

AAAI Technical Track on Machine Learning IV
