Improving Quantification with Minimal In-Domain Annotations: Beyond Classify and Count

Pius von Däniken; Jan Milan Deriu; Alvaro Rodrigo; Mark Cieliebak

doi:10.1609/icwsm.v18i1.31411

Authors

Pius von Däniken, Zurich University of Applied Sciences
Jan Milan Deriu, Zurich University of Applied Sciences
Alvaro Rodrigo, NLP&IR Group at UNED
Mark Cieliebak, Zurich University of Applied Sciences

DOI:

https://doi.org/10.1609/icwsm.v18i1.31411

Abstract

Quantification is the task of estimating the class distribution in a given collection. With the growing availability of classification models, the use of classifiers for quantification has become increasingly popular, carrying the promise of eliminating the need for manual annotation. However, the naive classify and count approach presents clear limitations, especially evident in the face of domain discrepancies. In this work, we introduce two novel quantification methods, called CPCC and BCC, which can adapt to new target datasets with a small number of annotated in-domain samples (N = 100). To explore their real-world applicability, we apply our methods to a range of quantification tasks in the realm of hateful and offensive language, where they perform markedly better than classify and count and other existing methods.
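
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of classify and count: label every item with a classifier, then report the share of items predicted in each class as the estimated class distribution. It is not the authors' code and does not implement CPCC or BCC, which this page does not describe. The function name `classify_and_count`, the label names, and the sample predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Estimate class prevalence as the share of items predicted in each class.

    predicted_labels: classifier outputs, one label per item (assumed input).
    classes: every class to report, including ones never predicted.
    """
    if not predicted_labels:
        raise ValueError("need at least one prediction")
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    # Counter returns 0 for labels that never occur, so unseen classes get 0.0.
    return {c: counts[c] / total for c in classes}


# Hypothetical classifier outputs for six documents (illustrative only).
predictions = ["hateful", "neutral", "neutral", "hateful", "neutral", "neutral"]
estimate = classify_and_count(predictions, ["hateful", "neutral", "offensive"])
print(estimate)
```

Because the estimate is built directly from predicted labels, any systematic classifier error on the target data carries over into the estimated distribution, which is the limitation the abstract attributes to this approach under domain discrepancies.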

Published

2024-05-28

How to Cite

von Däniken, P., Deriu, J. M., Rodrigo, A., & Cieliebak, M. (2024). Improving Quantification with Minimal In-Domain Annotations: Beyond Classify and Count. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 1585-1598. https://doi.org/10.1609/icwsm.v18i1.31411

Issue

Vol. 18 (2024): Proceedings of the Eighteenth International AAAI Conference on Web and Social Media

Section

Full Papers
