Improving Quantification with Minimal In-Domain Annotations: Beyond Classify and Count

Authors

  • Pius von Däniken, Zurich University of Applied Sciences
  • Jan Milan Deriu, Zurich University of Applied Sciences
  • Alvaro Rodrigo, NLP&IR Group at UNED
  • Mark Cieliebak, Zurich University of Applied Sciences

DOI:

https://doi.org/10.1609/icwsm.v18i1.31411

Abstract

Quantification is the task of estimating the class distribution in a given collection. With the growing availability of classification models, the use of classifiers for quantification has become increasingly popular, carrying the promise of eliminating the need for manual annotation. However, the naive classify and count approach presents clear limitations, especially evident in the face of domain discrepancies. In this work, we introduce two novel quantification methods, called CPCC and BCC, which can adapt to new target datasets with a small number of annotated in-domain samples (N = 100). To explore their real-world applicability, we apply our methods to a range of quantification tasks in the realm of hateful and offensive language, where they perform markedly better than classify and count and other existing methods.
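
To make the baseline concrete, the following is a minimal sketch of the classify and count approach mentioned in the abstract, together with the standard expression for the share of positives it reports in expectation when a binary classifier is imperfect. It is not the CPCC or BCC method, which the abstract does not describe. The function names, the label strings, and all numbers (prevalence, true positive rate, false positive rate) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def classify_and_count(predictions: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Estimate class prevalence as the share of items predicted in each class."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    total = len(predictions)
    return {label: n / total for label, n in counts.items()}


def expected_cc_estimate(true_prevalence: float, tpr: float, fpr: float) -> float:
    """Expected positive share reported by classify and count (binary case).

    True positives contribute true_prevalence * tpr, and false positives
    contribute (1 - true_prevalence) * fpr.
    """
    return true_prevalence * tpr + (1.0 - true_prevalence) * fpr


# Hypothetical classifier outputs for a collection of ten posts.
predicted = ["hate", "none", "none", "hate", "none",
             "none", "none", "none", "none", "none"]
estimate = classify_and_count(predicted)

# Hypothetical setting: 5% of posts are truly hateful, and the classifier
# has a true positive rate of 0.7 and a false positive rate of 0.1.
biased = expected_cc_estimate(true_prevalence=0.05, tpr=0.7, fpr=0.1)

print(estimate)
print(round(biased, 3))
```

In this assumed setting the expected classify and count estimate is 0.13 against a true prevalence of 0.05, which illustrates why counting raw predictions can mislead when the positive class is rare or the classifier's error rates shift across domains.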

Published

2024-05-28

How to Cite

von Däniken, P., Deriu, J. M., Rodrigo, A., & Cieliebak, M. (2024). Improving Quantification with Minimal In-Domain Annotations: Beyond Classify and Count. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 1585-1598. https://doi.org/10.1609/icwsm.v18i1.31411