Active Learning with Query Generation for Cost-Effective Text Classification

Yi-Fan Yan; Sheng-Jun Huang; Shaoyi Chen; Meng Liao; Jin Xu

doi:10.1609/aaai.v34i04.6133

Authors

Yi-Fan Yan, NUAA
Sheng-Jun Huang, NUAA
Shaoyi Chen, Tencent Inc.
Meng Liao, Tencent Inc.
Jin Xu, Tencent Inc.

Abstract

Labeling a text document is usually time-consuming because it requires the annotator to read the whole document and check its relevance to each possible class label. It thus becomes rather expensive to train an effective model for text classification when it involves a large dataset of long documents. In this paper, we propose an active learning approach for text classification with lower annotation cost. Instead of scanning all the examples in the unlabeled data pool to select the best one to query, the proposed method automatically generates the most informative examples based on the classification model, and thus can be applied to tasks with large-scale or even infinite unlabeled data. Furthermore, we propose to approximate the generated example with a few summary words by sparse reconstruction, which allows the annotators to easily assign the class label by reading a few words rather than the long document. Experiments on different datasets demonstrate that the proposed approach can effectively improve the classification performance while significantly reducing the annotation cost.
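
The two ideas in the abstract, generating an informative query from the model and approximating it with a few summary words, can be illustrated with a small sketch. The abstract does not give the actual formulation, so everything below is an assumption made for illustration: a linear classifier (`w`, `b`) over an embedding space, "most informative" taken to mean a point on the decision boundary, a toy random vocabulary, and greedy orthogonal matching pursuit standing in for the sparse reconstruction step. The function names `generate_query` and `sparse_summary` are hypothetical and do not come from the paper.

```python
import numpy as np


def generate_query(w, b, rng):
    """Return a point on the hyperplane w.x + b = 0.

    Assumption: for a linear model, a point on the decision boundary is
    maximally uncertain. A random point is projected onto the hyperplane.
    """
    x0 = rng.normal(size=w.shape)
    return x0 - (w @ x0 + b) / (w @ w) * w


def sparse_summary(x, E, k):
    """Greedy orthogonal matching pursuit (a stand-in technique).

    Picks k rows of E (word embeddings) whose linear combination best
    reconstructs x in the least-squares sense.
    """
    chosen = []
    residual = x.copy()
    coef = np.zeros(0)
    for _ in range(k):
        scores = np.abs(E @ residual)
        if chosen:
            scores[chosen] = -np.inf  # never pick the same word twice
        chosen.append(int(np.argmax(scores)))
        A = E[chosen].T  # shape (d, number of chosen words)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        residual = x - A @ coef
    return chosen, coef, residual


# Toy setup: all values here are synthetic, not from the paper.
rng = np.random.default_rng(0)
vocab = ["goal", "match", "team", "election", "vote", "senate", "stock", "market"]
d = 16
E = rng.normal(size=(len(vocab), d))
E /= np.linalg.norm(E, axis=1, keepdims=True)  # unit-norm word embeddings

w = rng.normal(size=d)  # hypothetical linear classifier weights
b = 0.5                 # hypothetical bias

query = generate_query(w, b, rng)
indices, coef, residual = sparse_summary(query, E, k=3)
summary_words = [vocab[i] for i in indices]
print("summary words shown to the annotator:", summary_words)
```

In this sketch the annotator would label the three printed words instead of a full document. Generating the query directly from the model is what removes the scan over the unlabeled pool; the sparse step only makes the generated vector readable.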
