Optimal and Efficient Binary Questioning for Accelerated Annotation

Franco Marchesoni-Acland; Jean-Michel Morel; Josselin Kherroubi; Gabriele Facciolo

doi:10.1609/aaai.v39i13.33570

Authors

Franco Marchesoni-Acland SLB École Normale Supérieure Paris-Saclay
Jean-Michel Morel City University of Hong Kong
Josselin Kherroubi SLB
Gabriele Facciolo École Normale Supérieure Paris-Saclay

DOI:

https://doi.org/10.1609/aaai.v39i13.33570

Abstract

Even though data annotation is extremely important for interpretability, research, and development of artificial intelligence solutions, annotating data remains costly. Research efforts such as active learning or few-shot learning alleviate the cost by increasing sample efficiency, yet the problem of annotating data more quickly has received comparatively little attention. Leveraging a predictor has been shown to reduce annotation cost in practice but has not been theoretically considered. We ask the following question: to annotate a binary classification dataset with N samples, can the annotator answer less than N yes/no questions? Framing this question-and-answer (Q&A) game as an optimal encoding problem, we find a positive answer given by the Huffman encoding of the possible labelings. Unfortunately, the algorithm is computationally intractable even for small dataset sizes. As a practical method, we propose to minimize a cost function a few steps ahead, similarly to lookahead minimization in optimal control. This solution is analyzed, compared with the optimal one, and evaluated using several synthetic and real-world datasets. The method allows a significant improvement (23-86%) in the annotation efficiency of real-world datasets.

Optimal and Efficient Binary Questioning for Accelerated Annotation

Authors

DOI:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information