Optimal and Efficient Binary Questioning for Accelerated Annotation

Authors

  • Franco Marchesoni-Acland SLB École Normale Supérieure Paris-Saclay
  • Jean-Michel Morel City University of Hong Kong
  • Josselin Kherroubi SLB
  • Gabriele Facciolo École Normale Supérieure Paris-Saclay

DOI:

https://doi.org/10.1609/aaai.v39i13.33570

Abstract

Even though data annotation is extremely important for interpretability, research, and development of artificial intelligence solutions, annotating data remains costly. Research efforts such as active learning or few-shot learning alleviate the cost by increasing sample efficiency, yet the problem of annotating data more quickly has received comparatively little attention. Leveraging a predictor has been shown to reduce annotation cost in practice but has not been theoretically considered. We ask the following question: to annotate a binary classification dataset with N samples, can the annotator answer less than N yes/no questions? Framing this question-and-answer (Q&A) game as an optimal encoding problem, we find a positive answer given by the Huffman encoding of the possible labelings. Unfortunately, the algorithm is computationally intractable even for small dataset sizes. As a practical method, we propose to minimize a cost function a few steps ahead, similarly to lookahead minimization in optimal control. This solution is analyzed, compared with the optimal one, and evaluated using several synthetic and real-world datasets. The method allows a significant improvement (23-86%) in the annotation efficiency of real-world datasets.

Published

2025-04-11

How to Cite

Marchesoni-Acland, F., Morel, J.-M., Kherroubi, J., & Facciolo, G. (2025). Optimal and Efficient Binary Questioning for Accelerated Annotation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(13), 14336–14343. https://doi.org/10.1609/aaai.v39i13.33570

Issue

Section

AAAI Technical Track on Humans and AI