Generative Calibration of Inaccurate Annotation for Label Distribution Learning

Authors

  • Liang He, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
  • Yunan Lu, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, China
  • Weiwei Li, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
  • Xiuyi Jia, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, China

DOI:

https://doi.org/10.1609/aaai.v38i11.29131

Keywords:

ML: Multi-class/Multi-label Learning & Extreme Classification

Abstract

Label distribution learning (LDL) is an effective learning paradigm for handling label ambiguity. Applying LDL typically requires datasets annotated with label distributions, but obtaining such supervised data is challenging. Because label annotation is subject to randomness, annotators can produce inaccurate annotations for an instance, which degrades the accuracy and generalization ability of the LDL model. To address this problem, we propose a generative approach that calibrates inaccurate annotations for LDL using variational inference. Specifically, we assume that instances with similar features share similar latent label distributions. The feature vectors and label distributions are generated by a Gaussian mixture and a Dirichlet mixture, respectively. The two are linked through a shared categorical variable, which lets the model exploit the label distributions of instances with similar features and recover a more accurate label distribution. Furthermore, we use a confusion matrix to model the factors that contribute to inaccuracy during annotation, capturing the relationship between label distributions and their inaccurate counterparts. Finally, the label distribution is used to calibrate the available information in the noisy dataset to obtain the ground-truth label distribution.
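
The generative assumptions described in the abstract can be illustrated with a small forward-sampling sketch. This is not the authors' implementation and it does not perform variational inference; it only draws synthetic data under one plausible reading of the model. The function name `sample_noisy_ldl`, the diagonal Gaussian covariances, the toy parameter values, and the choice to apply a row-stochastic confusion matrix as `d @ confusion` are all assumptions made for illustration.

```python
import numpy as np

def sample_noisy_ldl(n, weights, means, stds, alphas, confusion, seed=0):
    """Draw n instances from a toy mixture model (hypothetical sketch).

    weights   : (K,)   mixing proportions of the shared categorical variable
    means     : (K, D) Gaussian component means for the features
    stds      : (K, D) per-dimension standard deviations (diagonal covariance, assumed)
    alphas    : (K, C) Dirichlet parameters for the label distributions
    confusion : (C, C) row-stochastic matrix; entry [i, j] is the assumed share of
                label i's mass that the annotator assigns to label j
    """
    rng = np.random.default_rng(seed)
    # Shared categorical variable: one component index per instance.
    z = rng.choice(len(weights), size=n, p=weights)
    # Features come from the Gaussian component selected by z.
    x = rng.normal(means[z], stds[z])
    # Label distributions come from the Dirichlet component selected by the same z.
    d = np.vstack([rng.dirichlet(alphas[k]) for k in z])
    # Inaccurate annotation: mix label mass through the confusion matrix (assumption).
    d_noisy = d @ confusion
    return z, x, d, d_noisy

# Hypothetical toy parameters: 2 components, 2 features, 3 labels.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
stds = np.ones((2, 2))
alphas = np.array([[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]])
confusion = np.array([[0.8, 0.1, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.1, 0.1, 0.8]])

z, x, d, d_noisy = sample_noisy_ldl(5, weights, means, stds, alphas, confusion)
```

Because each row of `confusion` sums to one, every row of `d_noisy` remains a valid label distribution. Calibration in the sense of the abstract would run in the opposite direction, inferring `d` from `x` and `d_noisy`.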

Published

2024-03-24

How to Cite

He, L., Lu, Y., Li, W., & Jia, X. (2024). Generative Calibration of Inaccurate Annotation for Label Distribution Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12394–12401. https://doi.org/10.1609/aaai.v38i11.29131

Section

AAAI Technical Track on Machine Learning II