Which Is More Effective in Label Noise Cleaning, Correction or Filtering?

Authors

  • Gaoxia Jiang, Shanxi University
  • Jia Zhang, Shanxi University
  • Xuefei Bai, Shanxi University
  • Wenjian Wang, Shanxi University
  • Deyu Meng, Xi'an Jiaotong University

DOI:

https://doi.org/10.1609/aaai.v38i11.29183

Keywords:

ML: Deep Learning Algorithms, ML: Classification and Regression

Abstract

Most noise cleaning methods build robust models by adopting either the correction mode or the filtering mode. However, the effectiveness, applicability, and hyper-parameter insensitivity of the two modes have not been carefully studied. We compare the two cleaning modes via a rebuilt error bound in noisy environments. At the dataset level, Theorem 5 implies that correction is more effective than filtering when the cleaned datasets have close noise rates. At the sample level, Theorem 6 indicates that confident label noise (large noise probability) is better suited to correction, whereas unconfident label noise (medium noise probability) should be filtered. In addition, an imperfect hyper-parameter may have less negative impact on filtering than on correction. Unlike existing methods with a single cleaning mode, the proposed Fusion cleaning framework of Correction and Filtering (FCF) combines the advantages of both modes to deal with diverse suspicious labels. Experimental results demonstrate that our FCF method achieves state-of-the-art performance on benchmark datasets.
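
The sample-level rule stated in the abstract (correct labels with large noise probability, filter those with medium noise probability) can be illustrated with a minimal sketch. This is not the authors' FCF implementation: the function name `fuse_clean`, the thresholds `t_filter` and `t_correct`, and the assumption that per-sample noise probabilities and predicted replacement labels are already available are all hypothetical choices made for illustration.

```python
import numpy as np

def fuse_clean(labels, noise_prob, pred_labels, t_filter=0.5, t_correct=0.9):
    """Hypothetical fusion of correction and filtering (not the paper's code).

    labels      : observed, possibly noisy labels
    noise_prob  : estimated probability that each label is noisy (assumed given)
    pred_labels : replacement labels proposed by some model (assumed given)
    t_filter, t_correct : illustrative thresholds, with t_filter < t_correct
    """
    labels = np.asarray(labels)
    noise_prob = np.asarray(noise_prob, dtype=float)
    pred_labels = np.asarray(pred_labels)

    # Confident noise (large noise probability): correct the label.
    correct_mask = noise_prob >= t_correct
    # Unconfident noise (medium noise probability): filter the sample out.
    filter_mask = (noise_prob >= t_filter) & ~correct_mask

    cleaned = labels.copy()
    cleaned[correct_mask] = pred_labels[correct_mask]

    keep = ~filter_mask  # samples that remain in the cleaned dataset
    return cleaned[keep], keep

# Usage: sample 1 is corrected, sample 2 is filtered, samples 0 and 3 are kept.
cleaned, keep = fuse_clean(
    labels=[0, 1, 1, 0],
    noise_prob=[0.1, 0.95, 0.6, 0.3],
    pred_labels=[0, 0, 0, 1],
)
```

Samples below `t_filter` are left untouched in this sketch. How the noise probabilities are estimated and how the thresholds are chosen is exactly where the hyper-parameter sensitivity discussed in the abstract would enter.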

Published

2024-03-24

How to Cite

Jiang, G., Zhang, J., Bai, X., Wang, W., & Meng, D. (2024). Which Is More Effective in Label Noise Cleaning, Correction or Filtering?. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12866-12873. https://doi.org/10.1609/aaai.v38i11.29183

Section

AAAI Technical Track on Machine Learning II