Robust Visual Recognition with Class-Imbalanced Open-World Noisy Data

Authors

  • Na Zhao, Singapore University of Technology and Design
  • Gim Hee Lee, National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v38i15.29642

Keywords:

ML: Classification and Regression, CV: Object Detection & Categorization, CV: Adversarial Attacks & Robustness

Abstract

Learning from open-world noisy data, where closed-set and open-set noise co-exist in the dataset, is a realistic but underexplored setting. Only recently have several efforts been made to tackle this problem. However, these works assume that the classes are balanced when dealing with open-world noisy data. This assumption often conflicts with the nature of real-world large-scale datasets, whose label distributions are generally long-tailed, i.e., class-imbalanced. In this paper, we study the problem of robust visual recognition with class-imbalanced open-world noisy data. We propose a probabilistic graphical model-based approach, iMRF, which achieves label noise correction that is robust to class imbalance via efficient iterative inference of a Markov Random Field (MRF) in each training mini-batch. Furthermore, we design an agreement-based thresholding strategy to adaptively collect clean samples from all classes, including corrected closed-set noisy samples, while rejecting open-set noisy samples. We also introduce a noise-aware balanced cross-entropy loss to explicitly eliminate the bias caused by class-imbalanced data. Extensive experiments on several benchmark datasets, including synthetic and real-world noisy datasets, demonstrate the superior performance and robustness of our method over existing methods. Our code is available at https://github.com/Na-Z/LIOND.
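
To make two ideas from the abstract concrete, the sketch below illustrates a simple agreement-based selection rule and a prior-adjusted (balanced-softmax-style) cross-entropy computed only on the selected samples. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `select_clean` and `balanced_cross_entropy`, the fixed `threshold`, and the use of log class priors added to the logits are all hypothetical choices made here, and the paper's actual thresholding strategy and loss may differ. The MRF-based label correction step is not shown.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def select_clean(probs, labels, threshold=0.5):
    """Hypothetical agreement rule (an assumption, not the paper's rule).

    A sample is kept when the model's most likely class agrees with its
    (possibly corrected) label and the confidence reaches `threshold`.
    Samples that fail the check are treated as rejected, e.g. open-set noise.
    """
    keep = []
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        keep.append(pred == y and p[pred] >= threshold)
    return keep


def balanced_cross_entropy(logits, labels, class_counts, clean_mask):
    """Prior-adjusted cross-entropy averaged over the kept samples only.

    Assumes every entry of `class_counts` is positive. Adding the log class
    prior to each logit is one common way to counter long-tailed label
    distributions; it stands in here for the paper's loss.
    """
    total = sum(class_counts)
    log_prior = [math.log(c / total) for c in class_counts]
    losses = []
    for z, y, keep in zip(logits, labels, clean_mask):
        if not keep:
            continue  # rejected samples contribute nothing to the loss
        adjusted = [v + lp for v, lp in zip(z, log_prior)]
        losses.append(-log_softmax(adjusted)[y])
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch, a mini-batch would first be passed through `select_clean` to build the mask, and the mask would then be handed to `balanced_cross_entropy` together with per-class sample counts estimated from the training set.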

Published

2024-03-24

How to Cite

Zhao, N., & Lee, G. H. (2024). Robust Visual Recognition with Class-Imbalanced Open-World Noisy Data. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 16989-16997. https://doi.org/10.1609/aaai.v38i15.29642

Issue

Vol. 38 No. 15 (2024)

Section

AAAI Technical Track on Machine Learning VI