Supervision Interpolation via LossMix: Generalizing Mixup for Object Detection and Beyond

Authors

  • Thanh Vu (University of North Carolina at Chapel Hill; Mineral)
  • Baochen Sun (Mineral)
  • Bodi Yuan (Mineral)
  • Alex Ngai (Mineral)
  • Yueqi Li (Mineral)
  • Jan-Michael Frahm (University of North Carolina at Chapel Hill)

DOI:

https://doi.org/10.1609/aaai.v38i6.28335

Keywords:

CV: Object Detection & Categorization, CV: Learning & Optimization for CV, CV: Vision for Robotics & Autonomous Driving, ML: Classification and Regression, ML: Deep Learning Algorithms, ML: Transfer, Domain Adaptation, Multi-Task Learning

Abstract

The success of data mixing augmentations in image classification tasks is well established. However, these techniques cannot be readily applied to object detection due to challenges such as spatial misalignment, foreground/background distinction, and plurality of instances. To tackle these issues, we first introduce a novel conceptual framework called Supervision Interpolation (SI), which offers a fresh perspective on interpolation-based augmentations by relaxing and generalizing Mixup. Based on SI, we propose LossMix, a simple yet versatile and effective regularization that enhances the performance and robustness of object detectors and more. Our key insight is that we can effectively regularize the training on mixed data by interpolating their loss errors instead of ground truth labels. Empirical results on the PASCAL VOC and MS COCO datasets demonstrate that LossMix can consistently outperform state-of-the-art methods widely adopted for detection. Furthermore, by jointly leveraging LossMix with unsupervised domain adaptation, we successfully improve existing approaches and set a new state of the art for cross-domain object detection.
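
The key insight stated in the abstract (interpolating losses rather than ground truth labels) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the mixed input is a convex combination of two inputs with weight `lam`, and that the training loss is the same convex combination of the losses against each original target. The names `mix_inputs`, `loss_mix`, and `toy_model` are hypothetical, and a toy classifier stands in for a detector.

```python
import math


def mix_inputs(x_a, x_b, lam):
    """Element-wise convex combination of two equally sized inputs."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target_index])


def loss_mix(model, loss_fn, x_a, y_a, x_b, y_b, lam):
    """Assumed form: lam * loss(f(x_mixed), y_a) + (1 - lam) * loss(f(x_mixed), y_b).

    The targets y_a and y_b are never blended; only the two loss values are.
    """
    pred = model(mix_inputs(x_a, x_b, lam))
    return lam * loss_fn(pred, y_a) + (1.0 - lam) * loss_fn(pred, y_b)


def toy_model(x):
    """Hypothetical stand-in for a network: fixed linear layer plus softmax."""
    weights = [[1.0, -1.0], [-1.0, 1.0]]
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


# Mixup conventionally samples lam from a Beta distribution; it is fixed here
# so that the example is deterministic.
lam = 0.7
loss = loss_mix(toy_model, cross_entropy, [1.0, 0.0], 0, [0.0, 1.0], 1, lam)
```

Because the targets are left untouched, the same pattern applies to any loss that can be evaluated against each original annotation separately, which is what makes it attractive when labels such as bounding boxes have no obvious interpolation. For cross-entropy with one-hot labels, interpolating the losses gives the same value as interpolating the labels, so the two views coincide in the classification case.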

Published

2024-03-24

How to Cite

Vu, T., Sun, B., Yuan, B., Ngai, A., Li, Y., & Frahm, J.-M. (2024). Supervision Interpolation via LossMix: Generalizing Mixup for Object Detection and Beyond. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5280–5288. https://doi.org/10.1609/aaai.v38i6.28335

Issue

Vol. 38 No. 6 (2024)

Section

AAAI Technical Track on Computer Vision V