Training Deep Neural Networks with Virtual Smoothing Classes

Authors

  • Zhiyang Zhou, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China
  • Siwei Wei, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China
  • Xudong Zhang, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China
  • Wensheng Dou, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China; Nanjing Institute of Software Technology, University of Chinese Academy of Sciences, Nanjing, China
  • Muzi Qu, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China
  • Yan Cai, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i21.34467

Abstract

Learning with softmax cross-entropy on one-hot labels often leads to overconfidence on the correct class. Label smoothing regulates this overconfidence by redistributing some confidence from the correct class to the incorrect classes, but it compromises the information that the logits carry about the similarity between samples of different classes, and it may hurt calibration when high accuracy requires higher confidence. To overcome these limitations, we propose a Virtual Smoothing (VS) label that redistributes some confidence from the correct class to additional VS classes to regularize overconfidence. In VS labels, the VS class nodes act as adversaries to the original class nodes, enforcing regularization by clustering samples across all classes. Because each incorrect class is assigned zero confidence, the incorrect logits are free to differ from one another, so information about sample similarities is not erased. The prediction probability can still approach 1 when softmax is applied to the logits of the original real classes, so VS labels do not harm calibration and instead consistently improve it. Experiments show that VS labels consistently improve accuracy and calibration while providing better logits for knowledge distillation. VS labels are also effective in improving adversarial training, robust distillation, and out-of-distribution detection.
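
The label construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the smoothing amount `alpha`, the number of virtual classes `num_virtual`, and in particular the uniform split of the redistributed confidence across the VS classes are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def vs_labels(targets, num_classes, num_virtual, alpha):
    """Build soft labels over num_classes real + num_virtual VS outputs.

    Assumption (not stated in the abstract): the confidence alpha taken
    from the correct class is split uniformly over the VS classes.
    """
    n = len(targets)
    labels = np.zeros((n, num_classes + num_virtual))
    labels[np.arange(n), targets] = 1.0 - alpha      # correct real class
    labels[:, num_classes:] = alpha / num_virtual    # virtual classes
    return labels                                    # incorrect real classes stay 0

def log_softmax(z):
    """Numerically stable log-softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def vs_cross_entropy(logits, labels):
    """Mean cross-entropy between soft labels and the softmax over ALL outputs."""
    return -(labels * log_softmax(logits)).sum(axis=-1).mean()

def predict_proba(logits, num_classes):
    """At inference, apply softmax to the real-class logits only."""
    return np.exp(log_softmax(logits[:, :num_classes]))

# Hypothetical example: 3 real classes, 2 virtual classes, alpha = 0.1.
targets = np.array([0, 2])
labels = vs_labels(targets, num_classes=3, num_virtual=2, alpha=0.1)
logits = np.array([[4.0, 0.5, -1.0, 1.0, 1.0],
                   [0.0, 1.5, 5.0, 2.0, 2.0]])
loss = vs_cross_entropy(logits, labels)
probs = predict_proba(logits, num_classes=3)
```

In this sketch the incorrect real classes receive a target of exactly zero, unlike in label smoothing, and the virtual outputs are dropped before the final softmax, so the predicted probability over the real classes is not capped by `1 - alpha`.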

Published

2025-04-11

How to Cite

Zhou, Z., Wei, S., Zhang, X., Dou, W., Qu, M., & Cai, Y. (2025). Training Deep Neural Networks with Virtual Smoothing Classes. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 23036–23044. https://doi.org/10.1609/aaai.v39i21.34467

Issue

Vol. 39 No. 21 (2025)

Section

AAAI Technical Track on Machine Learning VII