From Label Smoothing to Label Relaxation
Keywords:Classification and Regression, Learning Theory, (Deep) Neural Network Learning Theory
AbstractRegularization of (deep) learning models can be realized at the model, loss, or data level. As a technique somewhere in-between loss and data, label smoothing turns deterministic class labels into probability distributions, for example by uniformly distributing a certain part of the probability mass over all classes. A predictive model is then trained on these distributions as targets, using cross-entropy as loss function. While this method has shown improved performance compared to non-smoothed cross-entropy, we argue that the use of a smoothed though still precise probability distribution as a target can be questioned from a theoretical perspective. As an alternative, we propose a generalized technique called label relaxation, in which the target is a set of probabilities represented in terms of an upper probability distribution. This leads to a genuine relaxation of the target instead of a distortion, thereby reducing the risk of incorporating an undesirable bias in the learning process. Methodically, label relaxation leads to the minimization of a novel type of loss function, for which we propose a suitable closed-form expression for model optimization. The effectiveness of the approach is demonstrated in an empirical study on image data.
How to Cite
Lienen, J., & Hüllermeier, E. (2021). From Label Smoothing to Label Relaxation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8583-8591. https://doi.org/10.1609/aaai.v35i10.17041
AAAI Technical Track on Machine Learning III