Local Regularizer Improves Generalization
Regularization plays an important role in generalization of deep learning. In this paper, we study the generalization power of an unbiased regularizor for training algorithms in deep learning. We focus on training methods called Locally Regularized Stochastic Gradient Descent (LRSGD). An LRSGD leverages a proximal type penalty in gradient descent steps to regularize SGD in training. We show that by carefully choosing relevant parameters, LRSGD generalizes better than SGD. Our thorough theoretical analysis is supported by experimental evidence. It advances our theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD.