Deterministic Mini-batch Sequencing for Training Deep Neural Networks
Keywords: (Deep) Neural Network Algorithms
Abstract
Recent advancements in the field of deep learning have dramatically improved the performance of machine learning models in a variety of applications, including computer vision, text mining, speech processing, and fraud detection, among others. Mini-batch gradient descent is the standard algorithm to train deep models, where mini-batches of a fixed size are sampled randomly from the training data and passed through the network sequentially. In this paper, we present a novel algorithm to generate a deterministic sequence of mini-batches to train a deep neural network (rather than a random sequence). Our rationale is to select a mini-batch by minimizing the Maximum Mean Discrepancy (MMD) between the already selected mini-batches and the unselected training samples. We pose the mini-batch selection as a constrained optimization problem and derive a linear programming relaxation to determine the sequence of mini-batches. To the best of our knowledge, this is the first research effort that uses the MMD criterion to determine a sequence of mini-batches to train a deep neural network. The proposed mini-batch sequencing strategy is deterministic and independent of the underlying network architecture and prediction task. Our extensive empirical analyses on three challenging datasets corroborate the merit of our framework over competing baselines. We further study the performance of our framework on two other applications besides classification (regression and semantic segmentation) to validate its generalizability.
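To make the MMD criterion mentioned in the abstract concrete, the following is a minimal sketch of the (biased) squared-MMD estimator between two sample sets, together with a naive greedy selection step. The RBF kernel, its bandwidth, and the greedy search are illustrative assumptions; the paper itself derives a linear programming relaxation of the constrained selection problem rather than this greedy heuristic.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise RBF kernel matrix between the rows of A and B.
    # (Kernel choice and bandwidth sigma are assumptions for illustration.)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased (V-statistic) estimator of squared Maximum Mean Discrepancy:
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2.0 * rbf_kernel(X, Y, sigma).mean())

def greedy_pick(pool, selected, remaining, sigma=1.0):
    # Illustrative greedy step: choose the remaining sample whose addition
    # to the selected set minimizes MMD between the selected samples and
    # the samples left unselected (a simplification of the paper's LP).
    def score(i):
        rest = [j for j in remaining if j != i]
        return mmd2(pool[selected + [i]], pool[rest], sigma)
    return min(remaining, key=score)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(40, 4))        # hypothetical training pool
    selected, remaining = [0], list(range(1, 40))
    nxt = greedy_pick(pool, selected, remaining)
    print("next sample index:", nxt)
```

The biased estimator is always non-negative (it equals a squared RKHS norm of mean embeddings), which makes it a well-posed objective to minimize when choosing which samples a mini-batch should absorb next.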
How to Cite
Banerjee, S., & Chakraborty, S. (2021). Deterministic Mini-batch Sequencing for Training Deep Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 6723-6731. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16831
AAAI Technical Track on Machine Learning I