EffConv: Efficient Learning of Kernel Sizes for Convolution Layers of CNNs
DOI:
https://doi.org/10.1609/aaai.v37i6.25923
Keywords:
ML: Deep Neural Architectures, ML: Deep Neural Network Algorithms
Abstract
Determining the kernel sizes of a CNN model is a crucial, non-trivial design choice that significantly impacts its performance. Most kernel size design methods rely on complex heuristics or on neural architecture search, which requires extreme computational resources. As a workaround, prior work has proposed learning kernel sizes jointly with the model weights, for example by modeling kernels as a combination of basis functions. However, previous methods either fail to achieve satisfactory results or are inefficient on large-scale datasets. To fill this gap, we design a novel, efficient kernel size learning method in which a size predictor model learns to predict optimal kernel sizes for a classifier given a desired number of parameters. It does so in collaboration with a kernel predictor model that predicts the weights of the kernels, given the kernel sizes predicted by the size predictor, to minimize the training objective. Both models are trained end-to-end. Our method needs only a small fraction of the training epochs of the original CNN to train these two models and find proper kernel sizes for it. Thus, it offers an efficient and effective solution to the kernel size learning problem. Our extensive experiments on MNIST, CIFAR-10, STL-10, and ImageNet-32 demonstrate that our method achieves the best training time vs. accuracy trade-off compared to previous kernel size learning methods and significantly outperforms them on challenging datasets such as STL-10 and ImageNet-32. Our implementations are available at https://github.com/Alii-Ganjj/EffConv.
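To make the phrase "a desired number of parameters" concrete, the following minimal sketch counts the weights of a stack of 2D convolution layers for a given choice of kernel sizes and checks the total against a budget. It is not taken from the EffConv implementation: the helper names `conv_params` and `total_params`, the channel widths, and the budget are hypothetical and chosen only for illustration.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of one 2D convolution layer with a k x k kernel."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def total_params(channels, kernel_sizes):
    """Sum the parameters of a stack of conv layers.

    channels:     [c_0, c_1, ..., c_n], input width followed by each layer's output width
    kernel_sizes: one kernel size per layer (n entries)
    """
    if len(kernel_sizes) != len(channels) - 1:
        raise ValueError("need exactly one kernel size per layer")
    return sum(
        conv_params(c_in, c_out, k)
        for c_in, c_out, k in zip(channels, channels[1:], kernel_sizes)
    )


# Hypothetical three-layer network (assumed widths, not from the paper).
channels = [3, 16, 32, 64]

# Budget: the parameter count of the all-3x3 baseline.
budget = total_params(channels, [3, 3, 3])

# A candidate assignment: larger kernels early, a 1x1 kernel in the widest layer.
candidate = total_params(channels, [7, 5, 1])
fits_budget = candidate <= budget

print(budget, candidate, fits_budget)
```

Because the cost of a layer grows with the square of its kernel size and with the product of its channel widths, enlarging kernels in narrow early layers is much cheaper than enlarging them in wide late layers. Any method that picks kernel sizes under a parameter budget has to navigate that trade-off.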
Published
2023-06-26
How to Cite
Ganjdanesh, A., Gao, S., & Huang, H. (2023). EffConv: Efficient Learning of Kernel Sizes for Convolution Layers of CNNs. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7604-7612. https://doi.org/10.1609/aaai.v37i6.25923
Section
AAAI Technical Track on Machine Learning I