Learning Across Scales---Multiscale Methods for Convolution Neural Networks

Authors

  • Eldad Haber, University of British Columbia, Vancouver, BC; Xtract Technologies, Vancouver, BC
  • Lars Ruthotto, Emory University, Atlanta, GA; Xtract Technologies, Vancouver, BC
  • Elliot Holtham, Xtract Technologies, Vancouver, BC
  • Seong-Hwan Jun, University of British Columbia, Vancouver, BC

DOI:

https://doi.org/10.1609/aaai.v32i1.11680

Keywords:

Deep Learning, Convolution Neural Networks, Optimal Control, Partial Differential Equations

Abstract

In this work, we establish the relation between optimal control and training deep Convolution Neural Networks (CNNs). We show that the forward propagation in CNNs can be interpreted as a time-dependent nonlinear differential equation, and that learning can be seen as controlling the parameters of the differential equation such that the network approximates the data-label relation for the given training data. Using this continuous interpretation, we derive two new methods to scale CNNs with respect to two different dimensions. The first class of multiscale methods connects low-resolution and high-resolution data using prolongation and restriction of CNN parameters inspired by algebraic multigrid techniques. We demonstrate that our method enables classifying high-resolution images with CNNs trained on low-resolution images (and vice versa), as well as warm-starting the learning process. The second class of multiscale methods connects shallow and deep networks and leads to new training strategies that gradually increase the depth of the CNN while re-using parameters for initialization.
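
To make the continuous interpretation concrete, the sketch below illustrates the idea behind the second class of methods: a residual block Y_{j+1} = Y_j + h*sigma(K_j Y_j + b_j) is a forward Euler step for the differential equation dY/dt = sigma(K(t) Y + b(t)), so a deeper network can be warm-started by interpolating a shallower network's parameters along the artificial time axis. This is a minimal NumPy sketch, not the authors' implementation: the tanh activation, the dense matrix K (in place of a convolution), the step size T, and the refine_depth helper with its piecewise-constant prolongation in time are illustrative assumptions.

    import numpy as np

    def layer(Y, K, b, h):
        # One residual block: forward Euler step for dY/dt = tanh(K(t) Y + b(t)).
        return Y + h * np.tanh(K @ Y + b)

    def forward(Y0, Ks, bs, T=1.0):
        # Propagate through len(Ks) blocks; the step size h = T / N plays the
        # role of the time step in the discretized differential equation.
        h = T / len(Ks)
        Y = Y0
        for K, b in zip(Ks, bs):
            Y = layer(Y, K, b, h)
        return Y

    def refine_depth(Ks, bs):
        # Hypothetical depth-refinement helper: initialize a 2N-layer network
        # from an N-layer one by repeating each block's parameters (a
        # piecewise-constant interpolation in time); forward() then halves h
        # automatically because the parameter lists are twice as long.
        Ks2 = [K for K in Ks for _ in (0, 1)]
        bs2 = [b for b in bs for _ in (0, 1)]
        return Ks2, bs2

    # Example: propagate a feature vector through a 4-block network, then
    # through the 8-block network warm-started from its parameters.
    rng = np.random.default_rng(0)
    Y0 = rng.standard_normal(3)
    Ks = [0.1 * rng.standard_normal((3, 3)) for _ in range(4)]
    bs = [np.zeros(3) for _ in range(4)]
    print(forward(Y0, Ks, bs))
    print(forward(Y0, *refine_depth(Ks, bs)))

Because both networks approximate the same underlying differential equation on [0, T], the refined network starts close to the shallow one and can be trained further rather than from scratch; the paper's resolution-wise prolongation and restriction of parameters follows the same spirit across image scales.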

Published

2018-04-29

How to Cite

Haber, E., Ruthotto, L., Holtham, E., & Jun, S.-H. (2018). Learning Across Scales---Multiscale Methods for Convolution Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11680