Understanding and Leveraging the Learning Phases of Neural Networks
DOI: https://doi.org/10.1609/aaai.v38i13.29408

Keywords: ML: Deep Learning Theory; ML: Transfer, Domain Adaptation, Multi-Task Learning

Abstract
The learning dynamics of deep neural networks are not well understood. The information bottleneck (IB) theory proclaimed separate fitting and compression phases, but these have since been heavily debated. We comprehensively analyze the learning dynamics by investigating a layer's ability to reconstruct the input, together with its prediction performance, based on the evolution of parameters during training. Using common datasets and architectures such as ResNet and VGG, we empirically show the existence of three phases of the reconstruction loss: (i) a near-constant phase, (ii) a decrease, and (iii) an increase. We also derive an empirically grounded data model and prove the existence of these phases for single-layer networks. Technically, our approach leverages classical complexity analysis. It differs from IB in that it relies on measuring reconstruction loss rather than information-theoretic quantities to relate the information in intermediate layers to the inputs. Our work implies a new best practice for transfer learning: we show empirically that the pre-training of a classifier should stop well before its performance is optimal.
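The core measurement described in the abstract can be illustrated with a minimal sketch: given a layer's activations, fit a decoder that maps them back to the inputs and record the resulting reconstruction loss. The sketch below is a hypothetical simplification, not the paper's implementation; it uses a random linear layer as a stand-in for a network's intermediate layer and a closed-form least-squares linear decoder instead of a trained one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples of d-dimensional inputs (hypothetical, for illustration).
n, d, h = 200, 10, 6
X = rng.normal(size=(n, d))

def reconstruction_loss(X, W):
    """Mean squared error of the best linear decoder mapping the
    layer's activations H = X @ W back to the inputs X."""
    H = X @ W                                   # layer activations
    D, *_ = np.linalg.lstsq(H, X, rcond=None)   # optimal linear decoder
    return float(np.mean((H @ D - X) ** 2))

# A narrow layer (h < d) discards information, so some inputs cannot
# be recovered and the reconstruction loss is strictly positive.
W = rng.normal(size=(d, h))
loss = reconstruction_loss(X, W)
```

Tracking this quantity per layer across training checkpoints would trace out the constant/decrease/increase phases the paper reports; in practice one would use the actual network's activations and a decoder of appropriate capacity.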
Published: 2024-03-24
How to Cite
Schneider, J., & Prabhushankar, M. (2024). Understanding and Leveraging the Learning Phases of Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 38(13), 14886-14893. https://doi.org/10.1609/aaai.v38i13.29408
Section: AAAI Technical Track on Machine Learning IV