The Surprising Effectiveness of Infinite-Width NTKs for Characterizing and Improving Model Training

Authors

  • Joshua DeOliveira, Worcester Polytechnic Institute, Worcester, MA
  • Walter Gerych, Massachusetts Institute of Technology, Cambridge, MA
  • Elke Rundensteiner, Worcester Polytechnic Institute, Worcester, MA

DOI:

https://doi.org/10.1609/aaai.v39i15.33786

Abstract

Developments in deep neural networks have trended toward increasingly large, overparameterized architectures, resulting in lengthy training sessions with ever more elusive training dynamics. Ensuring that these models efficiently learn accurate, generalizable representations of data is thus challenging. Previous works have developed specialized techniques for data pruning, architecture selection, pseudo-label generation, bias identification, or label refurbishment to improve downstream training. Problematically, most of these methods require prohibitively expensive iterative model training. In this paper, we demonstrate that recent neural tangent kernel (NTK) theory can be exploited to understand and improve model training behavior before ever training a model. First, we show that a powerful signal derived from NTK theory can be computed remarkably fast. We then leverage this signal to design a unified suite of surprisingly effective tools for four important tasks: architecture selection, pseudo-label verification, bias identification, and label refurbishment, all requiring zero model training.
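For intuition on why infinite-width NTK quantities can be computed quickly without any training: for simple architectures the infinite-width NTK has a closed form. The sketch below (a general illustration, not the specific signal proposed in the paper) computes the exact infinite-width NTK of a one-hidden-layer bias-free ReLU network under the standard NTK parameterization, using only NumPy.

```python
import numpy as np

def relu_ntk(x1, x2):
    """Closed-form infinite-width NTK between inputs x1 and x2 for a
    one-hidden-layer, bias-free ReLU network (NTK parameterization).

    The NTK is Theta = Sigma + (x1 . x2) * Sigma_dot, where Sigma is the
    degree-1 arc-cosine kernel E[relu(u) relu(v)] and Sigma_dot is
    E[relu'(u) relu'(v)] for (u, v) jointly Gaussian with covariance
    given by the inner products of the inputs."""
    n1 = np.linalg.norm(x1)
    n2 = np.linalg.norm(x2)
    # Angle between the inputs; clip guards against round-off outside [-1, 1].
    cos = np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0)
    theta = np.arccos(cos)
    # E[relu(u) relu(v)]: arc-cosine kernel of degree 1.
    sigma = n1 * n2 * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    # E[relu'(u) relu'(v)]: arc-cosine kernel of degree 0.
    sigma_dot = (np.pi - theta) / (2 * np.pi)
    return sigma + (x1 @ x2) * sigma_dot
```

Because this is a fixed kernel evaluation rather than gradient descent, the full kernel matrix for a dataset costs one pass of pairwise evaluations, which is what makes training-free analyses of this kind tractable. Note that for identical inputs the formula reduces to Theta(x, x) = ||x||^2.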

Published

2025-04-11

How to Cite

DeOliveira, J., Gerych, W., & Rundensteiner, E. (2025). The Surprising Effectiveness of Infinite-Width NTKs for Characterizing and Improving Model Training. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 16262-16270. https://doi.org/10.1609/aaai.v39i15.33786

Section

AAAI Technical Track on Machine Learning I