The Surprising Effectiveness of Infinite-Width NTKs for Characterizing and Improving Model Training

Authors

  • Joshua DeOliveira, Worcester Polytechnic Institute, Worcester, MA
  • Walter Gerych, Massachusetts Institute of Technology, Cambridge, MA
  • Elke Rundensteiner, Worcester Polytechnic Institute, Worcester, MA

DOI:

https://doi.org/10.1609/aaai.v39i15.33786

Abstract

Developments in deep neural networks have trended toward increasingly large, overparameterized architectures, resulting in lengthy training sessions with ever more elusive training dynamics. Ensuring that these models efficiently learn accurate, generalizable representations of data is thus challenging. Previous works have developed specialized techniques for data pruning, architecture selection, pseudo-label generation, bias identification, or label refurbishment to improve downstream training. Problematically, most of these methods require prohibitively expensive iterative model training. In this paper, we demonstrate that recent neural tangent kernel (NTK) theory can be exploited to understand and improve model training behavior before ever training a model. First, we show that a powerful signal derived from NTK theory can be computed remarkably fast. We then leverage this signal to design a unified suite of surprisingly effective tools for four important tasks: architecture selection, pseudo-label verification, bias identification, and label refurbishment, all requiring zero model training.
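For intuition on why infinite-width NTK quantities can be computed quickly without any training: for simple architectures the infinite-width NTK has a closed form. The sketch below (a general illustration, not the specific signal proposed in the paper) computes the exact infinite-width NTK of a one-hidden-layer bias-free ReLU network under the standard NTK parameterization, using only NumPy.

```python
import numpy as np

def relu_ntk(x1, x2):
    """Closed-form infinite-width NTK between inputs x1 and x2 for a
    one-hidden-layer, bias-free ReLU network (NTK parameterization).

    The NTK is Theta = Sigma + (x1 . x2) * Sigma_dot, where Sigma is the
    degree-1 arc-cosine kernel E[relu(u) relu(v)] and Sigma_dot is
    E[relu'(u) relu'(v)] for (u, v) jointly Gaussian with covariance
    given by the inner products of the inputs."""
    n1 = np.linalg.norm(x1)
    n2 = np.linalg.norm(x2)
    # Angle between the inputs; clip guards against round-off outside [-1, 1].
    cos = np.clip(x1 @ x2 / (n1 * n2), -1.0, 1.0)
    theta = np.arccos(cos)
    # E[relu(u) relu(v)]: arc-cosine kernel of degree 1.
    sigma = n1 * n2 * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)
    # E[relu'(u) relu'(v)]: arc-cosine kernel of degree 0.
    sigma_dot = (np.pi - theta) / (2 * np.pi)
    return sigma + (x1 @ x2) * sigma_dot
```

Because this is a fixed kernel evaluation rather than gradient descent, the full kernel matrix for a dataset costs one pass of pairwise evaluations, which is what makes training-free analyses of this kind tractable. Note that for identical inputs the formula reduces to Theta(x, x) = ||x||^2.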

Published

2025-04-11

How to Cite

DeOliveira, J., Gerych, W., & Rundensteiner, E. (2025). The Surprising Effectiveness of Infinite-Width NTKs for Characterizing and Improving Model Training. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 16262-16270. https://doi.org/10.1609/aaai.v39i15.33786

Section

AAAI Technical Track on Machine Learning I