Title: The training dynamics and local geometry of high-dimensional learning
Speaker: Aukosh Jagannath
Affiliation: University of Waterloo
Location: MC 5501
Abstract: Many modern data science tasks can be expressed as optimizing complex, random functions in high dimensions. The go-to methods for such problems are variants of stochastic gradient descent (SGD), which perform remarkably well (cf. the success of modern neural networks). However, the rigorous analysis of SGD on natural, high-dimensional statistical models is in its infancy. In this talk, we study a general model that captures a broad range of learning tasks, from Matrix and Tensor PCA to training two-layer neural networks to classify mixture models. We show that the evolution of natural summary statistics along training converges, in the high-dimensional limit, to a closed, finite-dimensional dynamical system called their effective dynamics. We then turn to understanding the landscape of training from the point of view of the algorithm. We show that in this limit, the spectra of the Hessian and information matrices admit an effective spectral theory: the limiting empirical spectral measure and outliers have explicit characterizations that depend only on these summary statistics. I will then illustrate how these techniques can be used to give rigorous demonstrations of phenomena observed in the machine learning literature, such as the lottery ticket hypothesis and the "spectral alignment" phenomenon. This talk surveys a series of joint works with G. Ben Arous (NYU), R. Gheissari (Northwestern), and J. Huang (U Penn).
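
Illustrative aside (not part of the abstract): the simplest setting in which an "effective dynamics" appears is the single-spike matrix PCA model. Assuming a planted direction v, online SGD constrained to the sphere, a signal-to-noise ratio \lambda, and time rescaled as t = (iteration)/n, the natural summary statistic is the overlap of the iterate with v; in the high-dimensional limit it follows a closed, one-dimensional ODE of schematically the form sketched below. The correction constant c is a placeholder assumption standing in for the step-size/noise term, not the talk's exact statement.

% Hedged sketch: effective dynamics of the overlap in single-spike matrix PCA.
% Assumed setup (not taken from the abstract): planted direction v, iterates x_k on the
% sphere of radius sqrt(n), signal-to-noise ratio \lambda, rescaled time t = k/n.
\[
  m_t \;=\; \tfrac{1}{n}\,\bigl\langle x_{\lfloor tn \rfloor},\, v \bigr\rangle
  \;\xrightarrow[\;n\to\infty\;]{}\; \bar m_t,
\]
\[
  \frac{\mathrm{d}\bar m_t}{\mathrm{d}t}
  \;=\; \lambda\,\bar m_t\,\bigl(1-\bar m_t^{\,2}\bigr)\;-\;c\,\bar m_t,
\]
% i.e. the n-dimensional stochastic training dynamics collapse, in the limit, to a
% closed, finite-dimensional (here one-dimensional) system for the summary statistic.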