Probability seminar series
Margalit Glasgow
MIT
Room: M3 3127
Propagation of Chaos in 2-Layer Neural Networks beyond Logarithmic Time
The analysis of gradient descent in neural networks remains an outstanding challenge, even for the simplest shallow architectures. In this talk, we will investigate the gradient descent dynamics of 2-layer neural networks through the lens of the infinite-width "mean-field" limit. The infinite-width limit offers analytical simplicity and can help in understanding the role of overparameterization and the scaling behavior of neural networks. Yet showing that practically (polynomially) sized neural networks closely approximate their mean-field limits throughout training (the so-called Propagation of Chaos phenomenon) is difficult for the long training times characteristic of practice. We provide a novel analysis that goes beyond traditional Gronwall-based Propagation of Chaos arguments by exploiting certain geometric properties of the optimization landscape, and we apply these results to representative models such as single-index models, establishing polynomial learning guarantees.
Joint work with Joan Bruna and Denny Wu.
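For readers unfamiliar with the setup, a minimal sketch of the standard mean-field formulation (notation here is illustrative, not taken from the abstract): the finite-width 2-layer network and its infinite-width limit can be written as

\[
f_N(x) = \frac{1}{N}\sum_{i=1}^{N} a_i\,\sigma(\langle w_i, x\rangle),
\qquad
f(x;\mu) = \int a\,\sigma(\langle w, x\rangle)\, d\mu(a, w),
\]

where \(\mu\) is a probability measure over neuron parameters \((a, w)\). Propagation of Chaos refers to the empirical measure \(\frac{1}{N}\sum_{i=1}^{N}\delta_{(a_i, w_i)}\) of the trained neurons remaining close to the evolving mean-field measure \(\mu_t\) throughout training.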