Probability seminar series
Elliot Paquette
Room: M3 3127
The homogenization of SGD in high dimensions
Stochastic gradient descent (SGD) is one of the most, if not the most, influential optimization algorithms in use today.
It is the subject of extensive empirical and theoretical research, principally aimed at justifying its performance on very large (high-dimensional) nonlinear optimization problems. This talk is about the precise high-dimensional limit behavior (specifically generalization and training dynamics) of SGD in high-dimensional least squares problems. High dimensionality is enforced by a family of resolvent conditions on the data matrix and the data-target pair, which can be viewed as a type of eigenvector delocalization. We show that the trajectory of SGD is quantitatively close to the solution of a stochastic differential equation, which we call homogenized SGD, and whose behavior is explicitly solvable using renewal theory and the spectrum of the data.
Based on joint works with Courtney Paquette and Kiwon Lee (McGill), and Fabian Pedregosa, Jeffrey Pennington and Ben Adlam (Google Brain).
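To make the abstract's central comparison concrete, here is a minimal numerical sketch (an illustration, not code from the talk): single-sample SGD on a Gaussian least-squares problem run alongside an Euler-Maruyama discretization of a homogenized-SGD-style SDE. The specific SDE written in the comments, the step-size scaling gamma/n, the identification of one SGD step with SDE time 1/n, and the Gaussian data model are all assumptions made for this sketch; the precise formulation and conventions in the underlying papers may differ.

```python
# Sketch: SGD vs. a homogenized-SGD-style SDE on least squares, f(x) = ||Ax - b||^2 / (2n).
# Assumed (schematic) SDE:
#     dX_t = -gamma * grad f(X_t) dt + gamma * sqrt((2/n) f(X_t)) * H^{1/2} dB_t,
# with H the Hessian of f, and one SGD step identified with dt = 1/n of SDE time.
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 100                          # samples and dimension, both "large"
A = rng.standard_normal((n, d))
b = A @ (rng.standard_normal(d) / np.sqrt(d)) + 0.1 * rng.standard_normal(n)

def loss(x):
    r = A @ x - b
    return r @ r / (2 * n)

H = A.T @ A / n                          # Hessian of f
H_sqrt = np.linalg.cholesky(H)
gamma, dt = 0.5, 1.0 / n                 # step size gamma/n  <->  SDE time step 1/n
steps = 20 * n

x = np.zeros(d)                          # SGD iterate
y = np.zeros(d)                          # homogenized-SGD iterate
for k in range(1, steps + 1):
    # SGD: unbiased single-sample estimate of grad f
    i = rng.integers(n)
    x -= (gamma / n) * (A[i] @ x - b[i]) * A[i]

    # Euler-Maruyama step of the schematic homogenized SGD
    grad = A.T @ (A @ y - b) / n
    noise = H_sqrt @ rng.standard_normal(d)
    y += -gamma * grad * dt + gamma * np.sqrt(2 * loss(y) / n * dt) * noise

    if k % (5 * n) == 0:
        print(f"t = {k / n:4.1f}   SGD loss = {loss(x):.4f}   hSGD loss = {loss(y):.4f}")
```

Under these assumed scalings, the drift and the per-step noise covariance of the discretized SDE match those of the SGD iteration to leading order, so the two printed loss trajectories should track each other closely as n and d grow.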