MC 5501 and Zoom (email compmath@uwaterloo.ca for Zoom link)
Speaker
Jun Liu | University of Waterloo
Title
Convergence Properties of Stochastic Gradient Methods
Abstract
The optimization of deep learning models heavily relies on stochastic gradient descent (SGD) and its variants. This talk will focus on examining the convergence properties of SGD and gradient descent methods in general. We aim to bridge the gap between the convergence analysis found in the literature, which is usually expressed in terms of expectations, and practical implementations of SGD, which require individual runs (instantiations) of the algorithm to converge.
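To make the distinction concrete, the following minimal Python sketch runs a single realization of SGD on an illustrative quadratic objective (an assumed toy example, not one from the talk), with a diminishing step size satisfying the classical Robbins-Monro conditions (the step sizes sum to infinity while their squares are summable); the almost sure guarantees discussed in the talk concern the convergence of each such realized trajectory, not only of its expectation.

import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x):
    # Noisy gradient of f(x) = 0.5 * ||x||^2 (illustrative objective only).
    return x + 0.1 * rng.standard_normal(x.shape)

x = rng.standard_normal(10)
for k in range(10_000):
    alpha = 1.0 / (k + 1)  # Robbins-Monro schedule: sum alpha_k = inf, sum alpha_k^2 < inf
    x = x - alpha * stochastic_grad(x)

print(np.linalg.norm(x))  # one sample path; almost sure analysis concerns such individual runs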
We will demonstrate how the classical supermartingale convergence theorems underlying the stochastic approximation method of Robbins and Monro can be adapted to establish almost sure convergence rates for SGD and its variants, including standard SGD, SGD with momentum (Polyak's heavy-ball method), and Nesterov's accelerated gradient method. We will also show how Lyapunov and probabilistic arguments can be used to guarantee almost sure escape from strict saddle manifolds, bypassing the boundedness assumptions on gradients commonly made in prior work.
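For reference, the update rules of the three variants named above can be sketched as follows (generic textbook forms written in Python; the step size alpha, momentum parameter beta, and stochastic gradient oracle g are assumed placeholders, and the precise formulations analyzed in the talk may differ):

# Generic textbook updates (assumptions, not the talk's exact formulations).
# g(.) is a stochastic gradient oracle, alpha a step size, beta in [0, 1) a momentum parameter.

def sgd_step(x, alpha, g):
    return x - alpha * g(x)

def heavy_ball_step(x, x_prev, alpha, beta, g):
    # Polyak's heavy-ball: stochastic gradient at the current iterate plus a momentum term.
    return x - alpha * g(x) + beta * (x - x_prev)

def nesterov_step(x, x_prev, alpha, beta, g):
    # Nesterov's accelerated gradient: stochastic gradient evaluated at the extrapolated point.
    y = x + beta * (x - x_prev)
    return y - alpha * g(y)

The two momentum schemes differ only in where the stochastic gradient is evaluated: at the current iterate (heavy-ball) or at the extrapolated point (Nesterov).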
In addition, we will discuss the benefits of incorporating non-smooth gradient flows into the optimization process, which leads to improved performance on benchmark neural network learning problems compared with existing gradient descent methods.