Title: Why Random Reshuffling Beats Stochastic Gradient Descent
Speaker: Julian Romero
Affiliation: University of Waterloo
Room: MC 5479
Abstract: Over the first few lectures of the seminar we studied the Stochastic Gradient Descent (SGD) method for minimizing functions of the form $f=\sum_{i=1}^m f_i$ with $m$ large. In this talk I will go over a variant of (SGD) called Random Reshuffling (RR). In this method, the descent directions are still chosen from the component functions $f_i$ at random as in (SGD), but the choice is made \textit{without replacement} and in a cyclic fashion. Numerical experiments have long shown evidence of (RR) outperforming (SGD); however, a proof of this fact was missing until very recently. I will go over the basic ideas surrounding the proof for the case in which the functions $f_i$ are quadratics. Time permitting, I will explain the general case in which the $f_i$'s are smooth.
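To make the distinction between the two sampling schemes concrete, here is a minimal sketch (not from the talk) contrasting (SGD), which samples a component index with replacement at every step, with (RR), which draws a fresh permutation each epoch and cycles through all $m$ components once. The quadratic components $f_i(x)=\tfrac{1}{2}\|A_i x - b_i\|^2$, the step size, and the problem dimensions are illustrative assumptions.

```python
import numpy as np

def grad_fi(x, A_i, b_i):
    # Gradient of the quadratic component f_i(x) = 0.5 * ||A_i x - b_i||^2.
    return A_i.T @ (A_i @ x - b_i)

def sgd(x0, A, b, step, epochs, rng):
    # SGD: at each step, pick an index i uniformly *with* replacement.
    x, m = x0.copy(), len(A)
    for _ in range(epochs * m):
        i = rng.integers(m)
        x -= step * grad_fi(x, A[i], b[i])
    return x

def random_reshuffling(x0, A, b, step, epochs, rng):
    # RR: each epoch draws a fresh permutation of {0, ..., m-1} and cycles
    # through every component exactly once (sampling without replacement).
    x, m = x0.copy(), len(A)
    for _ in range(epochs):
        for i in rng.permutation(m):
            x -= step * grad_fi(x, A[i], b[i])
    return x

# Illustrative random quadratic problem.
rng = np.random.default_rng(0)
m, d = 50, 5
A = [rng.standard_normal((3, d)) for _ in range(m)]
b = [rng.standard_normal(3) for _ in range(m)]
x0 = np.zeros(d)
print("SGD:", sgd(x0, A, b, 0.01, 100, rng))
print("RR: ", random_reshuffling(x0, A, b, 0.01, 100, rng))
```

Both methods take the same number of component-gradient steps per epoch; the only difference is the sampling scheme, which is exactly the gap the recent convergence analysis addresses.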