Title: Review of martingale theory, stochastic gradient descent, and adaptive line search for stochastic optimization
Affiliation: University of Waterloo
Abstract: With the rise of large data sets, practical algorithms for machine learning often rely on tools from probability and statistics. The first half of the seminar will review background material on probability theory, giving formal definitions of conditional expectations, martingales (and supermartingales), and stopping times, and will conclude by stating the optional stopping theorem. Along the way, we will connect each of these probabilistic concepts with concepts from (deterministic) optimization.
In the second half of the seminar, I will introduce stochastic gradient descent (SGD), a simple stochastic algorithm used to solve large finite-sum problems, and prove its convergence. We will then discuss some difficulties practitioners in ML often face when implementing SGD. Lastly, time permitting, I will introduce some current research in stochastic line search: I will present the first practical line-search method for stochastic optimization that has rigorous convergence guarantees and requires only knowable quantities for implementation.
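To make the finite-sum setting concrete, here is a minimal sketch of plain SGD on a small least-squares problem, minimizing (1/n) * sum_i 0.5 * (a_i . x - b_i)^2. It is illustrative only and is not taken from the seminar: the function name, the toy data, and the constant step size are all assumptions, and the line-search method mentioned above is not shown.

```python
import random


def sgd_least_squares(A, b, step=0.1, epochs=500, seed=0):
    """Plain SGD on f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2.

    Assumption: a fixed (constant) step size, chosen by hand.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(epochs * n):
        # Sample one term of the finite sum uniformly at random.
        i = rng.randrange(n)
        residual = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        # The gradient of the i-th term is residual * a_i.
        x = [xj - step * residual * a for xj, a in zip(x, A[i])]
    return x


# Hypothetical noiseless data generated from x_true = [2.0, -1.0].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, -1.0, 1.0, 3.0]
x_hat = sgd_least_squares(A, b)
print(x_hat)
```

Because the toy data are noiseless, every term of the sum is minimized at the same point, so a constant step size is enough here. With noisy data, a constant step would typically only reach a neighborhood of the minimizer, which is one reason step-size selection is a practical difficulty.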