PhD Thesis Defence | Mengyao Zhang, On Convergence Analysis of Stochastic and Distributed Gradient-Based Algorithms | Applied Mathematics

Wednesday, April 17, 2024 1:30 pm - 1:30 pm EDT (GMT -04:00)

iCal

Location

M3 4206

Candidate

Mengyao Zhang | Applied Mathematics, University of Waterloo

Title

On Convergence Analysis of Stochastic and Distributed Gradient-Based Algorithms

Abstract

Optimization is a fundamental mathematical discipline focused on finding the best solution from a set of feasible choices. It is vital in various applications, including engineering, economics, data science, and beyond. Stochastic optimization and distributed optimization are crucial paradigms in the optimization field. Stochastic optimization deals with uncertainty and variability in problem parameters, providing a framework for decision-making under probabilistic conditions. On the other hand, distributed optimization tackles large-scale problems by harnessing the collective power of multiple agents or nodes, each with local information and local communication capabilities. This thesis aims to modify and analyze the existing stochastic methods and develop the algorithms and theory to solve the unconstrained distributed optimization problem.

For stochastic adaptive gradient-based methods, including RMSprop, Adadelta, Adam, Nadam, and AMSgrad, which are famous stochastic optimization method commonly used in machine learning and deep learning, the Chapter \ref{chapter-adam} provides a concise and rigorous proof of the almost sure convergence guarantees towards a critical point in the context of smooth and non-convex objective functions. To the best of our knowledge, this work offers the first almost sure convergence rates for these stochastic adaptive gradient-based methods. For non-convex objective functions, we show that a weighted average of the squared gradient norms in each aforementioned method achieves a unified convergence rate of $o(1/{t^{\frac{1}{2}-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$. Moreover, for strongly convex objective functions, the convergence rates for RMSprop and Adadelta can be further improved to $o(1/{t^{1-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$.

As a locking-free parallel stochastic gradient descent algorithm, Hogwild! algorithm is commonly used for training large-scale machine learning models. In Chapter \ref{chapter-hogwild}, we will provide an almost sure convergence rates analysis for Hogwild! algorithm under different assumptions on the loss function. We first prove its almost sure convergence rate on strongly convex function, which matches the optimal convergence rate of the classic stochastic gradient descent (SGD) method to an arbitrarily small error. For non-convex loss function, a weighted average of the squared gradient, as well as the last iterations of the algorithm converges to zero almost surely. We also provide a last-iterate almost sure convergence rate analysis for this method on general convex smooth functions.