Contact Info
Department of Applied Mathematics
University of Waterloo
Waterloo, Ontario
Canada N2L 3G1
Phone: 519-888-4567, ext. 32700
Fax: 519-746-4319
PDF files require Adobe Acrobat Reader
M3 4206
Mengyao Zhang | Applied Mathematics, University of Waterloo
On Convergence Analysis of Stochastic and Distributed Gradient-Based Algorithms
Optimization is a fundamental mathematical discipline focused on finding the best solution from a set of feasible choices. It is vital in various applications, including engineering, economics, data science, and beyond. Stochastic optimization and distributed optimization are crucial paradigms in the optimization field. Stochastic optimization deals with uncertainty and variability in problem parameters, providing a framework for decision-making under probabilistic conditions. On the other hand, distributed optimization tackles large-scale problems by harnessing the collective power of multiple agents or nodes, each with local information and local communication capabilities. This thesis aims to modify and analyze the existing stochastic methods and develop the algorithms and theory to solve the unconstrained distributed optimization problem.
For stochastic adaptive gradient-based methods, including RMSprop, Adadelta, Adam, Nadam, and AMSgrad, which are famous stochastic optimization method commonly used in machine learning and deep learning, the Chapter 2 provides a concise and rigorous proof of the almost sure convergence guarantees towards a critical point in the context of smooth and non-convex objective functions. To the best of our knowledge, this work offers the first almost sure convergence rates for these stochastic adaptive gradient-based methods. For non-convex objective functions, we show that a weighted average of the squared gradient norms in each aforementioned method achieves a unified convergence rate of $o(1/{t^{\frac{1}{2}-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$. Moreover, for strongly convex objective functions, the convergence rates for RMSprop and Adadelta can be further improved to $o(1/{t^{1-\theta}})$ for all $\theta\in\left(0,\frac{1}{2}\right)$.
As a locking-free parallel stochastic gradient descent algorithm, Hogwild! algorithm is commonly used for training large-scale machine learning models. In Chapter 3, we will provide an almost sure convergence rates analysis for Hogwild! algorithm under different assumptions on the loss function. We first prove its almost sure convergence rate on strongly convex function, which matches the optimal convergence rate of the classic stochastic gradient descent (SGD) method to an arbitrarily small error. For non-convex loss function, a weighted average of the squared gradient, as well as the last iterations of the algorithm converges to zero almost surely. We also provide a last-iterate almost sure convergence rate analysis for this method on general convex smooth functions.
Another aspect of the research addresses the convergence rate analysis of the gradient-based distributed optimization algorithms, which have been shown to achieve computational efficiency and rapid convergence while requiring weaker assumptions. We first propose a novel gradient-based proportional-integral (PI) algorithm in Chapter 4, and prove that its convergence rate matches that of the centralized gradient descent method under the strong convexity assumption. We then relax this assumption and discuss the local linear convergence of its virtual state for strictly convex cost functions.
In Chapter 5, we propose the powered proportional-integral (PI) algorithm and prove its convergence in finite time under the assumption of strict convexity. Then, we discuss the fixed-time convergence of its virtual state for strongly convex cost functions. Finally, we demonstrate the practicality of the distributed algorithms proposed in this thesis through simulation results.
Contact Info
Department of Applied Mathematics
University of Waterloo
Waterloo, Ontario
Canada N2L 3G1
Phone: 519-888-4567, ext. 32700
Fax: 519-746-4319
PDF files require Adobe Acrobat Reader
The University of Waterloo acknowledges that much of our work takes place on the traditional territory of the Neutral, Anishinaabeg and Haudenosaunee peoples. Our main campus is situated on the Haldimand Tract, the land granted to the Six Nations that includes six miles on each side of the Grand River. Our active work toward reconciliation takes place across our campuses through research, learning, teaching, and community building, and is co-ordinated within the Office of Indigenous Relations.