Nonlinear conjugate gradient for smooth convex functions

Citation:

Karimi, S. , & Vavasis, S. A. . (Submitted). Nonlinear conjugate gradient for smooth convex functions. Mathematical Programming - Computation. Retrieved from https://arxiv.org/pdf/2111.11613.pdf

Abstract:

The method of nonlinear conjugate gradients (NCG) is widely used in practice for unconstrained optimization, but it satisfies weak complexity bounds at best when applied to smooth convex functions. In contrast, Nesterov’s accelerated gradient (AG) method is optimal up to constant factors for this class.
 

However, when specialized to quadratic function, conjugate gradient is optimal in a strong sense among function-gradient methods. Therefore, there is seemingly a gap in the menu of available algorithms: NCG, the optimal algorithm for quadratic functions that also exhibits good practical performance for general functions, has poor complexity bounds compared to AG.
 

We propose an NCG method called C+AG (“conjugate plus accelerated gradient”) to close this gap, that is, it is optimal for quadratic functions and still satisfiesthe best possible complexity bound for more general smooth convex functions. It takes conjugate gradient steps until insufficient progress is made, at which time it switches to accelerated gradient steps, and later retries conjugate gradient. The proposed method has the following theoretical properties: (i) It is identical to linear conjugate gradient (and hence terminates finitely) if the objective function is quadratic; (ii) Its running-time bound is O(ǫ1/2) gradient evaluations for an L-smooth convex function, where ǫ is the desired residual reduction, (iii) Its running-time bound is O(L/ℓ ln(1)) if the function is both L-smooth and -strongly convex. We also conjecture and outline a proof that a variant of the method has the property: (iv) It is n-step quadratically convergent for a function whose second derivative is smooth and invertible at the optimizer. Note that the bounds in (ii) and (iii) match AG and are the best possible, i.e., they match lower bounds up to constant factors for the classes of functions under consideration. On the other hand, (i) and (iv) match NCG.
 

Notes:

Publisher's Version