<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Sahar Karimi</style></author><author><style face="normal" font="default" size="100%">Stephen A. Vavasis</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Nonlinear conjugate gradient for smooth convex functions</style></title><secondary-title><style face="normal" font="default" size="100%">Mathematical Programming - Computation</style></secondary-title></titles><dates><year><style  face="normal" font="default" size="100%">Accepted</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">https://arxiv.org/pdf/2111.11613.pdf</style></url></web-urls></urls><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;
	&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:196.001px;top:508.207px;18.1818px;sans-serif;transform:scaleX(0.969669);&quot;&gt;The method of nonlinear conjugate gradients (NCG) is widely used in practice for &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:530.807px;18.1818px;sans-serif;transform:scaleX(1.01648);&quot;&gt;unconstrained optimization, but it satisfies weak complexity bounds at best when &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:553.407px;18.1818px;sans-serif;transform:scaleX(1.01581);&quot;&gt;applied to smooth convex functions.&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:462.438px;top:553.407px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:475.401px;top:553.407px;18.1818px;sans-serif;transform:scaleX(1.00372);&quot;&gt;In contrast, Nesterov’s accelerated gradient &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:576.007px;18.1818px;sans-serif;transform:scaleX(1.0179);&quot;&gt;(AG) method is optimal up to constant factors for this class.&lt;/span&gt;&lt;br role=&quot;presentation&quot;&gt;&amp;nbsp;
&lt;/p&gt;

&lt;p&gt;
	&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:196.001px;top:598.607px;18.1818px;sans-serif;transform:scaleX(1.00215);&quot;&gt;However, when specialized to quadratic function, conjugate gradient is optimal &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:621.207px;18.1818px;sans-serif;transform:scaleX(0.975326);&quot;&gt;in a strong sense among function-gradient methods. Therefore, there is seemingly a &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:643.808px;18.1818px;sans-serif;transform:scaleX(1.01008);&quot;&gt;gap in the menu of available algorithms: NCG, the optimal algorithm for quadratic &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:666.407px;18.1818px;sans-serif;transform:scaleX(1.01626);&quot;&gt;functions that also exhibits good practical performance for general functions, has &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:688.808px;18.1818px;sans-serif;transform:scaleX(1.00265);&quot;&gt;poor complexity bounds compared to AG.&lt;/span&gt;&lt;br role=&quot;presentation&quot;&gt;&amp;nbsp;
&lt;/p&gt;

&lt;p&gt;
	&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:196.001px;top:711.407px;18.1818px;sans-serif;transform:scaleX(1.00708);&quot;&gt;We propose an NCG method called C+AG (“conjugate plus accelerated gradi&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:734.008px;18.1818px;sans-serif;transform:scaleX(1.00552);&quot;&gt;ent”) to close this gap, that is, it is optimal for quadratic functions and still satisfies&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:756.607px;18.1818px;sans-serif;transform:scaleX(1.00344);&quot;&gt;the best possible complexity bound for more general smooth convex functions.&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:802.637px;top:756.607px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:813.601px;top:756.607px;18.1818px;sans-serif;transform:scaleX(1.35194);&quot;&gt;It &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:779.208px;18.1818px;sans-serif;transform:scaleX(1.01228);&quot;&gt;takes conjugate gradient steps until insufficient progress is made, at which time it &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:801.807px;18.1818px;sans-serif;transform:scaleX(1.01432);&quot;&gt;switches to accelerated gradient steps, and later retries conjugate gradient.&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:781.837px;top:801.807px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:796px;top:801.807px;18.1818px;sans-serif;transform:scaleX(1.00445);&quot;&gt;The &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:824.408px;18.1818px;sans-serif;transform:scaleX(1.03257);&quot;&gt;proposed method has the following theoretical properties: (i) It is identical to lin&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:847.008px;18.1818px;sans-serif;transform:scaleX(1.03235);&quot;&gt;ear conjugate gradient (and hence terminates finitely) if the objective function is &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.801px;top:869.608px;18.1818px;sans-serif;transform:scaleX(1.06775);&quot;&gt;quadratic; (ii) Its running-time bound is&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:498.965px;top:869.608px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:506.601px;top:869.608px;18.1818px;sans-serif;&quot;&gt;O&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:521px;top:869.608px;18.1818px;sans-serif;&quot;&gt;(&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:528px;top:869.608px;18.1818px;sans-serif;&quot;&gt;ǫ&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:535.399px;top:866.724px;13.2835px;sans-serif;&quot;&gt;−&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:546.399px;top:866.724px;13.2835px;sans-serif;&quot;&gt;1&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:553.399px;top:866.724px;13.2835px;sans-serif;&quot;&gt;/&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:560.399px;top:866.724px;13.2835px;sans-serif;&quot;&gt;2&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:568.399px;top:869.608px;18.1818px;sans-serif;transform:scaleX(1.03146);&quot;&gt;) gradient evaluations for an&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:800.891px;top:869.608px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:808.799px;top:869.608px;18.1818px;sans-serif;&quot;&gt;L&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:821.199px;top:869.608px;18.1818px;sans-serif;&quot;&gt;-&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.799px;top:892.208px;18.1818px;sans-serif;transform:scaleX(0.981741);&quot;&gt;smooth convex function, where&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:414.872px;top:892.208px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:420.599px;top:892.208px;18.1818px;sans-serif;&quot;&gt;ǫ&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:427.963px;top:892.208px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:433.799px;top:892.208px;18.1818px;sans-serif;transform:scaleX(1.0172);&quot;&gt;is the desired residual reduction, (iii) Its running-&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.799px;top:914.808px;18.1818px;sans-serif;transform:scaleX(1.05746);&quot;&gt;time bound is&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:283.163px;top:914.808px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:291.8px;top:914.808px;18.1818px;sans-serif;&quot;&gt;O&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:306.199px;top:914.808px;18.1818px;sans-serif;&quot;&gt;(&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:313.199px;top:899.208px;18.1818px;sans-serif;&quot;&gt;√&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:331.4px;top:914.807px;18.1818px;sans-serif;transform:scaleX(1.37594);&quot;&gt;L/ℓ&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:360.364px;top:914.807px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:363.4px;top:914.807px;18.1818px;sans-serif;transform:scaleX(1.03214);&quot;&gt;ln(1&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:394.6px;top:914.807px;18.1818px;sans-serif;transform:scaleX(1.07892);&quot;&gt;/ǫ&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:411px;top:914.807px;18.1818px;sans-serif;transform:scaleX(1.12492);&quot;&gt;)) if the function is both&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:617.891px;top:914.807px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:626.401px;top:914.807px;18.1818px;sans-serif;&quot;&gt;L&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:638.801px;top:914.807px;18.1818px;sans-serif;transform:scaleX(1.00997);&quot;&gt;-smooth and&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:740.892px;top:914.807px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:749.201px;top:914.807px;18.1818px;sans-serif;&quot;&gt;ℓ&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:756.802px;top:914.807px;18.1818px;sans-serif;transform:scaleX(1.00691);&quot;&gt;-strongly &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.802px;top:937.207px;18.1818px;sans-serif;transform:scaleX(0.929556);&quot;&gt;convex.&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:227.038px;top:937.207px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:238.602px;top:937.207px;18.1818px;sans-serif;transform:scaleX(1.03115);&quot;&gt;We also conjecture and outline a proof that a variant of the method has &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.802px;top:959.806px;18.1818px;sans-serif;transform:scaleX(1.07543);&quot;&gt;the property: (iv) It is&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:346.965px;top:959.806px;18.1818px;sans-serif;&quot;&gt; &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:352.802px;top:959.806px;18.1818px;sans-serif;&quot;&gt;n&lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:363.802px;top:959.806px;18.1818px;sans-serif;transform:scaleX(0.977373);&quot;&gt;-step quadratically convergent for a function whose second &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.802px;top:982.407px;18.1818px;sans-serif;transform:scaleX(1.05225);&quot;&gt;derivative is smooth and invertible at the optimizer. Note that the bounds in (ii) &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.802px;top:1005.01px;18.1818px;sans-serif;transform:scaleX(1.01929);&quot;&gt;and (iii) match AG and are the best possible, i.e., they match lower bounds up to &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.802px;top:1027.61px;18.1818px;sans-serif;transform:scaleX(0.975541);&quot;&gt;constant factors for the classes of functions under consideration. On the other hand, &lt;/span&gt;&lt;span dir=&quot;ltr&quot; role=&quot;presentation&quot; style=&quot;left:168.802px;top:1050.21px;18.1818px;sans-serif;transform:scaleX(1.05112);&quot;&gt;(i) and (iv) match NCG.&lt;/span&gt;&lt;br role=&quot;presentation&quot;&gt;&amp;nbsp;
&lt;/p&gt;
</style></abstract></record></records></xml>