<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>36</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Sina Baghal</style></author><author><style face="normal" font="default" size="100%">Courtney Paquette</style></author><author><style face="normal" font="default" size="100%">Stephen Vavasis</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">A termination criterion for stochastic gradient descent for binary classification</style></title></titles><dates><year><style  face="normal" font="default" size="100%">2020</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">https://arxiv.org/abs/2003.10312</style></url></web-urls></urls><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;span style=&quot;left:184.599px;top:355.992px;14.944px;sans-serif;transform:scaleX(1.0592);&quot;&gt;We propose a new, simple, and computationally inexpensive t&lt;/span&gt;&lt;span style=&quot;left:606.199px;top:355.992px;14.944px;sans-serif;transform:scaleX(1.04647);&quot;&gt;ermination test for constant step-size&lt;/span&gt;&lt;span style=&quot;left:161.599px;top:374.192px;14.944px;sans-serif;transform:scaleX(0.993215);&quot;&gt;stochastic gradient descent (SGD) applied to binary classi&lt;/span&gt;&lt;span style=&quot;left:562.799px;top:374.192px;14.944px;sans-serif;transform:scaleX(1.00203);&quot;&gt;fication on the logistic and hinge loss with &lt;/span&gt;&lt;span style=&quot;left:161.599px;top:392.392px;14.944px;sans-serif;transform:scaleX(1.01251);&quot;&gt;homogeneous linear predictors. Our theoretical results su&lt;/span&gt;&lt;span style=&quot;left:543.399px;top:392.392px;14.944px;sans-serif;transform:scaleX(1.03895);&quot;&gt;pport the effectiveness of our stopping criterion&lt;/span&gt;&lt;span style=&quot;left:161.599px;top:410.792px;14.944px;sans-serif;transform:scaleX(1.03051);&quot;&gt; when the data is Gaussian distributed. This presence of nois&lt;/span&gt;&lt;span style=&quot;left:568.599px;top:410.792px;14.944px;sans-serif;transform:scaleX(1.04244);&quot;&gt;e allows for the possibility of non-separable &lt;/span&gt;&lt;span style=&quot;left:161.599px;top:428.992px;14.944px;sans-serif;transform:scaleX(1.04835);&quot;&gt;data. 
We show that our test terminates in a finite number of ite&lt;/span&gt;&lt;span style=&quot;left:587.199px;top:428.992px;14.944px;sans-serif;transform:scaleX(1.04307);&quot;&gt;rations and when the noise in the data is&lt;/span&gt;&lt;span style=&quot;left:161.599px;top:447.192px;14.944px;sans-serif;transform:scaleX(1.05408);&quot;&gt;not too large, the expected classifier at termination nearly &lt;/span&gt;&lt;span style=&quot;left:554.199px;top:447.192px;14.944px;sans-serif;transform:scaleX(1.03973);&quot;&gt;minimizes the probability of misclassification.&lt;/span&gt;&lt;span style=&quot;left:161.599px;top:465.592px;14.944px;sans-serif;transform:scaleX(1.07723);&quot;&gt; Finally, numerical experiments indicate for both real and s&lt;/span&gt;&lt;span style=&quot;left:561.199px;top:465.592px;14.944px;sans-serif;transform:scaleX(1.09668);&quot;&gt;ynthetic data sets that our termination test &lt;/span&gt;&lt;span style=&quot;left:161.599px;top:483.792px;14.944px;sans-serif;transform:scaleX(1.05984);&quot;&gt;exhibits a good degree of predictability on accuracy and run&lt;/span&gt;&lt;span style=&quot;left:567.599px;top:483.792px;14.944px;sans-serif;transform:scaleX(1.07936);&quot;&gt;ning time.&lt;/span&gt;</style></abstract></record></records></xml>