<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>27</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Tao Jiang</style></author><author><style face="normal" font="default" size="100%">Samuel Tang</style></author><author><style face="normal" font="default" size="100%">Stephen Vavasis</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Re-embedding data to strengthen recovery guarantees of clustering</style></title></titles><dates><year><style face="normal" font="default" size="100%">2023</style></year></dates><urls><web-urls><url><style face="normal" font="default" size="100%">https://arxiv.org/abs/2301.10901</style></url></web-urls></urls><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">We propose a clustering method that chains four known techniques into a pipeline, yielding an algorithm with stronger recovery guarantees than any of the four components separately. Given n points in R^d, the first component of our pipeline, which we call leapfrog distances, is reminiscent of density-based clustering and yields an n × n distance matrix. The leapfrog distances are then translated, via multidimensional scaling and spectral methods, two other known techniques, into new embeddings of the n points in R^d', where d' satisfies d' &lt;&lt; d in general. Finally, sum-of-norms (SON) clustering is applied to the re-embedded points. Although the fourth step (SON clustering) can in principle be replaced by any other clustering method, our focus is on provable guarantees of recovery of underlying structure.
Therefore, we establish that the re-embedding improves recovery SON clustering, since SON clustering is a well-studied method that already has provable guarantees.</style></abstract></record></records></xml>