Please note: This seminar will be given online.
Anindya De, Department of Computer and Information Science
University of Pennsylvania
Consider the following basic problem in sparse linear regression — an algorithm gets labeled samples of the form (x, + \eps) where w is an unknown n-dimensional vector, x is drawn from a background distribution D and \eps is some independent noise. Given the promise that w is k-sparse, the breakthrough work of Candes, Rhomberg and Tao (2005) shows that w can be recovered with samples and time which scales as O(k log n). This should be contrasted with general linear regression where O(n) samples are information theoretically necessary.
In this talk, we look at this question from the vantage point of property testing and study the decision variant of the following question — namely, what is the complexity of deciding if the unknown vector w is k-sparse (or at least say 0.01 far from k-sparse in \ell_2 distance). We show that the decision version of the problem can be solved with samples which are independent of n as long as the background distribution D is i.i.d. and the components are not Gaussian. We further show that weakening any of the conditions in this result necessarily makes the complexity scale as log n (thus showing our results are tight).
Joint work with Xue Chen (Northwestern) and Rocco Servedio (Columbia).
To joint this meeting on Zoom, please go to https://zoom.us/j/91697115544.
Meeting ID: 916 9711 5544
200 University Avenue West
Waterloo, ON N2L 3G1