Friday, March 14, 2025, 12:00 pm – 1:00 pm EDT (GMT -04:00)
Please note: This PhD seminar will take place in DC 1304 and online.
Renato Ferreira Pinto Jr., PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Eric Blais
Consider two problems about an unknown probability distribution $p$:
1. How many samples from $p$ are required to test whether $p$ is supported on at most $n$ elements? Specifically, given samples from $p$, determine whether it is supported on at most $n$ elements or is “$\epsilon$-far” (in total variation distance) from every distribution supported on at most $n$ elements.
2. Given $m$ samples from $p$, what is the largest lower bound on its support size that we can produce?

The best known upper bound for problem (1) uses a general algorithm for learning the histogram of the distribution $p$, which requires $\Theta(\tfrac{n}{\epsilon^2 \log n})$ samples. We show that testing can be done more efficiently than learning the histogram, using only $O(\tfrac{n}{\epsilon \log n} \log(1/\epsilon))$ samples, nearly matching the best known lower bound of $\Omega(\tfrac{n}{\epsilon \log n})$. This algorithm also provides a better solution to problem (2), producing larger lower bounds on support size than those implied by previous work. The algorithm builds on the Chebyshev polynomial technique of Wu and Yang (Annals of Statistics 2019), and the proof relies on an analysis of Chebyshev polynomial approximations outside the range where they are designed to be good approximations.
Joint work with Nathan Harms (EPFL), to appear at STOC 2025.
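For intuition about problem (2), here is a minimal Python sketch (an illustration under a toy setup, not the algorithm from the talk): the number of distinct elements seen among $m$ samples is always a valid lower bound on the support size of $p$, but it can be far from tight when $m$ is small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "unknown" distribution p: uniform over k = 10_000 elements
# (a hypothetical setup chosen purely for illustration).
k = 10_000

def distinct_count_lower_bound(samples):
    """The number of distinct observed elements is always a valid
    lower bound on the support size of the sampled distribution."""
    return len(np.unique(samples))

for m in [100, 1_000, 10_000, 100_000]:
    samples = rng.integers(0, k, size=m)  # m i.i.d. samples from p
    print(f"m = {m:>7}: lower bound = {distinct_count_lower_bound(samples)}")
```

When $m$ is much smaller than the true support size, the distinct count badly underestimates it; the point of estimators in the Chebyshev-polynomial line of work is to certify lower bounds well beyond the number of distinct elements actually observed.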
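The phenomenon mentioned in the last sentence of the abstract can also be seen numerically: Chebyshev approximations are excellent on their design interval and degrade rapidly outside it. A generic NumPy illustration (not the construction from the paper):

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Degree-8 Chebyshev interpolant of exp(x), designed on [-1, 1].
approx = Chebyshev.interpolate(np.exp, deg=8, domain=[-1, 1])

for x in [0.5, 1.0, 1.5, 2.0, 3.0]:
    err = abs(approx(x) - np.exp(x))
    print(f"x = {x:3.1f}: |approx(x) - exp(x)| = {err:.2e}")
# Inside [-1, 1] the error is tiny; beyond the interval it grows
# rapidly, which is why using such approximations outside their
# design range requires careful analysis.
```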
To attend this PhD seminar in person, please go to DC 1304. You can also attend virtually on Zoom.