Seminar by Shai Ben-David

Tuesday, January 28, 2025 10:30 am - 11:30 am EST (GMT -05:00)

Statistics and Biostatistics seminar series & Probability seminar series

Shai Ben-David
University of Waterloo

Room: M3 3127


Learning probability distributions: what can, what can't be done.

A possible high-level description of statistical learning is that it aims to learn about some unknown probability distribution (the "environment") from samples it generates (the "training data"). In its most general form, assuming no prior knowledge and asking for a reliable approximation of the data-generating distribution (a.k.a. density estimation), there can be no success guarantee. In this talk I will discuss two major directions for relaxing this too-hard problem.

First, I will address the situation under common prior knowledge assumptions: I will briefly describe the settling of the question of the sample complexity of learning mixtures of Gaussians.

Secondly, I will describe recent results showing that there can be no combinatorial (dimension-based) characterization of the learnable families of distributions (in contrast with the known VC-dimension-like characterizations of common supervised learning tasks).

If time permits, I will also address the other basic direction: what can be learnt about unknown distributions when no prior knowledge is assumed. I will describe a surprising result, namely, the independence from set theory of a basic statistical learnability problem. As a corollary, I will show that there can be no combinatorial dimension that characterizes the families of random variables that can be reliably learnt.

All parts of the talk use novel notions of sample compression schemes as key components.

The first part is based on joint work with Hassan Ashtiani, Nicholas Harvey, Christopher Liaw, Abbas Mehrabian and Yaniv Plan; the second part on joint work with my student Tosca Lechner; and the last part on joint work with Shay Moran, Pavel Hrubeš, Amir Shpilka and Amir Yehudayoff.