Please note: This seminar has been cancelled
Thomas Steinke, Postdoctoral Researcher, IBM Almaden Research Center, San Jose, California
As data is collected and used ever more widely, privacy and statistical validity are becoming increasingly difficult to protect. Sound solutions are needed: ad hoc approaches have resulted in several high-profile failures.
In this talk, I will illustrate how privacy can be unwittingly compromised: sensitive information can be leaked by seemingly innocuous "anonymized" or aggregate data. I will then show how differential privacy avoids these pitfalls. Differential privacy is an information-theoretic notion of algorithmic stability that provides a framework for measuring the leakage of private information and, most importantly, how this leakage accumulates over multiple uses of an individual's data. This allows us to design algorithms that perform sophisticated statistical analyses while providing robust privacy guarantees.
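To make the idea concrete, here is a minimal sketch (not from the talk; the function name and parameters are illustrative) of the Laplace mechanism, the canonical differentially private primitive: a bounded mean query is answered with noise calibrated to how much any single record can shift the result.

```python
import numpy as np

def private_mean(data, lower, upper, epsilon, rng=None):
    """Estimate the mean of `data` with epsilon-differential privacy
    via the Laplace mechanism.

    Each value is clipped to [lower, upper], so one person's record can
    change the mean of n values by at most (upper - lower) / n -- the
    sensitivity that calibrates the noise scale.
    """
    if rng is None:
        rng = np.random.default_rng()
    data = np.clip(np.asarray(data, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(data)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return data.mean() + noise

# Illustrative data. Under basic composition, two queries answered at
# epsilon = 0.5 each consume a total privacy budget of epsilon = 1.0 --
# the accumulation over multiple uses that the framework tracks.
ages = [34, 29, 41, 56, 38, 45, 62, 27]
print(private_mean(ages, lower=0, upper=100, epsilon=0.5))
```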
Privacy turns out to be intimately related to generalization in machine learning. In particular, a differentially private algorithm is guaranteed to not "overfit" its data, meaning that any statistical conclusions extend to the underlying distribution from which the data was drawn. I will discuss this connection and explain how it is especially useful for adaptive data analysis, namely when one dataset is used over and over again and each successive analysis is informed by the outcome of previous analyses.
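The failure mode that motivates this connection can be seen in a small simulation (illustrative, not from the talk): an analyst who selects features by their correlations on a sample, then evaluates the selected features on that same sample, will "discover" signal in pure noise; a fresh sample exposes the overfit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 500                         # n samples, d candidate features
X = rng.choice([-1.0, 1.0], size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)     # labels independent of X: no real signal

# Adaptive step: measure every feature's correlation with y on the sample,
# then keep the features that looked predictive (roughly 2 standard errors).
corr = X.T @ y / n
selected = np.abs(corr) > 2 / np.sqrt(n)

# Evaluating the selected, sign-corrected features on the SAME sample
# yields a spuriously large score; on a fresh sample it vanishes.
in_sample = (np.sign(corr[selected]) * corr[selected]).mean()
X_fresh = rng.choice([-1.0, 1.0], size=(n, d))
y_fresh = rng.choice([-1.0, 1.0], size=n)
fresh_corr = X_fresh.T @ y_fresh / n
out_of_sample = (np.sign(corr[selected]) * fresh_corr[selected]).mean()

print(f"in-sample score:     {in_sample:.3f}")      # clearly nonzero
print(f"fresh-sample score:  {out_of_sample:.3f}")  # near zero
```

Answering each correlation query through a differentially private mechanism, such as the Laplace mechanism sketched above, provably limits how far such adaptively chosen conclusions can drift from the underlying distribution.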
Bio: Thomas Steinke is a postdoctoral researcher at the IBM Almaden Research Center in San Jose, California. In 2016, he graduated from Harvard University with a PhD in Computer Science, advised by Salil Vadhan; prior to that he completed an MSc and a BSc (Hons) in New Zealand. His research interests include rigorous tools for privacy-preserving data analysis and statistically valid adaptive data analysis, as well as pseudorandomness.