DSG Seminar Series • Privacy-preserving querying for data federations

Monday, October 19, 2020 10:30 am - 10:30 am EDT (GMT -04:00)

Speaker: Jennie Rogers, Northwestern University

Abstract: We live in a golden age of data abundance. In numerous domains, including healthcare, education research, sociology, and finance, it is standard practice for data owners to keep their records in private silos to which only a few trusted users have access. As a result, data about individuals or entities are routinely fractured over two or more silos. Researchers and analysts in these domains wish to learn aggregates over fractured data, but cannot do so owing to privacy concerns or regulatory requirements. A private data federation is a data sharing platform with which an analyst queries the union of the records of multiple silos using cryptographic protocols such that no information is revealed except that which can be deduced from its query answers. These answers are optionally noised with differential privacy to withhold information about individuals in the dataset. The data owners evaluate a private data federation query amongst themselves using secure multi-party computation. This security comes at a high performance cost, and evaluating queries naïvely with this approach is orders of magnitude slower than running the same workload insecurely. To offer efficient query evaluation and provable privacy guarantees, my team and I generalized principles of query optimization to this setting. We also created novel techniques to bring approximate query processing to this platform, both to speed up querying and to contribute noise to its privacy-preserving query answers. I will close with a discussion of our pilot study deploying this technology in Chicago-area hospitals for clinical research.

Bio: Jennie Rogers is an assistant professor of Computer Science at Northwestern University. Her research is motivated by empowering people with data. More specifically, she investigates pragmatic privacy-preserving data analytics, federating databases over multiple data models, and new approaches with which individuals can explore and understand their data. She received the NSF CAREER Award in 2019 and the Northwestern Computer Science Faculty Service Award in 2020.
