Data Systems Seminars (2020-2021)

The Data Systems Seminar Series provides a forum for presentation and discussion of interesting and current data systems issues. It complements our internal data systems meetings by bringing in external colleagues. The talks that are scheduled for this year are listed below.

The talks are usually held on a Monday at 10:30 am in room DC 1302. Exceptions are flagged. Due to COVID-19, all talks will be held virtually over Zoom until further notice.

We will try to post the presentation notes whenever possible. Please click on the presentation title to access these notes.

The Database Seminar Series is supported by

Danqi Chen
Jennie Rogers
Peter Boncz
Aaron Elmore
Scott Meyer
Speaker 6
Speaker 7
Xiaokui Xiao

14 September 2020; 10:30AM

Title: Recent Advances in Open-domain Question Answering
Speaker: Danqi Chen, Princeton University
Abstract: Open-domain question answering, the task of automatically answering questions posed by humans in a natural language, usually based on a large collection of unstructured documents, has (re-)gained a lot of popularity in the last couple of years. In this talk, I will discuss many recent exciting developments which have greatly advanced the field, including several works of ours. In particular, I would like to discuss the importance of pre-training for question answering, learning dense representations for retrieval in place of sparse models, the role of structured knowledge, as well as the trade-off between open-book and closed-book models. I will conclude with current limitations and future directions.
Bio: Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. Her research focuses on deep learning for natural language processing, especially at the intersection of text understanding and knowledge representation & reasoning, and applications in question answering, information extraction, and conversational systems. Before joining Princeton, Danqi worked as a visiting scientist at Facebook AI Research in Seattle. She received her Ph.D. from Stanford University (2018) and B.E. from Tsinghua University (2012), both in Computer Science. In the past, she was a recipient of the 2019 Arthur Samuel Best Doctoral Thesis Award at Stanford University, a Facebook Fellowship, a Microsoft Research Women’s Fellowship, and paper awards at ACL’16 and EMNLP’17.

19 October 2020; 10:30AM

Title: Privacy-preserving querying for data federations
Speaker: Jennie Rogers, Northwestern University
Abstract: We live in a golden age of data abundance. In numerous domains - including healthcare, education research, sociology, and finance - it is standard practice for data owners to keep their records in private silos to which only a few trusted users have access. As a result, data about individuals or entities are routinely fractured over two or more silos. Researchers and analysts in these domains wish to learn aggregates over fractured data, but cannot do so owing to privacy concerns or regulatory requirements. A private data federation is a data sharing platform with which an analyst queries the union of the records of multiple silos using cryptographic protocols such that no information is revealed except that which can be deduced from its query answers. These answers are optionally noised with differential privacy to withhold information about individuals in the dataset. The data owners evaluate a private data federation query amongst themselves using secure multi-party computation. This security comes at a high performance cost, and evaluating queries naïvely with this approach is orders of magnitude slower than running the same workload insecurely. To offer efficient query evaluation and provable privacy guarantees, my team and I generalized principles of query optimization to this setting. We also created novel techniques to bring approximate query processing to this platform, both to speed up querying and to contribute noise to its privacy-preserving query answers. I will close with a discussion of our pilot study deploying this technology in Chicago-area hospitals for clinical research.
Bio: Jennie Rogers is an assistant professor of Computer Science at Northwestern University. Her research is motivated by empowering people with data. More specifically, she investigates pragmatic privacy-preserving data analytics, federating databases over multiple data models, and new approaches with which individuals can explore and understand their data. She received the NSF CAREER Award in 2019 and the Northwestern Computer Science Faculty Service Award in 2020.

16 November 2020; 10:30AM 

Title: FSST: fast random-access string compression
Speaker: Peter Boncz, CWI
Abstract: Strings are prevalent in real-world data sets. They often occupy a large fraction of the data and are slow to process. In this work, we present Fast Static Symbol Table (FSST), a lightweight compression scheme for strings. On text data, FSST offers decompression and compression speed similar to or better than the best speed-optimized compression methods, such as LZ4, yet offers significantly better compression factors. Moreover, its use of a static symbol table allows random access to individual, compressed strings, enabling lazy decompression and query processing on compressed data. We believe these features will make FSST a valuable piece in the standard compression toolbox.
Bio: Peter Boncz holds appointments as tenured researcher at CWI and professor at VU University Amsterdam. His academic background is in core database architecture, with MonetDB the systems outcome of his PhD; MonetDB much later won the 2016 ACM SIGMOD Systems Award. He has a track record in bridging the gap between academia and commercial application, receiving the Dutch ICT Regie Award 2006 for his role in the CWI spin-off company Data Distilleries. In 2008 he co-founded Vectorwise around the analytical database system of the same name, which pioneered vectorized query execution and was later acquired by Actian. He is co-recipient of the 2009 VLDB 10 Years Best Paper Award, and in 2013 received the Humboldt Research Award for his research on database architecture. He also works on graph data management, founding in 2013 the Linked Data Benchmark Council (LDBC), a benchmarking organization for graph database systems.

14 December 2020; 10:30AM


Title: CrocodileDB: Resource Efficient Database Execution
Speaker: Aaron Elmore, University of Chicago
Abstract: The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point in time can increase result reuse, reduce work that might later be invalidated, or avoid unnecessary work altogether. In this talk I will introduce CrocodileDB, a resource-efficient database system that automatically optimizes deferment based on user-specification and workload prediction. CrocodileDB integrates new ways of specifying timing information, new query execution policies, new task schedulers, and new data loading schemes. In particular, this talk will highlight two new query execution paradigms, Intermittent Query Processing and Incremental-Aware Query Execution.
Bio: Aaron J. Elmore is an Assistant Professor in the Department of Computer Science and the College at the University of Chicago. Aaron was previously a Postdoctoral Associate at MIT. Aaron's thesis on Elasticity Primitives for Database-as-a-Service was completed at the University of California, Santa Barbara. His recent research interests focus on building data systems that address the growing data deluge. He is currently an associate editor for SIGMOD Record, and has served as co-chair of the SIGMOD demonstration track and the inaugural SIGMOD student research competition, and as VLDB proceedings editor.

11 January 2021; 3:00PM (Please note the unusual time)


Title: LIquid: Soul of a New Graph Database
Speaker: Scott Meyer, LinkedIn
Abstract: The talk will present LIquid, the graph database which serves what we call the “on-line economic graph.” LIquid is in full production, serving 1M QPS against a ~10Tb corpus. The first half of the talk defines what a graph database is: ultimately, a relational system that satisfies certain constraints. The second half explains how one might go about building such a graph database.
Bio: Scott has had a long and varied career in software development, including graphics, networking, GUI applications, OODBs, a JVM implementation, 3 years spent sailing around the Pacific, and most recently, 15 years working on graph databases at Metaweb, Google, and LinkedIn.

19 April 2021; 10:30AM

Title: TBD

17 May 2021; 10:30AM

Title: TBD

14 June 2021; 9:30AM (Note the different time)

Title: TBD
Speaker: Xiaokui Xiao, National University of Singapore
Abstract: TBD
Bio: TBD