Welcome to the Data Systems Group
The Data Systems Group at the University of Waterloo's Cheriton School of Computer Science builds innovative, high-impact platforms, systems, and applications for processing, managing, analyzing, and searching the vast collections of data that are integral to modern information societies. These are colloquially known as "big data" technologies.
Our capabilities span the full spectrum from unstructured text collections to relational data, and everything in between, including semi-structured sources such as time series, log data, graphs, and other data types. We work at multiple layers in the software stack, ranging from storage management and execution platforms to user-facing applications and studies of user behaviour.
Our research tackles all phases of the information lifecycle, from ingest and cleaning to inference and decision support.
- Oct. 22, 2018
Technology-assisted review (TAR) — an automated process used to select and prioritize documents for review, pioneered by Research Professor Maura Grossman and Professor Gordon Cormack — was used for the first time by a state archive to classify emails from the administration of former Virginia Governor Tim Kaine for release to the public.
- Sep. 12, 2018
Recent computer science PhD graduate and postdoctoral fellow Andrew Kane, MMath graduate Dallas Fraser, and Distinguished Professor Emeritus Frank Tompa have received the best paper award at DocEng 2018, the 18th ACM Symposium on Document Engineering.
- Sep. 10, 2018
One often-heard complaint is that academics labour away in their ivory towers, divorced from happenings in the real world. A few years ago, Professor Semih Salihoglu of the Data Systems Group at the University of Waterloo's Cheriton School of Computer Science noticed exactly this for graph processing.
- Feb. 13, 2019
Chang Ge, PhD candidate
David R. Cheriton School of Computer Science
Organizations are increasingly interested in allowing external data scientists to explore their sensitive datasets. Given the popularity of differential privacy, data owners want this exploration to come with provable privacy guarantees. However, current systems for differentially private query answering place an inordinate burden on data analysts, who must understand differential privacy, manage their privacy budget, and even implement new algorithms for noisy query answering. Moreover, current systems give the data analyst no guarantees on the quantity they care about, namely the accuracy of query answers.