Maximizing performance in data engineering is a daunting challenge. We present some of our work on designing faster indexes, with a particular emphasis on compressed indexes. Some of our prior work includes (1) Roaring indexes which are part of multiple big-data systems such as Spark, Hive, Druid, Atlas, Pinot, Kylin, (2) EWAH indexes are part of Git (GitHub) and included in major Linux distributions.
We will present ongoing and future work on how we can process data faster while supporting the diverse systems found in the cloud (with upcoming ARM processors) and under multiple programming languages (e.g., Java, C++, Go, Python). We seek to minimize shared resources (e.g., RAM) while exploiting algorithms designed for the single-instruction-multiple-data (SIMD) instructions available on commodity processors. Our end goal is to process billions of records per second per core.
Presentation slides (PDF)
Daniel Lemire is a computer science professor at the Université du Québec (TELUQ). He has also been a research officer at the National Research Council of Canada and an entrepreneur. He has written over 70 peer-reviewed publications, including more than 40 journal articles. He has held competitive research grants for the last 15 years. He serves on the program committees of leading computer science conferences (e.g., ACM CIKM, WWW, ACM WSDM, ACM SIGIR, ACM RecSys).
200 University Avenue West
Waterloo, ON N2L 3G1