DSG Seminar Series • Next Generation Indexes For Big Data Engineering

Thursday, May 10, 2018 2:00 PM EDT

Daniel Lemire
Université TÉLUQ

Maximizing performance in data engineering is a daunting challenge. We present some of our work on designing faster indexes, with a particular emphasis on compressed indexes. Our prior work includes (1) Roaring indexes, which are part of multiple big-data systems such as Spark, Hive, Druid, Atlas, Pinot and Kylin, and (2) EWAH indexes, which are part of Git (GitHub) and included in major Linux distributions.

We will present ongoing and future work on how we can process data faster while supporting the diverse systems found in the cloud (with upcoming ARM processors) and across multiple programming languages (e.g., Java, C++, Go, Python). We seek to minimize the use of shared resources (e.g., RAM) while exploiting algorithms designed for the single-instruction-multiple-data (SIMD) instructions available on commodity processors. Our end goal is to process billions of records per second per core.

Presentation slides (PDF)


Daniel Lemire is a computer science professor at the Université du Québec (TELUQ). He has also been a research officer at the National Research Council of Canada and an entrepreneur. He has written over 70 peer-reviewed publications, including more than 40 journal articles. He has held competitive research grants for the last 15 years. He serves on the program committees of leading computer science conferences (e.g., ACM CIKM, WWW, ACM WSDM, ACM SIGIR, ACM RecSys). 

He programs in C, C++, Java, JavaScript, Python, Swift and Go. He works primarily in an open-source setting. You can find his software in Git, Apache Hive, Druid, Apache Kylin, Netflix Atlas, LinkedIn Pinot, Microsoft Visual Studio Team Services and so forth. Some of his compression software is used by Apache Arrow and Apache Impala. In 2012, he was rewarded by the Google Open Source Peer Bonus Program.

He is a long-time social media user: his blog has thousands of readers and was featured on Slashdot, Reddit and Hacker News. He was one of the first Twitter users: @lemire.

Location 
DC - William G. Davis Computer Research Centre
1304
200 University Avenue West

Waterloo, ON N2L 3G1
Canada
