DSG Seminar Series • ALEX: An Adaptive Learned Index for Dynamic Workloads

Monday, November 18, 2019 10:30 am - 10:30 am EST (GMT -05:00)

Speaker: Umar Farooq Minhas, Microsoft Research

Abstract: Machine learning is transforming database systems research. For example, recent work on “learned indexes” has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as “models” that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint; however, it is limited to static, read-only workloads.
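To make the "index as a model" idea concrete, here is a minimal sketch, assuming a static sorted array of distinct numeric keys and a single linear model. It is an illustration only, not the structure used by Kraska et al. or by ALEX; the class name LinearLearnedIndex and its methods are hypothetical. The model predicts a position, and the worst prediction error seen at build time bounds a short local search that corrects the guess.

```python
import bisect


class LinearLearnedIndex:
    """Toy static learned index: one linear model plus a bounded local search.

    Hypothetical illustration; assumes distinct numeric keys and no updates.
    """

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of: position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Model output, rounded and clamped to a valid array position.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search restricted to the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 108])
position = index.lookup(23)   # position of 23 in the sorted key array
missing = index.lookup(4)     # None, since 4 is not stored
```

Because the model is fit once over a fixed array, inserting a key would shift positions and invalidate both the fit and the error bound, which is the static, read-only limitation the abstract refers to.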

In this talk, I will present a new learned index called ALEX, which addresses practical issues that arise when implementing dynamic, updatable learned indexes. ALEX effectively combines the core insights from learned indexes with proven techniques used in B+Trees to achieve high performance and a low memory footprint. I will present the design and implementation of ALEX along with detailed experiments that show that ALEX not only beats the B+Tree on all workloads but also beats the original Learned Index on read-only workloads. We believe ALEX is a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.

Bio: Umar Farooq Minhas is currently a Principal Researcher in the Database Group at Microsoft Research and specializes in the systems aspects of database management and big data analytics platforms. His current research interests include exploiting machine learning to improve database systems, cloud-based database systems, novel distributed programming frameworks, next-gen virtualization (Docker & Kubernetes), and performance benchmarking. Umar also works closely with product teams in the Azure Data Org, which is responsible for all data management offerings from Microsoft.

Before joining Microsoft Research, Umar worked as a Research Staff Member at the IBM Almaden Research Center where he was co-leading various efforts around big data storage, scheduling, resource provisioning, next generation platforms, and IBM Watson services. His research ideas have been commercialized in IBM Big SQL, a SQL-on-Hadoop platform, and in IBM General Parallel File System (GPFS), a highly scalable, distributed file system.

Umar received a PhD and a Master of Mathematics in Computer Science from the David R. Cheriton School of Computer Science at the University of Waterloo and a Bachelor of Science in Computer Science from the National University of Computer and Emerging Sciences (Islamabad, Pakistan).

Talk slides

Talk video