Speaker: Umar Farooq Minhas, Microsoft Research
Abstract: Machine learning is transforming database systems research. For example, recent work on “learned indexes” has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as “models” that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint; however, it is limited to static, read-only workloads. In this talk, I will present a new learned index called ALEX, which addresses practical issues that arise when implementing dynamic, updatable learned indexes. ALEX effectively combines the core insights from learned indexes with proven techniques used in B+Trees to achieve high performance and a low memory footprint. I will present the design and implementation of ALEX along with detailed experiments showing that ALEX not only beats the B+Tree on all workloads but also beats the original Learned Index on read-only workloads. We believe that ALEX represents a key step toward making learned indexes practical for a broader class of database workloads with dynamic updates.
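The core idea in the abstract, an index as a model that predicts a key's position, can be illustrated with a minimal sketch. This is neither ALEX nor the original Learned Index. It is a hypothetical toy (the class name `LinearLearnedIndex` and its single least-squares linear model are illustrative assumptions) that fits position as a linear function of the key over a sorted array, records the model's worst-case prediction error, and answers lookups with a binary search confined to that error window. Like the original learned index, it supports only a static, read-only key set.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Hypothetical toy learned index: one linear model over a static, sorted key array."""

    def __init__(self, keys: List[float]) -> None:
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~= slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case distance between predicted and true position over all keys;
        # a lookup for a stored key is then guaranteed to land inside this window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: float) -> int:
        """Model prediction of a key's position, clamped to the array bounds."""
        p = round(self.slope * key + self.intercept)
        return min(max(p, 0), len(self.keys) - 1)

    def lookup(self, key: float) -> Optional[int]:
        """Return the position of key in the sorted array, or None if it is absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only within the model's error bound instead of the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

For example, `LinearLearnedIndex([x * x for x in range(100)]).lookup(49)` returns `7`, and looking up `50` returns `None`. The tighter the model fits the key distribution, the smaller `max_err` and the shorter the final search; inserting keys would shift positions and invalidate both the model and the error bound, which is the kind of update problem the abstract says ALEX addresses.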
Bio: Umar Farooq Minhas is currently a Principal Researcher in the Database Group at Microsoft Research and specializes in the systems aspects of database management and big data analytics platforms. His current research interests include exploiting machine learning to improve database systems, cloud-based database systems, novel distributed programming frameworks, next-gen virtualization (Docker & Kubernetes), and performance benchmarking. Umar also works closely with product teams in the Azure Data Org, which is responsible for all data management offerings from Microsoft. Before joining Microsoft Research, Umar worked as a Research Staff Member at the IBM Almaden Research Center, where he co-led various efforts around big data storage, scheduling, resource provisioning, next-generation platforms, and IBM Watson services. His research ideas have been commercialized in IBM Big SQL, a SQL-on-Hadoop platform, and in IBM General Parallel File System (GPFS), a highly scalable, distributed file system. Umar received a PhD and a Master of Mathematics in Computer Science from the David R. Cheriton School of Computer Science at the University of Waterloo and a Bachelor of Science in Computer Science from the National University of Computer and Emerging Sciences (Islamabad, Pakistan).