Monday, October 19, 2015 10:30 am
-
10:30 am
EDT (GMT -04:00)
Speaker: | Ashraf Aboulnaga, QCRI |
Abstract: |
Distributed
data
processing
platforms
such
as
MapReduce
and
Pregel
have
substantially
simplified
the
design
and
deployment
of
certain
classes
of
distributed
graph
analytics
algorithms.
However,
these
platforms
do
not
represent
a
good
match
for
distributed
graph
mining
problems,
as
for
example
finding
frequent
subgraphs
in
a
graph.
Given
an
input
graph,
these
problems
require
exploring
a
very
large
number
of
subgraphs
and
finding
patterns
that
match
some
"interestingness"
criteria
desired
by
the
user.
These
algorithms
are
very
important
for
areas
such
as
social
networks,
semantic
web,
and
bioinformatics.
In
this
talk,
I
will
present
Arabesque,
the
first
distributed
data
processing
platform
for
implementing
graph
mining
algorithms.
Arabesque
automates
the
process
of
exploring
a
very
large
number
of
subgraphs.
It
defines
a
high-level
filter-process
computational
model
that
simplifies
the
development
of
scalable
graph
mining
algorithms:
Arabesque
explores
subgraphs
and
passes
them
to
the
application,
which
must
simply
compute
outputs
and
decide
whether
the
subgraph
should
be
further
extended.
We
use
Arabesque's
API
to
produce
distributed
solutions
to
three
fundamental
graph
mining
problems:
frequent
subgraph
mining,
counting
motifs,
and
finding
cliques.
Our
implementations
require
a
handful
of
lines
of
code,
scale
to
trillions
of
subgraphs,
and
represent
in
some
cases
the
first
available
distributed
solutions.
This is joint work with Carlos Teixeira, Alexandre Fonseca, Marco Serafini, Georgos Siganos, and Mohammed Zaki. It appears in SOSP 2015. |