Wednesday, July 31, 2019 12:15 pm
-
12:15 pm
EDT (GMT -04:00)
PLEASE NOTE: THIS TALK IS CANCELLED
Siddhartha
Sahu,
PhD
candidate
David
R.
Cheriton
School
of
Computer
Science
Developers
of
a
variety
of
applications,
such
as
fraud
and
threat
detection,
risk
assessment,
and
recommendations,
naturally
model
the
connected
data
as
graphs.
Many
of
these
applications
require
the
ability
to
analyze
different
snapshots
or
views
of
a
large-scale
graph.
Broadly,
the
views
these
applications
require
can
be
divided
into
two
categories:
(i)
aggregate
views
that
aggregate
multiple
nodes
and
edges
into
aggregate
nodes
and
edges
that
are
informally
called
supernodes
and
superedges;
and
(ii)
filtered
views
that
filter
a
subset
of
the
edges
and
nodes
in
the
original
graph
based
on
some
predicates.
Many
of
these
applications
construct
many,
sometimes
thousands
of,
views
of
the
same
graph,
and
compute
the
same
computation
across
each
view.
As
such,
there
is
an
opportunity
for
sharing
of
computation
in
both
during
the
construction
of
these
views
and
when
the
same
analytics
computation
is
executed
over
a
set
of
views.
Our first contribution is the design and implementation of a new open-source system we call GraphSurge, which treats graph views as first-class citizens and supports a wider range of view-based applications than prior work. GraphSurge's view definition language (GVDL), supports two organizational structures: (i) aggregate cubes and (ii) multidimensional differential cubes, which is the second contribution of our work and the focus of this paper. Our aggregate and differential cubes enable computation sharing both when constructing multiple views and when sharing computation across views.
We have developed GraphSurge on top of the Timely Dataflow system and its Differential Dataflow layer. Differential Dataflow is a system for general dataflow processing based on the differential computation model, which allows sharing of computation for evolving datasets. Users write analytics computations in GraphSurge as Differential Dataflow programs. When running the same computation on views that are organized as differential cubes, GraphSurge represents the views as partially ordered versions of a single dataset, and uses Differential Dataflow's differential computation capability to efficiently share computation.
Our first contribution is the design and implementation of a new open-source system we call GraphSurge, which treats graph views as first-class citizens and supports a wider range of view-based applications than prior work. GraphSurge's view definition language (GVDL), supports two organizational structures: (i) aggregate cubes and (ii) multidimensional differential cubes, which is the second contribution of our work and the focus of this paper. Our aggregate and differential cubes enable computation sharing both when constructing multiple views and when sharing computation across views.
We have developed GraphSurge on top of the Timely Dataflow system and its Differential Dataflow layer. Differential Dataflow is a system for general dataflow processing based on the differential computation model, which allows sharing of computation for evolving datasets. Users write analytics computations in GraphSurge as Differential Dataflow programs. When running the same computation on views that are organized as differential cubes, GraphSurge represents the views as partially ordered versions of a single dataset, and uses Differential Dataflow's differential computation capability to efficiently share computation.