Siddhartha
Sahu,
PhD
candidate
David
R.
Cheriton
School
of
Computer
Science
Many applications, such as graph OLAP, temporal analysis, and contingency analysis, require analyzing multiple snapshots or views of a large-scale graph. This paper presents the design and implementation of a new view-based graph analytics system called Graphsurge that is more general than prior application-specific view-based systems. Users program Graphsurge through: (1) a declarative graph view definition language to create arbitrary views; and (2) a Differential Dataflow-based programming API to write arbitrary analytics computations.
In many applications, such as temporal and contingency analysis, users perform the same computation across many, possibly thousands of, views. By organizing views in a structure called differential cubes and running analytics computations differentially, Graphsurge efficiently shares computation across views. Graphsurge further increases computation sharing between views by reordering the views in a differential cube. This is an NP-hard problem for which we show a constant-factor approximation algorithm from literature.
We implemented Graphsurge on top of the Timely and Differential Dataflow systems. We present experiments to demonstrate the benefits of running computations differentially across multiple views, the ability to organize views as multidimensional cubes, and our view ordering optimization.