Please note: This PhD seminar will take place online.
Siddhartha Sahu, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Semih Salihoğlu
Differential computation (DC) is a general technique for sharing computation across arbitrary dataflow computations, including those that contain nested recursive loops, which makes it especially suitable for graph analytics involving iterative computations. Differential Dataflow (DD) is the reference implementation of DC: a general dataflow system that shares computation while remaining oblivious to the underlying application.
In this talk, we show how to develop graph-specific optimizations for DC and DD. We develop two key optimizations for graph analytics that use a very common dataflow subroutine called iterative frontier expansion (IFE). First, we study how the input and output state of operators are indexed in DD, and present an alternative design that is more efficient in terms of overall memory usage. Second, we observe that DD operators repeatedly perform a large amount of work only to verify that the operator output is empty. We present an optimization called Fast Empty Difference Verification (FEDiV) that detects empty operator output without running the full operator logic. Through experimental evaluation, we show that our optimizations reduce DD's runtime by up to 14x.
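To illustrate the iterative frontier expansion pattern mentioned above, the following is a minimal, non-differential sketch in Rust (DD's implementation language). The graph representation and function names are illustrative assumptions, not DD's actual API; the point is the structure of the loop, which terminates only once an iteration produces an empty set of new vertices.

```rust
use std::collections::HashSet;

// Sketch of iterative frontier expansion (IFE): starting from a source
// vertex, each iteration expands the current frontier along outgoing
// edges, keeping only vertices not seen before. The loop stops when an
// iteration discovers nothing new, i.e., the "difference" is empty.
// (Hypothetical helper, not DD's API.)
fn frontier_expansion(edges: &[(u32, u32)], source: u32) -> HashSet<u32> {
    let mut visited: HashSet<u32> = HashSet::new();
    visited.insert(source);
    let mut frontier: Vec<u32> = vec![source];
    while !frontier.is_empty() {
        let mut next = Vec::new();
        for &u in &frontier {
            for &(s, d) in edges {
                // insert returns true only for newly discovered vertices
                if s == u && visited.insert(d) {
                    next.push(d);
                }
            }
        }
        frontier = next; // empty frontier ends the computation
    }
    visited
}

fn main() {
    // Small example graph: 0 -> 1 -> 2 -> 3 -> 1, plus a disconnected edge 4 -> 5.
    let edges = [(0, 1), (1, 2), (2, 3), (3, 1), (4, 5)];
    let reached = frontier_expansion(&edges, 0);
    assert_eq!(reached.len(), 4); // vertices 0, 1, 2, 3 are reachable from 0
    println!("reached {} vertices", reached.len());
}
```

In DD, an analogous loop runs over *differences* to vertex sets rather than the sets themselves, and each iteration still incurs operator work to confirm that no differences remain; that final verification cost is what FEDiV targets.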