Please note: This PhD defence will take place in DC 2310 and online.
Anil Pacaci, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: University Professor M. Tamer Özsu
It is natural to model and represent interaction data as graphs in a broad range of domains such as online social networks, protein interaction data, and e-commerce applications. A number of emerging applications require continuous processing and querying of interaction data that evolves at a high rate, in near real-time, which can be modelled as a streaming graph. Persistent queries, where queries are registered into the system and new results are generated incrementally as the graph edges arrive, facilitate online analysis and real-time monitoring over streaming data. Processing persistent queries over streaming graphs combines two seemingly different but challenging problems: graph querying and streaming processing. Existing systems fail to support these workloads due to (i) the complexity of graph queries that feature recursive path navigations, subgraph patterns, and path manipulation, and (ii) the unboundedness and growth rate of streaming graphs that make it infeasible to employ batch algorithms. Consequently, a growing number of applications rely on specialized solutions tailored to specific application needs. This thesis introduces foundational techniques for efficient processing of persistent queries over streaming graphs to support this emerging class of applications in a principled manner.
The main contribution of this thesis is the design and development of a general-purpose streaming graph query processing framework. The novel challenges of persistent queries over streaming graphs dictate rethinking the components of the well-established query processor architecture, and this thesis introduces the models and algorithms to address these challenges uniformly. The central notion of Streaming Graph Query precisely characterizes the semantics of persistent queries over streaming graphs, making it possible to reason about the expressiveness and the complexity of queries targeted by the aforementioned applications. Streaming Graph Algebra, defined as a closure of a set of operators over streaming graphs, provides the primitive building blocks for evaluating and optimizing streaming graph queries. Efficient, incremental algorithms as the physical implementations of streaming graph algebra operators are provided, enabling streaming graph queries to be evaluated in a data-driven fashion. It is shown that the proposed algebra constitutes the foundational tool for the cost-based optimization of streaming graph queries by providing an algebraic basis for query evaluation. Overall, this thesis provides principled solutions to fundamental challenges for efficient querying of streaming graphs and describes the design and implementation of a general-purpose streaming graph query processing framework.