Please note: This master’s thesis presentation will be given online.
Xiyang
Feng, Master’s
candidate
David
R.
Cheriton
School
of
Computer
Science
Supervisor: Professor Semih Salihoglu
Factorized databases utilize factorized data representations during query processing to obtain more compact final query results and faster runtimes for queries with many-to-many joins. We revisit this technique in the context of graph database management systems (GDBMSs) whose common workloads are large joins with many-to-many relationships on graph-structured data. We first review the theory of factorized databases and classic flat intermediate tuple structure in traditional pipelined GDBMSs. We then present our tuple representation which mimics factorized representations and can be easily integrated into existing query processors. We further describe how to cache sub-query results with this factorized tuple structure through a static dependency analysis of the query. We have integrated our factorized query processor into GraphflowDB, an in-memory GDBMS. Compared to the original version of GraphflowDB, whose processor is not fully factorized, query plans in our processor can be orders of magnitude faster and produce orders of magnitude smaller result sizes.
To join this master’s thesis presentation on Zoom, please go to https://us04web.zoom.us/j/8236160184?pwd=UlRkdzhsNGN1NVQ0b3hyRFZIVElHUT09.