PhD Seminar • Experimental Analysis of Streaming Algorithms for Graph Partitioning
Anil Pacaci, PhD candidate
David R. Cheriton School of Computer Science
Paolo Atzeni, Database Professor and Head of the Department of Engineering
Università Roma Tre
NoSQL systems have gained popularity for many reasons, including the modeling flexibility they provide, which aims to relax the rigidity of the relational model and other structured models.
Haotian Zhang, PhD candidate
David R. Cheriton School of Computer Science
Mina Farid, PhD candidate
David R. Cheriton School of Computer Science
RDF has become a prevalent format to represent disparate data that is ingested from heterogeneous sources. However, data often contains errors due to extraction, transformation, and integration problems, leading to missing or contradicting information that propagates to downstream applications.
Mustafa Korkmaz, PhD candidate
David R. Cheriton School of Computer Science
Amine Mhedhbi, PhD candidate
David R. Cheriton School of Computer Science
We study the problem of optimizing subgraph queries (SQs) using the new worst-case optimal (WCO) join plans in Selinger-style cost-based optimizers. WCO plans evaluate SQs by matching one query vertex at a time using multiway intersections. The core problem in optimizing WCO plans is to pick an ordering of the query vertices to match.
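The vertex-at-a-time matching described above can be illustrated with a minimal sketch, assuming a directed triangle query (a->b, b->c, a->c) and a fixed vertex ordering a, b, c. The function name `match_triangles` and the adjacency-set representation are hypothetical choices for illustration; this is not the optimizer or the plan space from the talk, only an example of extending partial matches by intersecting adjacency lists.

```python
def match_triangles(edges):
    """Match the directed triangle a->b, b->c, a->c one query vertex at a time.

    Illustrative only: the ordering (a, b, c) is hard-coded, whereas the
    abstract's core problem is choosing such an ordering.
    """
    # Build forward adjacency sets: out_nbrs[u] = {v | u->v is an edge}.
    out_nbrs = {}
    for u, v in edges:
        out_nbrs.setdefault(u, set()).add(v)

    matches = []
    for a in sorted(out_nbrs):                # match query vertex a
        for b in sorted(out_nbrs[a]):         # match b: must satisfy a->b
            # Match c: must satisfy both a->c and b->c, so intersect the
            # two adjacency sets instead of joining edge tables pairwise.
            candidates = out_nbrs[a] & out_nbrs.get(b, set())
            for c in sorted(candidates):
                matches.append((a, b, c))
    return matches


if __name__ == "__main__":
    sample_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]
    print(match_triangles(sample_edges))
```

A different ordering of the query vertices would intersect different adjacency sets and could do more or less work on the same graph, which is why the choice of ordering matters for cost.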
Hemant Saxena, PhD candidate
David R. Cheriton School of Computer Science
We address the problem of discovering dependencies from distributed big data. Existing (non-distributed) algorithms focus on minimizing computation by pruning the search space of possible dependencies. However, distributed algorithms must also optimize data communication costs, especially in current shared-nothing settings. To do this, we define a set of primitives for dependency discovery, which corresponds to data processing steps separated by communication barriers, and we present efficient implementations that optimize both computation and communication costs. Using real data, we show that algorithms built using our primitives are significantly faster and more communication-efficient than straightforward distributed implementations.
Jaemyung Kim, PhD candidate
David R. Cheriton School of Computer Science
Transaction durability guarantees the ability to recover committed transactions from failures. However, making every transaction durable impacts transaction processing performance. Some ad-hoc durability mechanisms (e.g., delayed durability) improve performance, but they risk transactions losing their effects due to failures. The current one-size-fits-all transaction durability model does not solve this problem. We propose a new generalized transaction durability model to trade off performance and durability, and argue that transactions should offer flexible durability levels just as they offer multiple isolation levels. We evaluate the performance of a modified PostgreSQL that supports the new durability model using a micro-benchmark to show the durability/performance trade-offs.
Panos K. Chrysanthis, University of Pittsburgh
Abstract: Online analytics, in most advanced scientific, business, and defense applications, rely heavily on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). ACQs continuously aggregate streaming data and periodically produce results such as max or average over a given window of the latest data. Prior work has shown that ACQs benefit from incremental evaluation, which stores and reuses calculations performed over the unchanged parts of the window rather than re-evaluating the entire window after each update.
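As a rough illustration of incremental evaluation, the sketch below splits the stream into fixed-size panes, stores one partial aggregate per pane, and combines the stored partials to produce each window result. The function name `sliding_aggregates`, the pane-based scheme, and the parameters are assumptions made for this example, not the technique presented in the talk.

```python
from collections import deque


def sliding_aggregates(values, pane_size, panes_per_window):
    """Return (max, average) per window slide, reusing per-pane partials.

    Illustrative assumption: the window covers `panes_per_window` panes and
    slides by one pane, so unchanged panes are never re-scanned.
    """
    # Each stored partial is (pane_max, pane_sum, pane_count).
    # deque(maxlen=...) drops the oldest pane automatically on append.
    panes = deque(maxlen=panes_per_window)
    results = []
    for start in range(0, len(values), pane_size):
        chunk = values[start:start + pane_size]
        # Only the newly arrived pane is aggregated from raw tuples.
        panes.append((max(chunk), sum(chunk), len(chunk)))
        if len(panes) == panes_per_window:
            # The window result is assembled from stored partial aggregates.
            window_max = max(p[0] for p in panes)
            total = sum(p[1] for p in panes)
            count = sum(p[2] for p in panes)
            results.append((window_max, total / count))
    return results


if __name__ == "__main__":
    print(sliding_aggregates([1, 5, 2, 8, 3, 4, 7, 6], 2, 2))
```

Average is kept as a (sum, count) pair because averages of panes cannot be combined directly, while sums and counts can.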
Zeynep Korkmaz, PhD candidate
David R. Cheriton School of Computer Science
Graph analysis has a powerful impact on solving many social and scientific problems, and applications often perform expensive traversals on large-scale graphs. Caching approaches on top of persistent storage are among the classical solutions to handle high request throughput. However, graph processing applications have poor access locality, and caching algorithms do not improve disk I/O sufficiently. We present GAL, a graph-aware layout for disk-resident graph databases that generates a storage layout for large-scale graphs on disk with the objective of increasing locality of disk blocks and reducing the number of I/O operations for transactional workloads.