Seminar

Haotian Zhang, PhD candidate
David R. Cheriton School of Computer Science

Wednesday, February 27, 2019 12:15 pm - 12:15 pm EST (GMT -05:00)

PhD Seminar • DimmStore: Tackling Memory Power Footprint of Database Systems

Alexey Karyakin, PhD candidate
David R. Cheriton School of Computer Science

Energy consumed by the main memory in existing database systems does not effectively scale down with lower system utilization, both in terms of actual memory usage and load conditions. At the same time, main memory represents a sizable portion of the total server energy footprint, which makes it an outlier as the rest of the system moves towards energy proportionality.

We introduce DimmStore, a prototype main-memory database system that addresses the problem of memory energy consumption.

Wednesday, February 13, 2019 12:15 pm - 12:15 pm EST (GMT -05:00)

PhD Seminar • Accuracy-Aware Differentially Private Data Exploration

Chang Ge, PhD candidate
David R. Cheriton School of Computer Science

Organizations are increasingly interested in allowing external data scientists to explore their sensitive datasets. Due to the popularity of differential privacy, data owners want the data exploration to ensure provable privacy guarantees. However, current systems for differentially private query answering place an inordinate burden on the data analysts to understand differential privacy, manage their privacy budget and even implement new algorithms for noisy query answering. Moreover, current systems do not provide any guarantees to the data analyst on the quantity they care about, namely accuracy of query answers.

Thursday, December 13, 2018 9:00 am - 9:00 am EST (GMT -05:00)

PhD Seminar • Dynamic Sampling used in TREC Core 2018

Haotian Zhang, PhD candidate
David R. Cheriton School of Computer Science

Dynamic sampling (DS) is applied to create a sampled set of relevance judgments in our participation of TREC Common Core Track 2018. One goal was to test the effectiveness and efficiency of this technique with a set of non-expert, secondary relevance assessors. We consider NIST assessors to be the experts and the primary assessors. Another goal was to make available to other researchers a sampled set of relevance judgments (prels) and thus allow the estimation of retrieval metrics that have the potential to be more robust than the standard NIST provided relevance judgments (qrels). In addition to creating the prels, we also submitted several runs based on our manual judging and the models produced by our HiCAL system.

Wednesday, December 12, 2018 12:15 pm - 12:15 pm EST (GMT -05:00)

PhD Seminar • GAL: Graph-Aware Layout for Disk-Resident Graph Databases

Zeynep Korkmaz, PhD candidate
David R. Cheriton School of Computer Science

Analysis on graphs have powerful impact on solving many social and scientific problems, and applications often perform expensive traversals on large scale graphs. Caching approaches on top of persistent storage are among the classical solutions to handle high request throughput. However, graph processing applications have poor access locality, and caching algorithms do not improve disk I/O sufficiently. We present GAL, a graph-aware layout for disk-resident graph databases that generates a storage layout for large-scale graphs on disk with the objective of increasing locality of disk blocks and reducing the number of I/O operations for transactional workloads.

Wednesday, November 28, 2018 12:15 pm - 12:15 pm EST (GMT -05:00)

PhD Seminar • Generalized Transaction Durability Model

Jaemyung Kim, PhD candidate
David R. Cheriton School of Computer Science

Transaction durability guarantees the ability to recover committed transactions from failures. However, making every transaction durable impacts transaction processing performance. Some ad-hoc durability mechanisms (e.g., delayed durability) improve performance, but they risk transactions losing their effects due to failures. The current one-size-fits-all transaction durability model does not solve this problem. We propose a new generalized transaction durability model to trade-off performance and durability and argue that transactions should also provide flexible durability like they provide multiple isolation levels. We evaluate the performance of a modified PostgreSQL that supports the new durability model using a micro-benchmark to show the durability/performance trade-offs.

Wednesday, November 21, 2018 12:15 pm - 12:15 pm EST (GMT -05:00)

PhD Seminar • Distributed Dependency Discovery

Hemant Saxena, PhD candidate
David R. Cheriton School of Computer Science

We address the problem of discovering dependencies from distributed big data. Existing (non-distributed) algorithms focus on minimizing computation by pruning the search space of possible dependencies. However, distributed algorithms must also optimize data communication costs, especially in current shared-nothing settings. To do this, we define a set of primitives for dependency discovery, which corresponds to data processing steps separated by communication barriers, and we present efficient implementations that optimize both computation and communication costs. Using real data, we show that algorithms built using our primitives are significantly faster and more communication-efficient than straightforward distributed implementations.

Wednesday, November 14, 2018 12:15 pm - 12:15 pm EST (GMT -05:00)

PhD Seminar • Evaluating Subgraph Queries With a Mix of Tradition and Modernity

Amine Mhedhbi, PhD candidate
David R. Cheriton School of Computer Science

We study the problem of optimizing subgraph queries (SQs) using the new worst-case optimal (WCO) join plans in Selinger-style cost-based optimizers. WCO plans evaluate SQs by matching one query vertex at a time using multiway intersections. The core problem in optimizing WCO plans is to pick an ordering of the query vertices to match.

Wednesday, October 24, 2018 12:15 pm - 12:15 pm EDT (GMT -04:00)

PhD Seminar • RDF Data Quality

Mina Farid, PhD candidate
David R. Cheriton School of Computer Science

RDF has become a prevalent format to represent disparate data that is ingested from heterogeneous sources. However, data often contains errors due to extraction, transformation, and integration problems, leading to missing or contradicting information that propagate to downstream applications.

Wednesday, October 31, 2018 12:15 pm - 12:15 pm EDT (GMT -04:00)

PhD Seminar • Energy Efficiency in Database Servers with Multi-core CPUs

Mustafa Korkmaz, PhD candidate
David R. Cheriton School of Computer Science

Data centers consume significant amounts of energy and consumption is growing each year. Alongside efforts in the hardware domain, there are some mechanisms in the software domain to reduce energy consumption. One of these mechanisms is dynamic voltage and frequency scaling (DVFS)