Events - 2018

Thursday, December 13, 2018 9:00 AM EST

PhD Seminar • Dynamic Sampling used in TREC Core 2018

Haotian Zhang, PhD candidate
David R. Cheriton School of Computer Science

Dynamic sampling (DS) is applied to create a sampled set of relevance judgments in our participation of TREC Common Core Track 2018. One goal was to test the effectiveness and efficiency of this technique with a set of non-expert, secondary relevance assessors. We consider NIST assessors to be the experts and the primary assessors. Another goal was to make available to other researchers a sampled set of relevance judgments (prels) and thus allow the estimation of retrieval metrics that have the potential to be more robust than the standard NIST provided relevance judgments (qrels). In addition to creating the prels, we also submitted several runs based on our manual judging and the models produced by our HiCAL system.

Wednesday, December 12, 2018 12:15 PM EST

PhD Seminar • GAL: Graph-Aware Layout for Disk-Resident Graph Databases

Zeynep Korkmaz, PhD candidate
David R. Cheriton School of Computer Science

Analysis on graphs have powerful impact on solving many social and scientific problems, and applications often perform expensive traversals on large scale graphs. Caching approaches on top of persistent storage are among the classical solutions to handle high request throughput. However, graph processing applications have poor access locality, and caching algorithms do not improve disk I/O sufficiently. We present GAL, a graph-aware layout for disk-resident graph databases that generates a storage layout for large-scale graphs on disk with the objective of increasing locality of disk blocks and reducing the number of I/O operations for transactional workloads.

Monday, December 10, 2018 10:30 AM EST

DSG Seminar Series • Algorithms and Optimizations for Incremental Window-Based Aggregations

Panos K. Chrysanthis, University of Pittsburgh

Abstract: Online analytics, in most advanced scientific, business, and defense applications, rely heavily on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). ACQs continuously aggregate streaming data and periodically produce results such as max or average over a given window of the latest data. It was shown that in processing ACQs it is beneficial to use incremental evaluation, which involves storing and reusing calculations performed over the unchanged parts of the window, rather than performing the re-evaluation of the entire window after each update.

Wednesday, November 28, 2018 12:15 PM EST

PhD Seminar • Generalized Transaction Durability Model

Jaemyung Kim, PhD candidate
David R. Cheriton School of Computer Science

Transaction durability guarantees the ability to recover committed transactions from failures. However, making every transaction durable impacts transaction processing performance. Some ad-hoc durability mechanisms (e.g., delayed durability) improve performance, but they risk transactions losing their effects due to failures. The current one-size-fits-all transaction durability model does not solve this problem. We propose a new generalized transaction durability model to trade-off performance and durability and argue that transactions should also provide flexible durability like they provide multiple isolation levels. We evaluate the performance of a modified PostgreSQL that supports the new durability model using a micro-benchmark to show the durability/performance trade-offs.

Wednesday, November 21, 2018 12:15 PM EST

PhD Seminar • Distributed Dependency Discovery

Hemant Saxena, PhD candidate
David R. Cheriton School of Computer Science

We address the problem of discovering dependencies from distributed big data. Existing (non-distributed) algorithms focus on minimizing computation by pruning the search space of possible dependencies. However, distributed algorithms must also optimize data communication costs, especially in current shared-nothing settings. To do this, we define a set of primitives for dependency discovery, which corresponds to data processing steps separated by communication barriers, and we present efficient implementations that optimize both computation and communication costs. Using real data, we show that algorithms built using our primitives are significantly faster and more communication-efficient than straightforward distributed implementations.

Monday, November 19, 2018 10:30 AM EST

DSG Seminar Series • Hierarchical Dense Subgraph Discovery: Models, Algorithms, Applications

A. Erdem Sarıyüce, University at Buffalo

Abstract: Finding dense substructures in a network is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. In this talk, I will talk about a framework that we designed to find dense regions of the graph with hierarchical relations.

Wednesday, November 14, 2018 12:15 PM EST

PhD Seminar • Evaluating Subgraph Queries With a Mix of Tradition and Modernity

Amine Mhedhbi, PhD candidate
David R. Cheriton School of Computer Science

We study the problem of optimizing subgraph queries (SQs) using the new worst-case optimal (WCO) join plans in Selinger-style cost-based optimizers. WCO plans evaluate SQs by matching one query vertex at a time using multiway intersections. The core problem in optimizing WCO plans is to pick an ordering of the query vertices to match.

Wednesday, October 31, 2018 12:15 PM EDT

PhD Seminar • Energy Efficiency in Database Servers with Multi-core CPUs

Mustafa Korkmaz, PhD candidate
David R. Cheriton School of Computer Science

Data centers consume significant amounts of energy and consumption is growing each year. Alongside efforts in the hardware domain, there are some mechanisms in the software domain to reduce energy consumption. One of these mechanisms is dynamic voltage and frequency scaling (DVFS)

Wednesday, October 24, 2018 12:15 PM EDT

PhD Seminar • RDF Data Quality

Mina Farid, PhD candidate
David R. Cheriton School of Computer Science

RDF has become a prevalent format to represent disparate data that is ingested from heterogeneous sources. However, data often contains errors due to extraction, transformation, and integration problems, leading to missing or contradicting information that propagate to downstream applications.

Monday, October 15, 2018 10:30 AM EDT

DSG Seminar Series • ExpoDB: Towards a Unified OLTP and OLAP Over a Secure Platform

Speaker: Mohammad Sadoghi, UC Davis

Wednesday, September 26, 2018 12:00 PM EDT

PhD Seminar • Data Systems — Effective User Interaction for High-Recall Retrieval: Less is More

Haotian Zhang, PhD candidate
David R. Cheriton School of Computer Science

Thursday, September 6, 2018 10:30 AM EDT

DSG Seminar Series • What if we could reason about the design space of data structures?

Speaker: Stratos Idreos, Harvard University

Monday, July 30, 2018 2:00 PM EDT

DSG Seminar Series • Data Models from Traditional Databases to NoSQL Systems

Paolo Atzeni, Database Professor and Head of the Department of Engineering
Università Roma Tre

NoSQL systems have gained their popularity for many reasons, including the flexibility they provide with modeling, which tries to relax the rigidity provided by the relational model and by the other structured models.

Wednesday, July 18, 2018 12:30 PM EDT

PhD Seminar • Experimental Analysis of Streaming Algorithms for Graph Partitioning

Anil Pacaci, PhD candidate
David R. Cheriton School of Computer Science

Thursday, July 12, 2018 10:30 AM EDT

DSG Seminar Series • Managing Big Multidimensional Data – A Journey from Data Acquisition to Prescriptive Analytics

Torben Bach Pedersen, Professor of Computer Science
Aalborg University, Denmark

Data collected from new sources such as sensors and smart devices is large, fast, and often complex. There is a universal wish to perform multidimensional OLAP-style analytics on such data, i.e., to turn it into “Big Multidimensional Data”. Supporting this is a multi-stage journey, requiring new tools and systems, and forming a new, extended data cycle with models as a key concept.

Thursday, June 28, 2018 10:30 AM EDT

DSG Seminar Series • Targeted Crowdsourcing with a Billion (Potential) Users

Panos Ipeirotis, Professor and George A. Kellner Faculty Fellow
Department of Information, Operations, and Management Sciences
New York University

Wednesday, May 16, 2018 1:00 PM EDT

MMATH Thesis Presentation • Math Information Retrieval using a Text Search Engine

Dallas Fraser, Master’s candidate
David R. Cheriton School of Computer Science

Combining text and mathematics when searching in a corpus with extensive mathematical notation remains an open problem. Recent results for math information retrieval systems on the math and text retrieval task at NTCIR-12, for example, show room for improvement, even though formula retrieval appears to be fairly successful.

Friday, May 11, 2018 1:30 PM EDT

Seminar • RAMP: RDMA Migration Platform

Babar Naveed Memon, Master’s candidate
David R. Cheriton School of Computer Science

Remote Direct Memory Access (RDMA) can be used to implement a shared storage abstraction or a shared nothing abstraction for distributed applications. We argue that the shared storage abstraction is an overkill for loosely coupled applications and that the shared nothing abstraction does not leverage all the benefits of RDMA.

Thursday, May 10, 2018 2:00 PM EDT

DSG Seminar Series • Next Generation Indexes For Big Data Engineering

Daniel Lemire
Université Télug

Maximizing performance in data engineering is a daunting challenge. We present some of our work on designing faster indexes, with a particular emphasis on compressed indexes. Some of our prior work includes (1) Roaring indexes which are part of multiple big-data systems such as Spark, Hive, Druid, Atlas, Pinot, Kylin, (2) EWAH indexes are part of Git (GitHub) and included in major Linux distributions.

Thursday, April 26, 2018 10:30 AM EDT

DSG Seminar Series • Improving Understanding and Exploration of Data by Non-Database Experts

Rachel Pottinger, Department of Computer Science
University of British Columbia

Users are faced with an increasing onslaught of data, whether it's in their choices of movies to watch, assimilating data from multiple sources, or finding information relevant to their lives on open data registries.

Monday, April 23, 2018 10:30 AM EDT

DSG Seminar Series • Making Approximate Query Processing Mainstream: Progress and the Road Ahead

Barzan Mozafari, Department of Computer Science and Engineering
University of Michigan

Friday, April 20, 2018 10:30 PM EDT

DSG Seminar Series • Speedup Set Intersections in Graph Algorithms using SIMD Instructionsnotes

Lei Zou, Institute of Computer Science and Technology
Peking University

In this talk, I focus on accelerating a widely employed computing pattern — set intersection, to boost a group of relevant graph algorithms. Graph’s adjacency-lists can be naturally considered as node sets, thus set intersection is a primitive operation in many graph algorithms. We propose QFilter, a set intersection algorithm using SIMD instructions. QFilter adopts a merge-based framework and compares two blocks of elements iteratively by SIMD instructions.

Monday, March 26, 2018 3:30 PM EDT

Pages

January 2018

S	M	T	W	T	F	S
31	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31	1	2	3

Return to current month

2024 (13)
1. December (1)
2. October (3)
3. June (1)
4. May (1)
5. April (1)
6. March (5)
7. February (1)
2023 (13)
1. December (2)
2. October (1)
3. September (2)
4. August (2)
5. May (3)
6. April (1)
7. February (1)
8. January (1)
2022 (6)
2021 (8)
2020 (6)
2019 (27)
2018 (26)
1. December (3)
2. November (4)
3. October (3)
4. September (2)
5. July (3)
6. June (1)
7. May (3)
8. April (3)
9. March (3)
10. January (1)
2017 (15)
2016 (25)
2015 (19)
2014 (34)

Current students (1)
- Current graduate students (1)
Future students (1)
- Future graduate students (1)
Faculty (1)
Alumni (1)
Donors | Friends | Supporters (1)
Media (1)

DSG Seminar Series (1)

Events - 2018

PhD Seminar • Dynamic Sampling used in TREC Core 2018

PhD Seminar • GAL: Graph-Aware Layout for Disk-Resident Graph Databases

DSG Seminar Series • Algorithms and Optimizations for Incremental Window-Based Aggregations

PhD Seminar • Generalized Transaction Durability Model

PhD Seminar • Distributed Dependency Discovery

DSG Seminar Series • Hierarchical Dense Subgraph Discovery: Models, Algorithms, Applications

PhD Seminar • Evaluating Subgraph Queries With a Mix of Tradition and Modernity

PhD Seminar • Energy Efficiency in Database Servers with Multi-core CPUs

PhD Seminar • RDF Data Quality

DSG Seminar Series • ExpoDB: Towards a Unified OLTP and OLAP Over a Secure Platform

PhD Seminar • Data Systems — Effective User Interaction for High-Recall Retrieval: Less is More

DSG Seminar Series • What if we could reason about the design space of data structures?

DSG Seminar Series • Data Models from Traditional Databases to NoSQL Systems

PhD Seminar • Experimental Analysis of Streaming Algorithms for Graph Partitioning

DSG Seminar Series • Managing Big Multidimensional Data – A Journey from Data Acquisition to Prescriptive Analytics

DSG Seminar Series • Targeted Crowdsourcing with a Billion (Potential) Users

MMATH Thesis Presentation • Math Information Retrieval using a Text Search Engine

Seminar • RAMP: RDMA Migration Platform

DSG Seminar Series • Next Generation Indexes For Big Data Engineering

DSG Seminar Series • Improving Understanding and Exploration of Data by Non-Database Experts

DSG Seminar Series • Making Approximate Query Processing Mainstream: Progress and the Road Ahead

DSG Seminar Series • Speedup Set Intersections in Graph Algorithms using SIMD Instructionsnotes

Distinguished Lecture Series • Magic Moments in Research and Teaching

Seminar • Leave No Trace: Personal Data with Provable Privacy Guarantees

PhD seminar • EC-Store: A Dynamic Distributed Erasure Coded Storage System

Pages

January 2018

Events - 2018

Pages

Events by date

Events by audience

Events by type