The Data Systems Seminar Series provides a forum for presentation and discussion of interesting and current database issues. It complements our internal database meetings by bringing in external colleagues. The talks that are scheduled for this year are listed below.
The talks are usually held on a Monday at 10:30 am in room DC 1302. Exceptions are flagged.
We will try to post the presentation notes, whenever that is possible. Please click on the presentation title to access these notes.
The Database Seminar Series is supported by
|Title:||What we could reason about the design space of data structures?|
|Speaker:||Stratos Idreos, Harvard University|
|Abstract:||Data structures are critical in any data-driven scenario, and they define the behavior of modern data systems. However, they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. In this talk, we ask two questions: What if we knew how many and which data structures are possible to design? What if we could compute the expected performance of a data structure design on a given workload and hardware without having to implement it and without even having access to the target machine? We will discuss our quest for 1) the first principles of data structures, 2) design continuums that make it possible to automate design, and 3) self-designing systems that can morph between what we now consider fundamentally different structures. We will draw examples from the NoSQL key-value store design space and discuss how to accelerate them and balance space-time tradeoffs.
|Bio:||Stratos Idreos is an assistant professor of Computer Science at Harvard University where he leads DASlab, the Data Systems Laboratory@Harvard SEAS. Stratos works on data system architectures with emphasis on how we can make it easy to design efficient data systems as applications and hardware keep evolving and on how we can make it easy to use these systems even for non-experts. For his doctoral work on Database Cracking, Stratos won the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award and the 2011 ERCIM Cor Baayen award. He is also a recipient of an IBM zEnterprise System Recognition Award, a VLDB Challenges and Visions best paper award and an NSF Career award. In 2015 he was awarded the IEEE TCDE Rising Star Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems.
|Title:||ExpoDB: Towards a Unified OLTP and OLAP Over a Secure Platform|
|Speaker:||Mohammad Sadoghi, UC Davis|
|Abstract:||Arguably data is a new natural resource in the enterprise world with an unprecedented degree of proliferation and heterogeneity. However, to derive real-time actionable insights from the data, it is important to bridge the gap between analyzing a large volume of data (i.e., OLAP) and managing the data that is being updated at a high velocity (i.e., OLTP). Historically, there has been a divide where specialized engines were developed to support either OLAP or OLTP workloads but not both; thus, limiting the analysis to stale and possibly irrelevant data.
In this talk, we present our proposed architecture to combine the real-time processing of analytical and transactional workloads within a single unified engine. To support querying and retaining the current and historic data, we design novel, efficient index maintenance techniques, paving the way to a novel optimistic concurrency control. From the concurrency perspective, we further pose a question: is it possible to have concurrent execution over shared data without having any concurrency control? To answer this question, we investigate a deterministic approach to transaction processing geared towards many-core hardware by proposing a novel queue-oriented, control-free concurrency architecture (QueCC) that exhibits minimal coordination during execution while offering serializable guarantees. From the storage perspective, we develop an update-friendly lineage-based storage architecture (LSA) that offers a contention-free and lazy staging of columnar data from a write-optimized form (OLTP) into a read-optimized form (OLAP) in a transactionally consistent approach. Finally, we share our vision to move from a centralized platform onto a secure democratic and decentralized computational model.
|Bio:||Mohammad Sadoghi is an Assistant Professor of Computer Science at the University of California, Davis. Formerly, he was an Assistant Professor at Purdue University and Research Staff Member at IBM T.J. Watson Research Center. He received his Ph.D. from the Computer Science Department at the University of Toronto in 2013. His research spans all facets of secure and massive-scale data management. At UC Davis, he leads the ExpoLab research group with the aim to pioneer a new exploratory data platform (referred to as ExpoDB), a distributed ledger that unifies secure transactional and real-time analytical processing, all centered around a democratic and decentralized computational model. Prof. Sadoghi has over 60 publications and has filed 34 U.S. patents. His SIGMOD'11 paper was awarded the EPTS Innovative Principles Award, his EDBT'11 paper was selected as one of the best EDBT papers in 2011, and his ESWC'16 paper won the Best In-Use Paper Award. He is serving as Workshop/Tutorial Co-Chair at Middleware'18, has served as the PC Chair (Industry Track) at ACM DEBS'17, co-chaired a new workshop series, entitled Active, at both ICDE and Middleware, and co-chaired the Doctoral Symposium at Middleware'17. He served as the Area Editor for Transaction Processing in the Encyclopedia of Big Data Technologies by Springer. He is co-authoring a book on "Transaction Processing on Modern Hardware" as part of Morgan & Claypool Synthesis Lectures on Data Management. He regularly serves on the program committee of SIGMOD, VLDB, ICDE, EDBT, Middleware, ICDCS, DEBS, and ICSOC.
|Speaker:||A. Erdem Sarıyüce, University at Buffalo|
|Abstract:||Finding dense substructures in a network is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. In this talk, I will present a framework that we designed to find dense regions of the graph with hierarchical relations. Our model can summarize the graph as a tree of subgraphs. With the right parameters, our framework generalizes two widely accepted dense subgraph models: k-core and k-truss decompositions. We present practical sequential and parallel local algorithms for our framework and empirically evaluate their behavior in a variety of real graphs. Furthermore, we adapt our framework for bipartite graphs which are used to model group relationships such as author-paper, word-document, and user-product data. We demonstrate how the proposed algorithms can be utilized for the analysis of a citation network among physics papers and a user-product network of Amazon Kindle books.|
|Bio:||A. Erdem Sariyuce is an Assistant Professor in Computer Science and Engineering at the University at Buffalo. Prior to that, he was the John von Neumann postdoctoral fellow at Sandia National Laboratories. Erdem received his Ph.D. in Computer Science from the Ohio State University. He conducts research on large-scale graph mining. In particular, he develops practical algorithms to explore and process real-world networks. He received the Best Paper Runner-up Award at the International World Wide Web Conference (WWW) in 2015. More details can be found at http://sariyuce.com.|
|Title:||Algorithms and Optimizations for Incremental Window-Based Aggregations|
|Speaker:||Panos K. Chrysanthis, University of Pittsburgh|
|Abstract:||Online analytics, in most advanced scientific, business, and defense applications, rely heavily on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). ACQs continuously aggregate streaming data and periodically produce results such as max or average over a given window of the latest data. It was shown that in processing ACQs it is beneficial to use incremental evaluation, which involves storing and reusing calculations performed over the unchanged parts of the window, rather than performing the re-evaluation of the entire window after each update. In this talk, we examine how the principle of sharing is applied in the partial and final aggregation techniques and present our SlickDeque and WeaveShare techniques that optimize the execution of multi-ACQs in single and multiple computing nodes.
|Bio:||Panos K. Chrysanthis is a Professor of Computer Science and the founding director of the Advanced Data Management Technologies Laboratory in the School of Computing and Information at the University of Pittsburgh. He is also an Adjunct Professor at Carnegie Mellon University and the University of Cyprus. His research interests lie at the intersection of data management, distributed systems and collaborative applications. He is a recipient of the NSF CAREER Award and he is an ACM Distinguished Scientist and a Senior Member of IEEE. He is also a recipient of the University of Pittsburgh Provost Award for Excellence in Mentoring (doctoral students). He is currently the Special Issues Coordinator for the Distributed and Parallel Databases Journal and a Program Committee Co-chair of IEEE ICDE 2018. He earned his BS degree from the University of Athens, Greece and his MS and PhD degrees from the University of Massachusetts at Amherst.|
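As a rough illustration of the incremental evaluation idea mentioned in the abstract above, the following minimal sketch computes a sliding-window max by keeping a deque of candidate positions instead of rescanning the whole window after each new value. This is a generic, textbook monotonic-deque technique and is not the speaker's SlickDeque or WeaveShare; the function name `sliding_max` and its parameters are hypothetical and chosen only for this example.

```python
from collections import deque


def sliding_max(values, window):
    """Yield the max of every full window of `window` consecutive values.

    Hypothetical helper for illustration only; `values` is assumed to be
    an indexable sequence such as a list.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    candidates = deque()  # indices whose values are in decreasing order
    for i, v in enumerate(values):
        # Older candidates that are not larger than v can never be the max again.
        while candidates and values[candidates[-1]] <= v:
            candidates.pop()
        candidates.append(i)
        # Evict the front candidate once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        # Emit a result only once the first full window has been seen.
        if i >= window - 1:
            yield values[candidates[0]]


if __name__ == "__main__":
    print(list(sliding_max([1, 3, 2, 5, 4, 1], 3)))
```

Each value enters and leaves the deque at most once, so the work per update stays constant on average, whereas re-evaluating the full window would cost time proportional to the window size on every update.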
|Title:||Adaptive Scalable Analytics in Multi-Engine Environments|
|Speaker:||Verena Kantere, University of Ottawa|
|Abstract:||Big Data analytics in science and industry are performed on a range of heterogeneous data stores, both traditional and modern, and on a diversity of query engines. Workflows are difficult to design and implement since they span a variety of systems. To reduce development time and processing costs, some automation is needed. In this talk we will present a new platform to manage analytics workflows. The platform enables workflow design, execution, analysis and optimization with respect to time efficiency, over multiple execution engines. Such configurations are emerging as a common paradigm used to combine analysis of unstructured data with analysis of structured data (e.g., NoSQL plus SQL). We focus on the usability of the platform by users with various expertise, the automation of the analysis and optimization of execution, as well as the effect of optimization on workflow execution. The platform also performs multi-workflow optimization and workflow recalibration. The talk will finish with some plans for future research on data management optimization on hybrid infrastructures, i.e., infrastructures that comprise multiple sites and heterogeneous parts and combine private clusters and public resources.
|Bio:||Verena Kantere is an Associate Professor at the School of Electrical Engineering and Computer Science (EECS) in the University of Ottawa (UOttawa). Before, she was an Assistant Professor at the School of Electrical and Computer Engineering (ECE) of the National Technical University of Athens (NTUA) and a Maître d’Enseignement et de Recherche at the Centre Universitaire d’Informatique (CUI) of the University of Geneva (UniGe). She has been working towards the provision of data services in large-scale systems, like cloud systems, focusing on the management of Big Data and the performance of Big Data analytics, by developing methods, algorithms and fully fledged systems. Before coming to the UniGe she was a tenure-track junior assistant professor at the Department of Electrical Engineering and Information Technology at the Cyprus University of Technology (CUT). She has received a Diploma and a Ph.D. from the National Technical University of Athens (NTUA) and an M.Sc. from the Department of Computer Science at the University of Toronto (UofT), where she also started her PhD studies. After the completion of her PhD studies she worked as a postdoctoral researcher at the École Polytechnique Fédérale de Lausanne (EPFL). During her graduate studies she developed methods, algorithms and fully fledged systems for data exchange and coordination in Peer-to-Peer (P2P) overlays with structured and unstructured data, focusing on the solution of problems of data heterogeneity, query processing and rewriting, multi-dimensionality and management of continuous queries. Furthermore, she has shown interest and worked in the field of the Semantic Web, concerning the problems of semantic similarity, annotation, clustering and integration.|
|Speaker:||Dan Suciu, University of Washington|