DSG Seminar Series • MacroBase: Prioritizing Attention in Fast Data

Tuesday, May 2, 2017 10:30 am - 10:30 am EDT (GMT -04:00)

Patrick Valduriez
Inria and Institute of Computational Biology (IBC)

Abstract: The proliferation of cloud data management infrastructures, each specialized for different kinds of data and tasks, has led to a wide diversification of DBMS interfaces and the loss of a common programming paradigm.

In this talk, we present the design of a Cloud Multidatastore Query Language (CloudMdsQL) and its query engine. CloudMdsQL is a functional SQL-like language capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query, which may contain embedded invocations of each data store’s native query interface. The query engine has a fully distributed architecture, which opens important opportunities for optimization. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores by allowing some native queries of a local data store (e.g., a breadth-first search query against a graph database) to be called as functions while still being optimized, e.g., by pushing down selection predicates, using bind joins, ordering joins, or planning intermediate data shipping.
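The bind-join optimization mentioned in the abstract can be illustrated with a small sketch. This is not the CloudMdsQL engine's implementation; it assumes two hypothetical in-memory "stores" and a hypothetical `right_lookup` callback standing in for a native query on the right-hand store that accepts a set of key values (for example, translated into a `WHERE key IN (...)` clause). The left side is evaluated first, its distinct join keys are shipped to the right store, and only the matching rows come back to be joined locally.

```python
from collections import defaultdict
from typing import Callable, Iterable

Row = dict


def bind_join(left_rows: Iterable[Row],
              right_lookup: Callable[[set], list],
              key: str) -> list:
    """Sketch of a bind join: ship the left side's join keys to the right store.

    right_lookup is a hypothetical callback representing a native query on the
    right-hand data store that returns only rows whose `key` is in the given set.
    """
    left_rows = list(left_rows)
    # Step 1: collect the distinct join-key values produced by the left input.
    keys = {row[key] for row in left_rows}
    if not keys:
        return []
    # Step 2: push the key set into the right store's query (the "binding").
    right_rows = right_lookup(keys)
    # Step 3: hash the reduced right result by key and probe it with left rows.
    index = defaultdict(list)
    for r in right_rows:
        index[r[key]].append(r)
    return [{**l, **r} for l in left_rows for r in index.get(l[key], [])]


# Hypothetical "relational" side, already filtered locally.
customers = [{"cid": 1, "name": "Ana"}, {"cid": 3, "name": "Bo"}]

# Hypothetical "document store" holding orders; the lookup filters by key set.
orders_store = [
    {"cid": 1, "total": 10},
    {"cid": 2, "total": 5},
    {"cid": 3, "total": 7},
    {"cid": 1, "total": 4},
]


def orders_lookup(cids: set) -> list:
    # Stands in for something like: SELECT * FROM orders WHERE cid IN (...)
    return [o for o in orders_store if o["cid"] in cids]


result = bind_join(customers, orders_lookup, "cid")
```

Only the keys {1, 3} are sent to the order store, so the order for customer 2 never leaves it; this is the data-shipping saving a bind join aims for when the left input is small relative to the right one.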

Our experimental validation, with various data stores (graph, document, relational, Spark/HDFS) and representative queries, shows that CloudMdsQL satisfies the five important requirements for a cloud multidatastore query language.

This work was partially funded by the European Commission under the Integrated Project CoherentPaaS.

Presentation slides (PDF)

Video of presentation (mp4)

Bio: Patrick Valduriez is a senior researcher at Inria and LIRMM, University of Montpellier, France. He has also been a professor of Computer Science at University Paris 6 and a researcher at Microelectronics and Computer Technology Corp. in Austin, Texas. He received his Ph.D. and Doctorat d'Etat in Computer Science from University Paris 6 in 1981 and 1985, respectively. He is the head of the Zenith team (a joint team of Inria and University of Montpellier, LIRMM), which focuses on data management in large-scale distributed and parallel systems (P2P, cluster, grid, cloud), in particular scientific data management.

He has authored or co-authored over 250 technical papers and several textbooks, including “Principles of Distributed Database Systems”. He currently serves as associate editor of several journals, including the VLDB Journal, Distributed and Parallel Databases, and Internet and Databases. He has served as PC chair of major conferences such as SIGMOD and VLDB. He was the general chair of SIGMOD 2004, EDBT 2008, and VLDB 2009. He received the best paper award at VLDB 2000. He was the recipient of the 1993 IBM scientific prize in Computer Science in France and the 2014 Innovation Award from Inria, the French Academy of Sciences, and Dassault Systèmes. He is an ACM Fellow.