Speaker: Marco Serafini, University of Massachusetts Amherst
Abstract: Many advanced data science applications, from social networks to knowledge bases and data integration, analyze complex, high-dimensional, connected data, which is often modeled as a graph. Rather than flattening connections into tabular form, these applications treat them as first-class citizens. Applications that mine and navigate connected data push the envelope of traditional data analytics systems, both relational and graph-native, in similar ways. This talk will argue that better system support for connected data ultimately benefits both graph and relational analytics. It will discuss several dimensions of this support at different levels of the system stack: from storage systems to large-scale cloud execution platforms, and from data analysis algorithms that efficiently deal with large intermediate results to new high-level APIs for emerging applications.
Bio: Marco Serafini is an Assistant Professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. His research interests are in data management systems and distributed systems. His work has impacted popular open-source systems such as Apache ZooKeeper and Apache Storm. He designed Arabesque, a system for distributed graph mining. Before joining UMass, he worked at the Qatar Computing Research Institute and Yahoo! Research.