Seminar: "Hands-On/Hands-Off:  Helping Users work with Data Science Pipelines" by Zhengjie Miao, Duke University

Wednesday, August 24, 2022 10:00 am EDT (GMT -04:00)

Title: Hands-On/Hands-Off:  Helping Users work with Data Science Pipelines

Speaker: Zhengjie Miao, Duke University

Date:  Wednesday, August 24, 2022

Time:  10:00 am

Zoom:  https://uwaterloo.zoom.us/j/91311933736?pwd=aVh2NTJRTExjdGVyaGZNbUpCNXJsQT09

Meeting ID: 913 1193 3736

Passcode: 506435

Abstract:    

Data science has been reshaping almost every field over the past decade. While data science is empowering a broad range of users, the limited usability of existing data systems makes it difficult for users, regardless of programming experience, to manipulate, analyze, and understand their data. My research focuses on reducing user burden through a combination of hands-on methods, e.g., providing explanations to help users better understand their data and pipelines, and hands-off methods, e.g., automating time-consuming pipeline steps such as data collection and data preparation. In this talk, I will present my work along these two lines. The first line of work helps users understand and debug database queries by providing small, representative database instances, building on theoretical foundations for data provenance and incomplete data. I also built a practical system that was successfully deployed at Duke University and used by more than 1000 students in introductory database courses. The second line of work leverages machine learning to assist users in data preparation, which has long been recognized as one of the most labor-intensive steps of the data science pipeline. I will conclude with a brief outlook on my future research in helping users build end-to-end pipelines from data to insight.

Biography:  

Zhengjie Miao is a final-year PhD candidate in Computer Science in the database group at Duke University. He is broadly interested in data management and analysis, with a focus on developing algorithms and tools to democratize data science by synthesizing techniques from data management, machine learning, natural language processing, and human-computer interaction. He was a finalist for the Microsoft Research PhD Fellowship.

ALL ARE WELCOME!