DSG Seminar Series • Structured Knowledge and Data Management for Effective AI Systems

Monday, December 4, 2023 10:30 am - 10:30 am EST (GMT -05:00)

Speaker: Ihab Ilyas, University of Waterloo

Location: DC 1302

Abstract:

Can structured data management play an important role in accelerating AI? In this talk, I focus on two main aspects of structured data management and argue that they are key to powering and accelerating AI application development: (1) automating data quality and cleaning using generative models; and (2) constructing and serving structured knowledge graphs, and their role in semantic annotation and in grounding unstructured data.

In the first thrust, I will summarize our findings from building the HoloClean project. HoloClean builds generative probabilistic models describing how data was intended to look, and uses them to predict errors and repairs. On the structured knowledge front, I will describe our work building Saga, an end-to-end platform for incremental and continuous construction of large-scale knowledge graphs. Saga demonstrates the complexity of building such a platform in industrial settings with strong consistency, latency, and coverage requirements.

I will discuss challenges around building entity linking and fusion pipelines for constructing coherent knowledge graphs; updating the knowledge graphs with real-time streams; and finally, exposing the constructed knowledge via ML-based entity disambiguation and semantic annotation. I will also show how to query such knowledge via vector representations capable of handling hybrid similarity/filtering workloads.

Bio: Ihab Ilyas is a professor in the Cheriton School of Computer Science and the NSERC-Thomson Reuters Research Chair on Data Quality at the University of Waterloo. He is currently on leave as a Distinguished Engineer at Apple, where he leads the Knowledge Graph Platform team. His main research focuses on data science and data management, with special interest in data cleaning and integration, knowledge construction, and machine learning for structured data management. Ihab is a co-founder of Tamr, a startup focusing on large-scale data integration, and he is also the co-founder of Inductiv (acquired by Apple), a Waterloo-based startup that uses AI for structured data cleaning. He is a recipient of the Ontario Early Researcher Award, a Cheriton Faculty Fellowship, an NSERC Discovery Accelerator Award, and a Google Faculty Award, and he is an ACM Fellow and an IEEE Fellow.