Professor Ihab Ilyas of the Cheriton School of Computer Science has been named a 2024 Fellow of the Royal Society of Canada, the highest national recognition for researchers in the arts, humanities, social sciences, and sciences. He is among 104 distinguished individuals across Canada recognized this year for their exceptional scholarly, artistic, and scientific achievements.
“Congratulations to Ihab on becoming a Fellow of the Royal Society of Canada,” said Raouf Boutaba, University Professor and Director of the Cheriton School of Computer Science. “Over his career, he has made significant contributions in ranking, in machine learning for large-scale data linkage, and in generative AI systems for automatic data cleaning, as well as co-founding two successful start-up companies.”
Professor Ilyas’s research and contributions
Professor Ihab Ilyas has made outstanding contributions to data management, in particular on data integration, data cleaning, and rank-aware query processing. His research has made a significant impact on both academia and industry, leading to practical applications that address real-world challenges.
His key contributions include the following:
- Rank-aware query processing: Professor Ilyas pioneered the field of rank-aware query processing for accelerated retrieval of important information from large database systems.
- Handling uncertain and probabilistic data: He demonstrated how to deal with large uncertain and probabilistic data encountered in real-world applications.
- Automatic data cleaning using generative AI: He pioneered the field of automatic data cleaning using generative AI models at scale as an open-source platform and as a commercial product.
- Scalable data integration systems: He built data integration systems that scale to hundreds of thousands of heterogeneous data sources, serving large enterprises.
- State-of-the-art knowledge graphs: Professor Ilyas designed and built a state-of-the-art knowledge graph platform that serves hundreds of millions of users in production.
In his early work, Professor Ilyas made fundamental contributions to developing efficient techniques for evaluating rank-aware queries over large databases. Rank-aware querying is important in modern applications involving multi-objective optimization. Some examples include finding the top 10 products that satisfy user preferences or showing the most relevant videos or images similar to an example image. Retrieving the most relevant information for users from traditional database systems is an expensive and complex process. Professor Ilyas pioneered the integration of rank-aware querying into database technologies to enable effective and efficient retrieval from large data sets. He developed algorithms and techniques that substantially improved how database systems handle ranking and user preferences in processing queries.
His Rank-Join algorithm is the state-of-the-art approach to producing query answers ranked by user preference. He introduced RankSQL, the first end-to-end rank-aware query engine based on novel ranked relational algebra semantics. He extended this work to ranking over uncertain data and provided the first meaningful semantics for the interplay between uncertainty and score-based ranking. He has also addressed uncertainty in the ranking function itself. His publications defined a new line of research and have provided great insight into, and several practical semantics for, producing the most probable top-k records with respect to user preferences. His related work on high-dimensional spatial indexing has been implemented in PostgreSQL, the world’s most advanced and used open-source relational database engine.
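To make the idea of a top-k query concrete, here is a minimal Python sketch of the "top products by user preference" example. It is a naive baseline that scores every row and keeps the best k; it is not Rank-Join or RankSQL, whose purpose is to avoid this kind of full scan. The catalogue, the `score` function, and the weights are all hypothetical and chosen only for illustration.

```python
import heapq

# Hypothetical product catalogue; names and values are invented for illustration.
PRODUCTS = [
    {"name": "A", "rating": 4.5, "price": 30.0},
    {"name": "B", "rating": 4.8, "price": 80.0},
    {"name": "C", "rating": 3.9, "price": 15.0},
    {"name": "D", "rating": 4.2, "price": 45.0},
]


def score(product, w_rating=0.7, w_price=0.3, max_price=100.0):
    """Assumed preference function: higher rating and lower price score higher."""
    rating_part = product["rating"] / 5.0             # normalize rating to [0, 1]
    price_part = 1.0 - product["price"] / max_price   # cheaper products score higher
    return w_rating * rating_part + w_price * price_part


def top_k(products, k, **weights):
    """Return the k best products under the weighted score (scans every row)."""
    return heapq.nlargest(k, products, key=lambda p: score(p, **weights))


if __name__ == "__main__":
    for p in top_k(PRODUCTS, 2):
        print(p["name"], round(score(p), 3))
```

Changing the weights changes the ranking, which is why the preference function is treated as part of the query rather than as a fixed sort order.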
Professor Ilyas is a world leader in data quality, focusing on scalable automatic error detection, cleaning, and imputation of dirty structured data. Dirty, incomplete, and inconsistent datasets are common in big data and data science and are major impediments to progress in data analytics, in which insights are drawn from data. The problem has been identified as the main hurdle for data science and costs the world’s economy billions of dollars annually.
Professor Ilyas pioneered the area of data cleaning by automatically discovering complex integrity constraints from raw data sets and incorporating this domain knowledge into state-of-the-art machine learning and generative AI models. He addressed multiple technical challenges, among them the lack of training data, translating traditional integrity constraints into model features, bridging the gap between logical and probabilistic data cleaning, and handling the sparsity and scale challenges in running machine learning on big relational data. The work presents data errors as a noisy channel with a probabilistic model to generate original clean data, and a probabilistic realization model that pollutes that data. Several key results on mining the constraints and on the learnability of the parameters of these machine learning models using only the observed dirty data helped create pragmatic and scalable solutions. Key insights also include how violations of business rules and integrity constraints can be incorporated into these machine learning models, which allowed decades of logical cleaning research to be incorporated in modern and scalable techniques. In a series of papers, Professor Ilyas, his team, and collaborators developed highly novel solutions, demonstrating their efficacy and applicability in building usable systems adopted by large enterprises.
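The noisy-channel view can be written compactly using Bayes' rule. The notation below is an assumption made for illustration and is not taken from the publications: $c$ stands for the unobserved clean data, $d$ for the observed dirty data, $P(c)$ for the model that generates clean data, and $P(d \mid c)$ for the realization model that pollutes it.

```latex
\[
  \hat{c} \;=\; \arg\max_{c} P(c \mid d)
          \;=\; \arg\max_{c} \frac{P(c)\,P(d \mid c)}{P(d)}
          \;=\; \arg\max_{c} P(c)\,P(d \mid c)
\]
```

The last step holds because $P(d)$ does not depend on $c$. Under this reading, cleaning amounts to finding the clean data most likely to have produced what was observed.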
This work produced a rich open-source prototype system called HoloClean and led to Inductiv, a start-up that was acquired by Apple. Inductiv’s technology has been incorporated into multiple data processing pipelines at Apple that power key analytics and tools that enhance the user experience. In addition to their industry impact, the top four publications on HoloClean have significantly catalyzed scientific follow-on work.
Professor Ilyas has also made important contributions in large-scale data integration. Information about the same real-world entity (for example, a product, a major event, or a song) comes from a variety of heterogeneous sources in both structured and unstructured forms. These sources might present contradictory aspects and be in different formats and schemas. Matching this large number of sources to a common representation and resolving and repairing conflicting information are at the heart of the data integration challenge. Professor Ilyas’s fundamental contributions in data integration were commercialized in Tamr, another start-up, whose technology has been used by Fortune 500 companies.
Professor Ilyas recently led a major data integration effort at Apple, building the state-of-the-art knowledge graph platform known as Saga. His work integrated data from a variety of external and internal sources to build the source of truth for all of the world’s major entities. The integrated knowledge platform runs in production, powering products and tools that enhance the user experience for hundreds of millions of users.
Fellows of the Royal Society of Canada at the Cheriton School of Computer Science
Professor Ilyas is the tenth faculty member at the Cheriton School of Computer Science to be named a Fellow of the Royal Society of Canada. Previous recipients of Royal Society of Canada Fellowships are N. Asokan, Raouf Boutaba, Richard Cleve, J. Alan George, Srinivasan Keshav, Ming Li, J. Ian Munro, M. Tamer Özsu, and Douglas Stinson.
Royal Society of Canada
Founded in 1882, the RSC comprises the Academy of Arts and Humanities, Academy of Social Sciences, Academy of Science and the RSC College. The RSC recognizes excellence, advises the government and society, and promotes a culture of knowledge and innovation within Canada and with other academies around the world.