Cheriton School of Computer Science Professor Ihab Ilyas has been named a Fellow of the Institute of Electrical and Electronics Engineers for his contributions to data integration, data cleaning and rank-aware query processing.
IEEE Fellowship is a prestigious professional recognition and an important career achievement. Fellow is the highest grade of IEEE membership, conferred on those with an outstanding record of accomplishments. Each year, the total number of IEEE members recognized as Fellows does not exceed one-tenth of one per cent of the Institute’s total voting membership.
“Congratulations to Ihab on being named an IEEE Fellow,” said Raouf Boutaba, Professor and Director of the Cheriton School of Computer Science. “Ihab is well known to computer scientists in both academia and industry because of his contributions to scalable automatic error detection, to data cleaning, and to imputation of dirty structured data. He has pioneered scalable automatic data cleaning through the systems he, his students and collaborators have developed that use state-of-the-art machine learning models.”
Professor Ilyas is the seventh faculty member at the Cheriton School of Computer Science to receive the prestigious recognition of IEEE Fellow, following Professors N. Asokan, Raouf Boutaba, J. Alan George, Ming Li, M. Tamer Özsu, and Srinivasan Keshav, who is now at the University of Cambridge and holds an adjunct appointment at the Cheriton School.
IEEE is the world’s largest technical professional organization dedicated to advancing technology for the benefit of humanity. IEEE and its members inspire a global community through IEEE’s highly cited publications, conferences, technology standards, and professional and educational activities.
More on Professor Ilyas’s contributions
Scalable automatic error detection, cleaning and imputation of dirty structured data
Professor Ilyas has addressed multiple technical challenges in scalable automatic error detection, cleaning and imputation of dirty structured data. His research models data errors as a noisy channel: a probabilistic generative model produces the original clean data, and a probabilistic realization model then corrupts it. Several key results on mining integrity constraints and on the learnability of these models’ parameters using only the observed dirty data have helped to create pragmatic and scalable solutions. These machine-learning solutions incorporate modern techniques such as self-supervision, data augmentation, embeddings and schema-level attention mechanisms to build complex, learnable error-detection and repair models. A key insight is how violations of business rules and integrity constraints can be folded into these machine-learning models, which has allowed decades of research on logic-based data cleaning to carry over into modern, scalable techniques.
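To make the noisy-channel idea concrete, here is a minimal toy sketch, not HoloClean or any of Professor Ilyas’s systems. It assumes a single categorical column, a clean-value prior estimated crudely from the dirty data itself, and a hypothetical realization model in which a value survives intact with probability `1 - error_rate` and is otherwise replaced by a similar-looking value (string similarity from Python’s `difflib` stands in for a learned error model). Each observed value is repaired to the clean value that maximizes P(clean) × P(observed | clean), which is Bayes’ rule applied to the channel.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # String similarity in [0, 1]; a hypothetical stand-in for a learned error model.
    return SequenceMatcher(None, a, b).ratio()


def realization_prob(observed: str, clean: str, domain: list[str], error_rate: float) -> float:
    """Toy realization model P(observed | clean) over a finite domain.

    With probability 1 - error_rate the clean value is kept; otherwise it is
    replaced by another domain value, chosen in proportion to its similarity
    to the clean value (typos tend to resemble the word they corrupt).
    """
    others = [v for v in domain if v != clean]
    z = sum(similarity(v, clean) for v in others)
    if observed == clean:
        return 1.0 if z == 0 else 1.0 - error_rate
    if z == 0:
        return 0.0
    return error_rate * similarity(observed, clean) / z


def repair(column: list[str], error_rate: float = 0.1) -> list[str]:
    """Map each value to its most probable clean value:
    P(clean | observed) is proportional to P(clean) * P(observed | clean)."""
    counts = Counter(column)
    n = len(column)
    domain = list(counts)
    # Crude generative model (an assumption): the prior comes from the dirty data itself.
    prior = {v: c / n for v, c in counts.items()}
    best = {}
    for observed in domain:
        best[observed] = max(
            domain,
            key=lambda clean: prior[clean] * realization_prob(observed, clean, domain, error_rate),
        )
    return [best[v] for v in column]


city = ["Toronto"] * 20 + ["Waterloo"] * 10 + ["Torontto"]
cleaned = repair(city)
print(cleaned[-1])  # Toronto
```

The rare, near-duplicate value “Torontto” is repaired because a frequent clean value plus a likely typo explains it better than the rare value itself, while the distinct value “Waterloo” is left alone. Real systems replace both toy components with learned models and add constraint violations as further evidence.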
Professor Ilyas’s research in data cleaning has been recognized by academia and industry alike. He and his former PhD student Xu Chu, now faculty at Georgia Tech, coauthored Data Cleaning, among the most downloaded books in the ACM Books series. He has given many invited keynote addresses and presentations at top institutions and venues. His start-up Tamr, among the top companies in data integration and preparation, has raised more than $70 million and serves dozens of Fortune 500 companies. And Inductiv, Professor Ilyas’s start-up that uses machine learning to automate the task of identifying and correcting errors in data, was acquired by Apple Inc. in 2020. Inductiv’s technology is based on HoloClean, a next-generation machine-learning approach to data cleaning that began in 2017 as a collaborative academic project led by Professor Ilyas and his colleagues Professor Theodoros Rekatsinas of the University of Wisconsin-Madison and Professor Christopher Ré of Stanford University.
Rank-aware query processing
Professor Ilyas has also integrated rank-aware querying into database technologies to allow effective retrieval in large data sets, such as those in multimedia and video databases. He developed algorithms and techniques that markedly changed how database systems handle ranking and user preferences in processing queries.
His rank-join algorithm has become the state-of-the-art physical query operator for producing query answers ranked by user preference without computing the entire answer set. He introduced RankSQL, the first end-to-end rank-aware query engine, based on novel ranked relational algebra semantics and built on top of PostgreSQL. He also introduced the problem of ranking uncertain data and provided the first meaningful semantics for the interplay between uncertainty and score-based ranking: he and his research group were the first to define ranking where the record membership, the score values, or both are uncertain. Professor Ilyas has also addressed uncertainty in the ranking function itself. His many papers in this area have defined a new line of research in the database community and have provided valuable insight and several practical semantics for producing the most probable top-k records with respect to user preferences.
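The core idea behind a rank join, returning the best join results without computing the whole join, can be illustrated with a simplified sketch. This is not the published operator’s code; it assumes two in-memory inputs already sorted by score in descending order, an equality join on a key, and summation as the (monotone) scoring function. The function name `rank_join` and the tuple layout `(join_key, score, payload)` are hypothetical. Inputs are read alternately into hash tables, and a join result is emitted only once its score is at least a threshold that bounds every result not yet generated.

```python
import heapq
from collections import defaultdict


def rank_join(left, right, k):
    """Return up to k join results (score, left_tuple, right_tuple), best first.

    left, right: lists of (join_key, score, payload) sorted by score, descending.
    Scores combine by addition; monotonicity is what makes early stopping safe.
    """
    results = []
    if not left or not right or k <= 0:
        return results
    inputs = (left, right)
    tables = (defaultdict(list), defaultdict(list))  # tuples read so far, by key
    pos = [0, 0]                                    # next unread index per input
    top = (left[0][1], right[0][1])                 # highest score per input
    last = [left[0][1], right[0][1]]                # score of the last tuple read
    heap = []  # max-heap via negated scores; counter breaks ties
    counter = 0
    side = 0
    while len(results) < k:
        exhausted = pos[0] == len(left) and pos[1] == len(right)
        if not exhausted:
            # Alternate between inputs, skipping one that has run out.
            if pos[side] == len(inputs[side]):
                side = 1 - side
            tup = inputs[side][pos[side]]
            pos[side] += 1
            last[side] = tup[1]
            tables[side][tup[0]].append(tup)
            # Probe the other input's hash table for matching keys.
            for other in tables[1 - side][tup[0]]:
                l, r = (tup, other) if side == 0 else (other, tup)
                counter += 1
                heapq.heappush(heap, (-(l[1] + r[1]), counter, l, r))
            side = 1 - side
        # Any result not yet generated uses at least one unread tuple,
        # so its score cannot exceed this bound.
        threshold = max(top[0] + last[1], last[0] + top[1])
        while heap and len(results) < k and (exhausted or -heap[0][0] >= threshold):
            neg_score, _, l, r = heapq.heappop(heap)
            results.append((-neg_score, l, r))
        if exhausted:
            break
    return results


left = [("a", 10, "L1"), ("b", 8, "L2"), ("c", 3, "L3")]
right = [("b", 9, "R1"), ("a", 2, "R2"), ("c", 1, "R3")]
print([score for score, _, _ in rank_join(left, right, 2)])  # [17, 12]
```

For k = 1, this sketch returns the key "b" result (score 17) after reading only two tuples from each input, never touching the "c" tuples; that early stop is the saving a rank-aware operator offers over joining everything and then sorting.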