The following was excerpted from an article published on the website of CS-Can/Info-Can, Canada's professional society dedicated to representing all aspects of computer science and the interests of the discipline to Canadians.
The first problem was data uncertainty. Professor Ilyas has always focused on data science, not in a general sense but specifically on providing high-quality data for analytics. While working on uncertain and probabilistic databases, he and his team discovered that most uncertainty stems from dirty data. In the first phase of his research, Professor Ilyas and his team set out to understand the uncertainty in the data we collect, and in 2009 they began building systems for data cleaning.
Professor Ilyas realized that getting to the deeper problem required building real systems. The next focus was data profiling: mining the underlying patterns and constraints that hold in a given data set. The group studied different types of constraints on data, how to represent them, and how to make the data satisfy them. In 2016, the growing availability of machine learning tools triggered a shift in their research. Together with their collaborators, they discovered that the problems they had struggled with could be solved using machine learning, and a new line of research emerged: treating data curation as a statistical inference problem.
Leading a problem-focused group made it natural to work with industry. Professor Ilyas started consulting early in his career because of his keen interest in applying his research. Collaborations followed with companies including IBM and Google. With access to the entrepreneurial ecosystem of the Waterloo Region, he also began collaborating with several tech startups.
With the University of Waterloo's unique intellectual property policy, a vibrant entrepreneurial ecosystem, and an opportunity to team up with a sister group at the Massachusetts Institute of Technology (MIT), Professor Ilyas co-founded a startup in 2013. He, Michael Stonebraker, and Andy Palmer led a team of researchers working on data mastering at scale, and together they started Tamr. The company provides a way for large enterprises to consume accurate, up-to-date, unified data from many sources. It now employs around 150 people, and the founders have raised more than US$70 million.