A group of computer science students are applying big data practices to predicting the end of COVID-19 cases in Canada. Their findings? October 15, 2020.
Faced with the unprecedented pivot to online learning, many of Waterloo’s instructors wanted to get their students involved in COVID-19 research.
One such instructor is Cheriton School of Computer Science’s Ali Abedi, who teaches two big data courses — Data Intensive Distributed Computing (CS451) and Data Intensive Distributed Analytics (CS431).
Professor Abedi decided that since having his students write in-person final exams was no longer an option, why not have them tackle COVID-19–related problems?
“I thought the students would be motivated to work on a real problem,” Professor Abedi says. “I decided to substitute the final exams with COVID-19–related final projects.”
Xiaoyu Wu, a final-year data science major, believes there was great value in doing a project about a problem affecting society for CS431. For her project, she researched the influencing factors for the COVID-19 infection rates in different countries.
“The project helped me to better understand how I can apply what I learned to real life,” Xiaoyu Wu says. “It not only tested my knowledge and understanding about this course, but also prompted me to combine my previous knowledge and experience with the current course contents to perform practically.”
For CS451, the overall project theme was to predict new cases of COVID-19 in Canada from various big data sources, such as Twitter or Facebook. CS451 undergraduate students had to review the data and build models that predicted the cumulative number of COVID-19 cases for each of the first, second and third weeks of May, and for May overall. Professor Abedi also required the students to predict when the pandemic would end in Canada.
Gunesh Pinar, who partnered with fellow classmate Ku Young Shin on the project, was delighted that as undergraduate students, they got their first opportunity to do a research project.
“This really benefited us as undergraduate students to get a real taste of what graduate studies would be like,” Pinar says. “The topic of this research was also very motivating because it is a problem that literally concerns everyone in the world. If we could find any little thing that would help the population, we would be very happy.”
Gunesh Pinar and Ku Young Shin’s project saw them analyze more than 180 million tweets using big data technologies such as Apache Spark and Hadoop on Waterloo-provided servers. They were particularly interested in tweets that contained social distancing–related phrases like “stay home,” “contact tracing” and “herd immunity.” Their hypothesis was that if they could see how many people were promoting social distancing on social media, they could predict the increase or decrease in the number of new coronavirus cases in the upcoming days.
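As a rough illustration of the phrase-matching idea, the sketch below counts, per day, how many tweets mention any of the three phrases quoted in this article. It is not the students' code: it uses plain Python rather than Spark or Hadoop, and the function name, the (date, text) input layout and the sample tweets are all hypothetical.

```python
from collections import Counter

# Phrases quoted in the article; simple substring matching is an assumption.
PHRASES = ("stay home", "contact tracing", "herd immunity")


def count_matching_tweets(tweets):
    """Count, per day, the tweets whose text contains any tracked phrase.

    `tweets` is an iterable of (date_string, text) pairs (hypothetical layout).
    """
    counts = Counter()
    for day, text in tweets:
        lowered = text.lower()  # match case-insensitively
        if any(phrase in lowered for phrase in PHRASES):
            counts[day] += 1
    return dict(counts)


# Made-up sample data, for illustration only.
sample = [
    ("2020-04-01", "Please STAY HOME this weekend"),
    ("2020-04-01", "Nice weather today"),
    ("2020-04-02", "Contact tracing apps are coming"),
    ("2020-04-02", "Is herd immunity realistic?"),
]

daily_counts = count_matching_tweets(sample)
print(daily_counts)
```

Daily counts like these could then be compared against new case numbers reported some days later. How the students actually weighted the phrases is not described here.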
“After applying a weighted formula engineered using the phrases found in tweets to our prediction graph, we discovered that promoting social distancing on Twitter had reduced the number of new cases in Canada after an incubation period,” says Ku Young Shin, who will move from co-op student to full-time employee with his co-op employer in June. “Using our machine learning model, we were also able to predict that there will be zero new cases of COVID-19 in Canada on October 15, 2020.”
Abedi has graded the projects of the approximately 80 students in his CS451 class based on the methodology they used and their predictions for the first week of May. However, at the end of May, he will raise to 100 per cent the grades of the students who correctly predicted the number of new COVID-19 cases for May.