Internet search engine queries and social media data can be early warning signals, creating a real-time surveillance system for disease forecasting, says a recent University of Waterloo study.

Using the example of COVID-19, researchers found there was an association between the disease’s prevalence and search engine queries and social media posts.

“The general public tends to use internet searches and social media for health information, and especially so during global epidemics,” said Dr. Yang (Rena) Yang, a postdoctoral research fellow in the School of Public Health Sciences at Waterloo. “These behaviour patterns can be used by public health authorities to develop a real-time surveillance system to flag when diseases are spiking or waning or respond quickly to emerging infectious diseases.”

The team extracted symptom keywords from Google Trends and Twitter data in Canada from January to March 2020. These keywords included cough, runny nose, sore throat, shortness of breath, fever, headache, body ache, and fatigue on Google Trends. On Twitter, researchers looked at COVID-19-related hashtags, such as pneumonia, cough, fever, running nose and breath. They then cross-checked the information against COVID-19 data from the COVID-19 Canada Open Data Working Group.

The researchers found that search terms related to COVID-19 symptoms strongly correlated with daily COVID-19 cases with a time lag of between one and 13 days, suggesting that these tools can serve as early warning signals for digital disease surveillance in real time. The sophisticated machine learning model used for forecasting in this study performed better with Google Trends than with Twitter data.
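To illustrate what a time-lagged correlation means in practice, here is a minimal sketch in Python. It is not the study's method or model. The helper names (`pearson`, `lagged_correlations`) and the data are hypothetical: the synthetic "cases" series is built to trail the "search" series by exactly five days. The sketch shifts one series against the other and reports the lag at which the two line up best.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)

def lagged_correlations(search, cases, max_lag=13):
    """Correlate search interest on day t with case counts on day t + lag."""
    if len(search) != len(cases):
        raise ValueError("series must cover the same days")
    result = {}
    for lag in range(1, max_lag + 1):
        earlier_searches = search[:len(search) - lag]
        later_cases = cases[lag:]
        result[lag] = pearson(earlier_searches, later_cases)
    return result

# Synthetic, illustrative data only (not from the study):
# daily search interest, and cases that echo it five days later.
search = [(t * t) % 17 for t in range(60)]
cases = [0] * 5 + [2 * s + 1 for s in search[:-5]]

correlations = lagged_correlations(search, cases, max_lag=13)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

With real data the peak would be far less clean, and a strong correlation at some lag would not by itself show that searches predict cases.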

Dr. Zahid Butt, lead investigator of the study and an assistant professor in the School of Public Health Sciences at Waterloo, noted that modelling is challenging because self-generated data is noisy and because it is difficult to identify the relevant keywords for an emerging infectious disease.

“Our future research will aim to systemically identify and organize pertinent symptom keywords for emerging diseases, even before they are commonly recognized or reported,” Butt said. “These systems have the potential to assist in epidemiological control and monitor public perceptions of the disease, as well as forecast trends in outbreaks. A multifaceted strategy that uses multiple data sources and multimodal modelling would help provide accurate and comprehensive emerging disease surveillance.”

The study, Digital Disease Surveillance for Emerging Infectious Diseases: An Early Warning System Using the Internet and Social Media Data for COVID-19 Forecasting in Canada, appears in Studies in Health Technology and Informatics and was authored by Waterloo's Dr. Yang (Rena) Yang, Shu-Feng Tsao, Mohammad Basri, Dr. Helen Chen and Dr. Zahid Butt.