The art and science of Analyzing Big Data, by Prof Anindya Sen

Wednesday, March 13, 2019

There has been recent discussion on the existence of several different data gaps across economic, social and political divides — deficits that are left unaddressed at our own peril. But there is another deficit that has, I would argue, gone relatively unnoticed but is no less important: Canada’s skills gap in data analysis.

If Canada’s data deficit is to be eliminated, more collaborative learning and engagement between data science and the arts is needed. Much of the problem could be addressed by ensuring people have the skills to know not only how to look for data, but how to interpret them.

I am a professor of economics and the Director of the Master of Public Service program at the University of Waterloo, where I have been conducting policy-oriented research using large datasets for over two decades.

Analyzing Big Data

We currently live in the era of Big Data, where massive amounts of information are being collected at an ever-decreasing cost. Every Facebook, Twitter and Instagram post is a data moment that can be archived and become a part of a historical dataset. In an age where governments are making data more open and accessible, there is a significant demand for employees who can aggregate such large information sets in a meaningful manner and deliver key insights.

In response, many undergraduate and graduate programs in Big Data analysis and data science have emerged in universities across the country. These are typically housed in computer science, mathematics, statistics and engineering departments.

A humanities approach to data

From a policy perspective, a key ingredient missing from many of these programs is exposure to social science and humanities courses. This might seem puzzling: why should data science programs require courses in the arts?

The social sciences and humanities train students in the behavioural theories that are required to explain trends in data and extract insightful narratives. This allows arts students to be an integral part of any model-building process aimed at predicting human behaviour and choices.

Given the recent controversy over Facebook’s data collection practices, data science students would also benefit from data ethics and governance courses. Students need to understand the importance of individual privacy, confidentiality and data protection, which take priority over the data analysis itself.

On the flip side, arts students should also be encouraged to take challenging courses in contemporary Big Data analysis methods. These courses could include machine learning, which is not yet common in the social sciences or humanities curriculum.

Datafests and hackathons

What is also required are scalable, ground-level ideas that have the potential to bridge the data divide between the sciences and arts. Datafests and hackathons are becoming increasingly common on many university campuses. In these events, students are typically organized into teams and have roughly two days to analyze data and craft a summary of findings or recommendations. The American Statistical Association has organized many datafests at different universities.

However, the emphasis is typically on mining private sector data. In contrast, Canada’s first policy datafest for graduate students in the arts used public datasets. The event was hosted by the University of Waterloo, in partnership with Innovation, Science and Economic Development Canada, the Government of Ontario and the Royal Bank of Canada, and participants worked with a range of open datasets.

Datafests demonstrate the expertise of humanities and social sciences students with respect to data analytics. Their creativity, critical thinking and diversity in thinking, which are all key components of these disciplines, are important to developing, researching and analyzing policy issues.

At the 2019 University of Waterloo Datafest, most of the research presented was based on open data publicly available from different government websites. The use of open data allows datafests to be low-cost ventures with significant returns, including greater awareness of free and easily accessible information. Datafests are also an experiential education opportunity in which students build relevant skills and must work in teams to analyze policy issues of contemporary importance.

These events plant the seeds for a critical mass of individuals who are skilled in sophisticated data analytics and who can identify data deficits and recommend ways to eliminate them. This is a key point. The government can try to eliminate the national data deficit by posting more information on public websites. However, this will be neither efficient nor useful unless the ability to analyze data and draw correct inferences is widespread.

Reducing the skills gap

Of course, priority should be given to increasing resources for Statistics Canada and for provinces and municipalities, to ensure there are dedicated offices and personnel able to assess data needs across different departments. However, this is a short-term perspective, as it does not address the skills gap in data analysis.

A long-term strategy of ensuring a blend of training across different disciplines and encouraging public, private and university partnerships should result in a significantly lower data deficit for Canada by reducing the skills gap in data analytics and encouraging statistical literacy.

Steve Jobs summarized his business strategy by saying: “It is in Apple’s DNA that technology alone is not enough — it’s technology married with liberal arts, married with the humanities, that yields us the results that make our heart sing.” While this was specific to the intersection between technology and the arts, it resonates deeply when one considers how society can further advance by encouraging cooperative learning in data science between the Arts and Sciences.

——

This article was originally published in The Conversation under a Creative Commons license and then republished in the National Post. Disclosure information is available on the original site.