The future of history

During Canada’s sesquicentennial year, many of us are looking back to take stock of where we as a country have come from. A few of us, however, are looking forward. How will the events of today, including Canada 150, the Truth and Reconciliation Commission, political turmoil in the United States, and beyond, be archived, researched, and made accessible for future historians – or anyone?

On one hand, things are looking up for future historians: think of the tweets sent with the #Canada150 hashtag, the millions of photographs they will be able to explore on Instagram, and the blogs, websites, and other social media posts.

Yet, on the other hand, the sheer quantity of data we generate every single day is overwhelming. In the future, how will we preserve, analyze, and retell the story of the 150th anniversary, or of any other event of our time, from this vast quantity of information?

Traditionally, historians have struggled with not having enough information about the past: events happened, nobody wrote anything down or what was written was not preserved, and as memories faded those moments never became part of our history. Increasingly, the opposite is the case: the amount of information left behind by today's events threatens to overwhelm our capacity to process it, both technically and intellectually. In short, the shift to born-digital historical sources means that the world around us as historians is rapidly changing.

Our interdisciplinary team, Web Archives for Historical Research, is dedicated to tackling the challenge of big data and how it is reshaping the historical profession. Perhaps the easiest way to understand this challenge is to think about five key dimensions at play.

This visualization shows the URL links between Canadian political parties and interest groups between 2005 and 2015, letting us reconstruct whole swaths of our political history.

Scale

We’re experiencing a true shift from historical scarcity, where we historians wished we had more information about the past, to abundance, where the sheer scale threatens to overwhelm us. One example: GeoCities.com, a website hosting service that operated between 1994 and 2009, grew exponentially, from 10,000 users in October 1995 to 1,000,000 in October 1997, and eventually to over seven million by 2009, when it was shut down. Our team counts around 186 million distinct HTML pages on GeoCities’ domain. To me, GeoCities, which is just one small part of the Web (think of platforms like Twitter today), represents the new scale of information that we generate.

Different access methods

Historians have tended to research by reading their sources one page at a time, often with the help of archivists and librarians; the move to online sources upends this traditional mode of access. Increasingly, historians are using keyword searches to find the information they need. This means that we need to understand how these search engines work, or else we won’t be writing our books anymore – the search engine that decides which sources we find will be!

Understanding metadata

If we cannot read every online source, we increasingly need to rely on the data that describes it! This “information about information,” or metadata, has become king in our new world of information. We accordingly rely more on contextual information about our web archives, such as who links to whom or what kinds of words make a page unique or noteworthy, and we use computers to find and read the sources for us.

Interdisciplinary approaches

Historians have tended to be solitary animals: we work and publish alone. That needs to change once we are using computers to explore sources at scale: we need to collaborate with others! Our team at Waterloo, for example, includes librarians and computer scientists, working and publishing together to bring order to born-digital cultural heritage.

It's happening sooner than we think

Aren’t the 1990s too recent to write histories of? Our team doesn’t think so. The first histories of the 1960s, for example, appeared in the 1980s. The Web is now over 25 years old, and our web archives go back to 1996. An undergraduate today might well want to write a PhD dissertation on the Web. We need to be ready to support them!

Historians are at a crossroads: we are about to be hit by a rising wave of data. Only by understanding the virtues of interdisciplinary teamwork, thinking computationally, and leveraging algorithms will we be ready to enter this new Web age of history. It is scary, because it is new territory, but it is also exciting. The next time you tweet with the #Canada150 hashtag, think about it… you might be joining the historical record yourself!

Ian Milligan is a digital and Canadian historian currently exploring how historians can fruitfully use web archives and other large digital repositories. He leads several research initiatives, including UWaterloo's Web Archives for Historical Research group, and he is the principal investigator for a Mellon Foundation-funded project, Archives Unleashed.