About the Web Archives for Historical Research Group

Our project

Image: a link visualization of a million links from the Common Crawl.

Big Data is reshaping the historical profession and society in ways we are only now beginning to grasp. Tremendous new opportunities are opening up for social and cultural historians. Large web archives contain billions of webpages, from personal homepages to professional or academic websites, offering the ability to reconstruct large-scale aspects of the recent past. Yet the sheer size of these primary sources presents significant challenges: if the norm until the digital era was to have human information vanish, “now expectations have inverted. Everything may be recorded and preserved, at least potentially” (as James Gleick noted in his 2012 book, The Information: A History, a Theory, a Flood). Useful historical information is being preserved at a rate that continues to accelerate. IBM Research, for example, notes that “every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone” (“IBM - What Is Big Data?” 2014).

This project fits into that context by drawing on the large web archives held by the Internet Archive.

The Web Archives for Historical Research (WAHR) group's goals

Beyond giving people the ability to track temporal change within web archives and training students in this emerging area, the research has several purposes:

  • Raise public awareness about the utility of web archives. We aim to firmly demonstrate that the Web is a serious historical resource.
  • Engage with new media as historians, and experiment with forms of knowledge dissemination that are new (for historians, at least).
  • Provide public outreach by partnering with a local secondary institution to raise awareness about digital records and memory.

This project is among the first attempts to harness data in ways that will enable present and future historians to usefully access, interpret, and curate the masses of born-digital primary sources that document our recent past.