Revisiting Google Ngram: What we can learn from Corpora

Friday, November 25, 2016
by Ryan Patrick Welch

So I’ve been thinking about Google’s Ngram Viewer and how it works as a teaching tool. Although the two aren’t directly comparable, Ngram reminded me of a really handy tool that we use regularly at the Writing Centre. I’ve also realized that many people may not have encountered it before. So, I present to you: the NOW Corpus.

What is it? It’s similar to the Google Ngram Viewer in that it’s a huge archive of web-based publications that you can search for words or phrases. However, rather than only showing you how many times a word has occurred, the Corpus shows how the word is used in a variety of contexts. What makes the database even cooler is that it contains 3.6 billion words of data and is growing at a rate of 4-5 million words per day. That’s absurd.

Dr. Evil image with caption
Mike Myers, apparently counting how many characters he's played in his own films. From Quickmeme.com

So yeah, it may function a bit differently from the Ngram Viewer, as the Corpus keeps you up to date with contemporary usage of words rather than historical usage. Regardless, both corpora give you a sense of how the meaning of a word is fluid, shifting around based on context and common usage.

You may be asking yourself how the Corpus works as a teaching tool. It’s actually quite intuitive: you use it much like a dictionary and thesaurus put together. If you’re unsure about how people are using a specific term, especially in a specific context, the Corpus gives you examples of how other people are using it in public discourse. From experience, learning the bare-bones definition of a word isn’t the hard part of building vocabulary. Instead, developing the agility to use any given word in different situations is much trickier (and more useful). The Corpus effectively gives you access to an enormous range of examples to learn from.

The other beauty of the Corpus is how it helps non-native English speakers expand their vocabulary in a simple, efficient way. Many parts of speech, such as prepositions and articles, are very context-driven. Therefore, getting a sense of where certain words are used and in what context is extremely useful for non-native English speakers.

So give it a shot! Plug some words that you commonly use into the Corpus and see how other writers are using them. This is particularly useful if you’re taking a course in a different faculty or department, where you’re less familiar with the analytical language used in that discipline. Much like the thought experiment with the Ngram Viewer from before, the Corpus illustrates that we have a wide range of learning tools at our disposal; the key to developing effective writing and revising strategies is how we utilise these tools at various stages of the writing process.