
How to (partially) automate literature reviews for knowledge discovery and synthesis

Monday, April 8, 2019 (all day) to Tuesday, April 9, 2019 (all day)

It is more challenging than ever for researchers to be on top of all the latest publications and other developments in their areas of research, regardless of their career stage or the fields they work in. One of the main challenges is the exponential growth of publications. In nearly all fields, there is more work being published every year than any one researcher, or research team, can possibly read and synthesize. This is especially challenging for interdisciplinary researchers, junior researchers, or researchers who are moving into new areas because it can take a fairly long time to get the lay of the land in an unfamiliar literature. This makes research and discovery more difficult than it needs to be for teams and individual researchers, and it holds back the development and communication of collective knowledge. 

Talking to colleagues and mentors, reading the latest articles in the top-ranked journals, going to conferences, and building diverse research teams are all indispensable strategies for keeping on top of the literature and for discovering and synthesizing new knowledge. However, these strategies (1) are often costly and slow, and (2) are generally biased, though not always in negative ways.

This workshop will cover another set of tools from network science, text mining, and scientometrics that can help us rapidly get up to speed on the state of knowledge in a field, and to mine existing knowledge to identify promising areas for discovery and further research. It will provide tools to help you answer a broad range of questions about literatures that we have varying degrees of familiarity with. For example: 

  • What are the groundbreaking papers, and to what extent are they still informing the latest research?
  • What are the central concepts, theories, data sources, and methods?
  • Which topics get the most attention, how are they related to one another, and how have they evolved over time?
  • Which topics are controversial or contested and which are not? Which are currently at the "cutting edge?"
  • Which existing ideas, theories, and methods would make interesting combinations and could push inquiry in new directions?
  • What are the current "knowledge gaps?" Which gaps might be worth addressing?

This hands-on workshop will cover methods from network science, text mining, and scientometrics to help researchers answer these questions and many others. While the methods we will cover have their own limitations, they are (1) inexpensive, (2) fast, (3) valid, (4) reliable, and (5) easy to automate.

The workshop will cover the following topics over two days. Each day will include hands-on time exploring these topics.


  1. Introduction
  2. Collect and manage publication metadata from electronic databases (Web of Science, Scopus, and PubMed)
  3. Parse, clean, and explore publication metadata
  4. Cluster publications based on (a) the similarity of their content and (b) how frequently they are co-cited in other publications 
  5. Identify publications that are influential within and across topics
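As a rough illustration of the co-citation idea in item 4, the sketch below counts how often two references appear together in the same reference list. It is a minimal example using only the Python standard library, not the workshop's code or the metaknowledge API; the function name `cocitation_counts` and the toy reference labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    reference_lists: one list of cited-reference identifiers per citing paper.
    Returns a Counter keyed by alphabetically sorted (ref_a, ref_b) tuples.
    """
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate and sort so each pair has one canonical ordering.
        unique_refs = sorted(set(refs))
        for pair in combinations(unique_refs, 2):
            counts[pair] += 1
    return counts


# Toy data (hypothetical): three citing papers and the references they cite.
papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "D"],
]
counts = cocitation_counts(papers)

for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts can be treated as weighted edges in a network, which a community detection method could then group into clusters of related publications.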


  1. Identify the relationships (or lack of relationships) across topics
  2. Identify potentially promising combinations of existing ideas and knowledge
  3. Identify and assess the current relevance of groundbreaking papers by using Reference Publication Year Spectroscopy (RPYS) and related methods
  4. Identify papers and topics at the "cutting edge" of a literature
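To give a sense of what Reference Publication Year Spectroscopy involves, the sketch below counts cited references by their publication year and reports how far each year deviates from the median of the surrounding five-year window, which is one common way to make peaks stand out. It is a simplified illustration under stated assumptions, not the workshop's implementation or the metaknowledge API; the function name `rpys` and the toy data are hypothetical.

```python
from collections import Counter
from statistics import median


def rpys(cited_years, window=2):
    """Return {year: (count, deviation)} for each reference publication year.

    cited_years: publication years of all cited references in a literature.
    deviation: the year's count minus the median count over the years from
    year - window to year + window (a five-year window by default).
    """
    counts = Counter(cited_years)
    spectrum = {}
    for year in range(min(counts), max(counts) + 1):
        # Years with no cited references contribute a count of zero.
        neighbours = [counts[y] for y in range(year - window, year + window + 1)]
        spectrum[year] = (counts[year], counts[year] - median(neighbours))
    return spectrum


# Toy data (hypothetical): 1992 is cited far more often than nearby years.
cited_years = [1990] * 2 + [1991] * 3 + [1992] * 10 + [1993] * 3 + [1994] * 2
spectrum = rpys(cited_years)

peak_year = max(spectrum, key=lambda year: spectrum[year][1])
print(peak_year, spectrum[peak_year])
```

Years with large positive deviations are candidates for closer inspection, since a small number of heavily cited early publications often accounts for such peaks.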

Software and Assumed Background

This workshop makes extensive use of the programming language Python, including the package metaknowledge, which is developed and maintained by Reid McIlroy-Young (University of Toronto) and John McLevey (University of Waterloo).

Although some knowledge of Python is an asset, it is not required. I will provide all participants with fully executable code for all topics covered in the workshop. Participants will be encouraged to modify the code to suit their specific interests; doing so takes only minimal programming knowledge and is entirely optional. If you want to learn a bit of Python before the workshop, I highly recommend selecting something from DataCamp.

A couple of weeks before the start of the workshop, participants will receive detailed instructions on what software to install and how to install it.

Register for the Workshop 

To register for the workshop, please complete this short form. We will be in touch at a later date about processing your registration payment. 

Space is limited, so we encourage you to register as soon as possible. 

Instructor & Workshop Organizer

John McLevey is an Assistant Professor in the Department of Knowledge Integration (Faculty of Environment) at the University of Waterloo. He is the Principal Investigator of a computational social science and social networks research lab called NETLAB, which is funded by grants from the Social Sciences and Humanities Research Council of Canada and an Early Researcher Award from the Ontario Ministry of Research and Innovation. 

John primarily works in the areas of computational social science and social network analysis, with substantive interests in environmental social science, the sociology of science, social movements, and cognitive social science. As a computational social scientist, his most general research goal is to advance our knowledge of how social networks and institutions affect collective cognition and behaviour, including the formation and diffusion of knowledge, beliefs, biases, and behaviours. He is currently involved in a number of research projects in service of that larger goal, including work on the effects of cognitive diversity and homophily in scientific networks, environmental governance conflicts in coastal regions, mobilization into environmental activism, and the diffusion of educational innovations. He is currently writing a book on computational social science for Sage's research methods series. He designed and developed the metaknowledge package with his former student Reid McIlroy-Young.


This workshop is held in partnership with the Department of Knowledge Integration, the Faculty of Environment at the University of Waterloo, and NETLAB.


Food and Accommodations

Coffee, tea, and snacks will be provided during the workshop. There are a variety of options for lunch and dinner on campus or within a short walk from campus. 

We will follow up with travelling participants about options for local accommodations. 

$400 CAD, $200 CAD for students.
EV1 - Environment 1
200 University Ave West
Waterloo, ON N2L 3G1
