Software Projects

My students and I develop scientific software in support of our computational social science research. The packages we develop and contribute to have evolved over the years, depending on the problems we face in my various research projects. You can find information about the software we have developed here.

Maintained or Under Active Development


metaknowledge is a full-featured Python package for doing computational research on science and knowledge. It was designed and developed by John McLevey and Reid McIlroy-Young. Jillian Anderson, Tyler Crick, and Rachel Wood have also contributed and are involved, to varying degrees, in maintaining the package.

divsim (Diversity and Similarity in Social Networks)

divsim (Diversity and Similarity in Social Networks) is a Python package implementing measures of node diversity and similarity in social networks. It is especially useful for research on integrated social and belief networks, or for analyzing belief / knowledge / information / cultural homophily in social networks. The measures it implements are described in the working paper "Diversity and Similarity in Social Networks" by John McLevey, Alexander Graham, Tyler Crick, and Pierson Browne.

nate (Network Analysis with Text)

nate (Network Analysis with Text) is a Python package for analyzing text data using methods and models from network analysis. It is designed, developed, and maintained by John McLevey.

pdpp (Principled Data Processing, Python)

pdpp (Principled Data Processing, Python) is a Python package that facilitates best practices for reproducible research. It is based on the principles outlined by Patrick Ball of the Human Rights Data Analysis Group. 


tidyextractors is a Python package that makes extracting data from supported sources (e.g. email mbox files, source code log files) as painless as possible, delivering you a populated Pandas DataFrame in just a few lines of code.

Contributions to Existing Open Source Scientific Software Projects


recordlinkage, by Jonathan de Bruin, is a Python package for linking records across multiple data sources when no unique ID is available. NetLab contributions focus on implementing new comparison and fusion algorithms that are necessary for advancing my grant-funded empirical research on the structure and evolution of cross-sectoral collaboration networks in science and technology. Most NetLab contributions to recordlinkage were implemented by Joel Becker (RA), and occasionally Jillian Anderson (RA), and then submitted to Jonathan de Bruin as pull requests.
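
To give a rough sense of what linking records without a unique ID involves, here is a minimal sketch that uses only the Python standard library. It is not the recordlinkage API and it is not one of the NetLab comparison or fusion algorithms. The function names, the 0.85 threshold, the choice of city as a blocking field, and the sample records are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best-scoring right record, if any.

    Candidate pairs are limited to records in the same city (a simple
    form of blocking) and are then scored on name similarity. The
    threshold is an illustrative assumption, not a recommended value.
    """
    links = []
    for i, a in enumerate(left):
        best = None
        for j, b in enumerate(right):
            if a["city"].lower() != b["city"].lower():
                continue  # skip pairs outside the block
            score = similarity(a["name"], b["name"])
            if score >= threshold and (best is None or score > best[1]):
                best = (j, score)
        if best is not None:
            links.append((i, best[0], best[1]))
    return links


# Hypothetical records from two sources that share no ID column.
grants = [
    {"name": "Jonathan Smith", "city": "Waterloo"},
    {"name": "Maria Gonzalez", "city": "Toronto"},
]
patents = [
    {"name": "Jonathon Smith", "city": "Waterloo"},
    {"name": "Maria Gonzales", "city": "Toronto"},
    {"name": "Peter Chan", "city": "Toronto"},
]

links = link_records(grants, patents)
for left_idx, right_idx, score in links:
    print(grants[left_idx]["name"], "<->", patents[right_idx]["name"])
```

In this toy data the two misspelled names are linked to their counterparts and the unrelated Toronto record is left unmatched. Real record linkage usually compares several fields at once and then combines the field-level scores into a single match decision.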

No Longer Maintained 


gitnet is a Python package for mining and analyzing raw data from Git repositories. It has since been replaced by tidyextractors and is no longer maintained by NetLab.