What’s in a developer’s name?

Wednesday, July 28, 2021

In one of the most memorable speeches from William Shakespeare’s play, Romeo and Juliet, Juliet ponders, “What’s in a name? That which we call a rose by any other name would smell as sweet.” Her message is clear — things are what they are no matter what name they are given. 

But what if your name, or more precisely what people perceive about you from just your name, affected how your contributions are viewed and valued? 

Research conducted by recent CS master’s graduate Reza Nadri, recent postdoctoral researcher Gema Rodríguez-Pérez, and their supervisor Cheriton School of Computer Science Professor Mei Nagappan found that the perceived race and ethnicity of a developer — based on just their user name — can affect how the developer’s contributions to open source software projects are evaluated.

L to R: Reza Nadri, Gema Rodríguez-Pérez, and Cheriton School of Computer Science Professor Mei Nagappan

On GitHub, an online platform for software development where developers store their open source software projects and work with other developers, the technical quality of a coder’s contributions is unquestionably important. This collaborative software development community has long viewed itself as a meritocracy, one where quality of code is paramount and decisions to accept or reject contributions are based solely on technical excellence.

“A developer’s contributions to an open source software project are accepted or rejected for a variety of technical reasons, but our analysis of tens of thousands of projects on GitHub shows that contributions can be accepted or rejected because of other factors,” said Professor Nagappan. “We found that one of them is the perceived race and ethnicity of a developer based on the person’s name on the platform.”

“GitHub discussions are online and all people see is a name,” adds Dr. Rodríguez-Pérez. “In an open source software development context, it’s likely that people have discussed their contributions only through the pull request system, the system on GitHub to propose and collaborate on changes in a software repository. The submitter — the developer who submits a pull request — and the integrator — the project developer who evaluates the pull request — have likely never met in person, had a Zoom call, or a watercooler discussion. They know only their user names in GitHub. We wanted to see whether the perceived race and ethnicity of a developer, based on the person’s name, affects how their software contributions are evaluated.”

To better understand the racial and ethnic diversity in open source software projects, the research team conducted a large-scale analysis of projects on GitHub. They examined more than two million pull requests across more than 37,700 open source projects that involved nearly 366,000 developers across the globe. Importantly, this research is the first study to examine whether the perceptible race and ethnicity of developers influences the evaluation of their contributions in open source software projects.

A developer can choose what name they want to use on GitHub, so not surprisingly some names were removed from the analysis because they weren’t ones from which an ethnicity or race could be perceived, Professor Nagappan said. “For example, Wiki1-2-3 could be your GitHub name, but it is not a name with a perceptible race and ethnicity. Such names were removed from the dataset and were not in our analysis.”

The researchers then estimated the race and ethnicity of developers based on their GitHub names using NamePrism, a nationality and ethnicity classification tool that has been trained on a set of 74 million labelled names from 118 countries. NamePrism is a state-of-the-art tool that estimates the race and ethnicity others are likely to perceive when all that’s available to them is a name.

“It’s important to say that we do not know the actual race and ethnicity of the developers in an open source software project from just their names,” Dr. Rodríguez-Pérez said. “But there is also a high likelihood that nobody on GitHub knows the developer’s race and ethnicity either. Submitters and integrators perceive whatever they perceive by looking at the person’s name. It is possible that a person is of a different race and ethnicity than what they are perceived to be from just their name, but that’s all they have to go by — their perception — and interactions are based on our perceptions.”

Among the integrated pull requests, 70 percent were submitted by developers who were perceptible as White, about 5 percent by developers perceptible as Asian, 3 percent by developers perceptible as Hispanic, and less than 0.1 percent by developers perceptible as Black. The balance, about 22 percent of contributions integrated into projects, was submitted by developers whose race and ethnicity could not be determined by NamePrism.

“The majority of contributions that were integrated were submitted by developers perceptible as White,” Professor Nagappan said. “Developers who were perceptible as Asian, Hispanic and Black had less than 10 percent of the contributions in total that were accepted to open source software projects. This low percentage is concerning because it does not reflect the percentage of people who comprise these racial groups or the percentage of developers in those groups in the overall tech community.”

This underrepresentation of large proportions of the human population is concerning and it could have unwanted consequences for the open source software community, Dr. Rodríguez-Pérez said. “Not only might it keep diverse and important contributions from being incorporated into open source projects, but it may also deter non-White developers from contributing to them, making open source software development predominantly by and for White developers.”

Importantly, the researchers also found that the odds of a contribution being accepted by project integrators were lower for developers who are perceptibly non-White.

“Perceptible Hispanic and Asian developers had 6 to 10 percent lower odds of getting their pull requests accepted compared with perceptible White submitters,” said Dr. Rodríguez-Pérez. “Our findings strongly suggest that the open source software community should investigate ways to foster a more diverse community and identify barriers that may be preventing non-White developers from participating in open source projects.”
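Lower odds are easy to misread as an equal drop in the acceptance rate. The short Python sketch below illustrates the difference. It assumes that "6 to 10 percent lower odds" corresponds to odds ratios of roughly 0.94 to 0.90, and it uses a baseline acceptance rate of 80 percent that is purely hypothetical and not a figure from the study. The function names are likewise illustrative only.

```python
def probability_to_odds(probability: float) -> float:
    """Convert a probability (0 < p < 1) to odds, p / (1 - p)."""
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """Convert odds back to a probability, odds / (1 + odds)."""
    return odds / (1.0 + odds)


def acceptance_after_odds_change(baseline_probability: float, odds_ratio: float) -> float:
    """Scale the baseline odds by an odds ratio and return the new probability."""
    baseline_odds = probability_to_odds(baseline_probability)
    return odds_to_probability(baseline_odds * odds_ratio)


# Hypothetical baseline acceptance rate, chosen only for illustration.
# It is NOT a number reported by the researchers.
HYPOTHETICAL_BASELINE = 0.80

# Assumed reading of "6 to 10 percent lower odds" as odds ratios.
for odds_ratio in (0.94, 0.90):
    adjusted = acceptance_after_odds_change(HYPOTHETICAL_BASELINE, odds_ratio)
    print(f"odds ratio {odds_ratio:.2f}: {HYPOTHETICAL_BASELINE:.0%} -> {adjusted:.1%}")
```

Under these assumptions, an 80 percent acceptance rate would fall to roughly 78 or 79 percent, a smaller shift than "10 percent lower" might suggest at first glance, though one that adds up across millions of pull requests.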

Moreover, the researchers found that submitters with perceptible non-White races and ethnicities were more likely to have their pull requests accepted when the integrator was estimated to be of the same race and ethnicity than when the integrator was estimated to be White. In other words, perceptible Hispanic submitters were more likely to have their pull requests accepted when the integrator was estimated to be Hispanic, perceptible Asian submitters when the integrator was estimated to be Asian, and perceptible Black submitters when the integrator was estimated to be Black.

“The integrator is the person who accepts or rejects the contribution, and here their race is not perceived,” Professor Nagappan explains. “Instead, we use the term estimated. The NamePrism tool estimates their ethnicity, but integrators obviously know what race and ethnicity they are. They know their ethnicity, but we don’t know that. We can estimate what their ethnicity might be only from their name. And it’s important to point out that this is a one-way power dynamic. It’s not as though contributors choose which races or ethnicities of integrators to submit to.”

Much research has shown that a diverse workforce — by race, gender, sexual orientation and gender identity, personality traits and age, among many other measures of diversity — is beneficial beyond ethical reasons. Diversity brings different, important perspectives to a problem, leads to more robust software products, and creates more efficient and productive teams. 

A survey of the open source software community conducted in 2017 found that about half of the respondents said that their contributions to open source projects were a crucial factor in launching their professional careers, Dr. Rodríguez-Pérez said. “Participation in software development platforms such as GitHub is an important career entry point for many developers. Given its importance, open source software communities should avoid any possible discrimination against developers, particularly if it excludes or discourages them from participating.”

“And we’re not saying that a developer’s contributions are being rejected outright because of the person’s race and ethnicity, that there’s explicit bias, or that the open source community is racist,” Professor Nagappan said. “We are saying that our analysis of a large dataset of contributions on GitHub shows that the perceptible races of developers do not reflect those of the general population and that the odds of getting a contribution accepted are lower if a contributor is perceptible as non-White.”

“We need to tackle that problem and that’s a topic for future research,” Dr. Rodríguez-Pérez adds. “We need to identify the problems, understand why the problems exist, and determine what interventions can help reduce and eliminate bias. Being even bolder, we can look at what policies we should enact to prevent bias. Nothing prevents us from deciding that contributions on GitHub should be evaluated by at least two integrators, or that all contributions are first evaluated anonymously. In the end, projects should actively encourage contributions from a more diverse open source software community of developers.”


To learn more about the research on which this feature article is based, please see Reza Nadri, Gema Rodríguez-Pérez, Meiyappan Nagappan. On the Relationship Between the Developer’s Perceptible Race and Ethnicity and the Evaluation of Contributions in OSS. IEEE Transactions on Software Engineering, April 16, 2021. DOI: 10.1109/TSE.2021.3073773. 

Please address questions to Professor Mei Nagappan at mei.nagappan@uwaterloo.ca.

Please also see the media release issued by the University of Waterloo: https://uwaterloo.ca/news/media/research-examines-how-race-affects-judgements-software

Coverage by the media

“Software developers’ perceived race may affect success of their proposals: study” by Tara Deschamps of The Canadian Press, Thursday, July 29, 2021

“Want to succeed on GitHub? Your odds are better if you’re white” by Issie Lapowsky, Protocol, Wednesday, August 4, 2021