Please note: This master’s thesis presentation will be given online.
Nalin De Zoysa, Master’s candidate
David R. Cheriton School of Computer Science
GitHub is a collaborative platform used primarily for software development. To gain more insight into how teams work on GitHub, we wish to analyze the sentiment expressed in communication on the platform.
To do so, we first apply existing sentiment analysis classifiers to GitHub data and compare it with data from two other social networks, Twitter and Reddit. Because GitHub lets users react to other users' posts, we use these reactions as indicators, or labels, of sentiment. Using these labels, we investigate whether repeated user interaction affects sentiment, and find that sentiment is positively correlated with both the amount of prior interaction and the directness of interaction. We also investigate whether metrics reflecting a user's status or power in a project correlate with the positive sentiment that user receives, and find that they do.
We then build sentiment classifiers based on textual and on non-textual information; both outperform the generic sentiment scoring systems. In addition, we show that a classifier built using only non-textual information can perform comparably to a text-based classifier, indicating that non-textual information in the GitHub network carries significant sentiment information.