Please note: This master’s thesis presentation will take place online.
Owura Asare, Master’s candidate
David R. Cheriton School of Computer Science
Supervisors: Professors Mei Nagappan, N. Asokan
In this thesis, we perform two security evaluations of GitHub's Copilot with the aim of better understanding the strengths and weaknesses of code generation tools.
In our first evaluation, we use a dataset of vulnerabilities found in real-world projects to compare Copilot's security performance to that of human developers. On the set of 150 samples we consider, we find that Copilot does not perform as poorly as human developers, though its performance still varies across certain types of vulnerabilities.

In our second evaluation, we conduct a user study that tasks participants with solving programming problems, each of which has potentially vulnerable solutions, with and without Copilot assistance. The main goal of the user study is to determine how the use of Copilot affects participants' security performance. Among our participants (n=21), we find that access to Copilot is associated with more secure solutions on the harder problems; for the easier problem, we observe no effect of Copilot access on the security of solutions. We also capitalize on the solutions obtained from the user study by performing a preliminary evaluation of the vulnerability detection capabilities of GPT-4. We observe mixed results of high accuracy alongside high false positive rates, but maintain that language models like GPT-4 remain a promising avenue for accessible, static code analysis for vulnerability detection.
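To give a sense of what LLM-based vulnerability detection looks like in practice, the following is a minimal sketch assuming the OpenAI Python client; the prompt wording, model name, and helper function are illustrative assumptions and not taken from the thesis.

```python
# Minimal sketch of LLM-based vulnerability detection of the kind evaluated
# in the thesis. The prompt, model name, and helper function are
# illustrative assumptions, not the thesis's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def detect_vulnerability(code_snippet: str) -> str:
    """Ask a GPT-4 model whether a code snippet looks vulnerable."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a static analysis assistant. Answer "
                    "'VULNERABLE' or 'SAFE' on the first line, then "
                    "briefly explain your reasoning."
                ),
            },
            {"role": "user", "content": f"Analyze this code:\n\n{code_snippet}"},
        ],
    )
    return response.choices[0].message.content


# Example: a classic SQL injection pattern the model should flag.
snippet = "query = \"SELECT * FROM users WHERE name = '\" + user_input + \"'\""
print(detect_vulnerability(snippet))
```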
We discuss Copilot's security performance in both evaluations with respect to different types of vulnerabilities, as well as its implications for the research, development, testing, and usage of code generation tools.