Michael Christopher Chong
Commit-Level vs. File-Level Vulnerability Prediction
Helping software development teams find and repair vulnerabilities before they are released and exploited can prevent costs arising from loss of data, availability, and reputation. However, while general defect prediction models exist, vulnerability prediction models do not yet achieve prediction performance high enough for use in industry. Previous work has explored predicting vulnerabilities in both commits and files. Commit-level prediction operates at a finer granularity and may therefore offer more useful results, but no clear comparison of predictive performance exists to justify this assumption.
To inform further research in vulnerability prediction, we compare commit-level and file-level prediction across 7 projects, using 7 classifiers and 8 different classifier training dates. We evaluate each prediction model using ‘online prediction’ to ensure that the evaluation is in line with practical usage of the model. We evaluate each model using four different metrics, which we interpret as representing two different practical usage scenarios. We also analyse the data and the techniques used for evaluating prediction models. We find that, although absolute prediction performance is low for both, file-level prediction generally tends to outperform commit-level prediction; in a few notable cases, however, commit-level prediction performs better.
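As an illustration of the evaluation setup described above, the following Python sketch shows one way a time-ordered split and the experiment grid could be expressed. It assumes that ‘online prediction’ means training only on data dated before the classifier training date and evaluating on data from that date onward; this reading, along with the names `online_split`, `experiment_grid`, and the sample fields, is hypothetical and not taken from the study itself.

```python
from datetime import date
from itertools import product


def online_split(samples, training_date):
    """Split time-stamped samples so that the model only sees the past.

    Assumption: each sample is a dict with a "date" key (hypothetical layout).
    """
    train = [s for s in samples if s["date"] < training_date]
    test = [s for s in samples if s["date"] >= training_date]
    return train, test


def experiment_grid(projects, classifiers, training_dates,
                    levels=("commit", "file")):
    """Enumerate every (level, project, classifier, training date) combination."""
    return list(product(levels, projects, classifiers, training_dates))


# Hypothetical toy data, for illustration only.
samples = [
    {"id": "a", "date": date(2015, 3, 1), "vulnerable": False},
    {"id": "b", "date": date(2015, 9, 1), "vulnerable": True},
    {"id": "c", "date": date(2016, 2, 1), "vulnerable": False},
]

train, test = online_split(samples, date(2016, 1, 1))
grid = experiment_grid(range(7), range(7), range(8))
```

Under these assumptions, 7 projects, 7 classifiers, and 8 training dates give 392 configurations per granularity level, or 784 across the commit and file levels.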