Please note: This PhD defence will take place in DC 1304.
Farshad Kazemi, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Shane McIntosh
First, we empirically explore the validity of the recommended reviewers by measuring the rate of stale reviewers, i.e., those who no longer contribute to the project. We observe that stale recommendations make up a considerable portion of the suggestions provided by CRR approaches: up to 33.33% of the recommendations, with a median share of 8.30%. Based on our analysis, we suggest separating the reviewer contribution recency from the other factors used by the CRR objective function. The proposed filter reduces the staleness of recommendations, with a Staleness Reduction Ratio (SRR) between 21.44% and 92.39%.
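As a rough illustration of treating recency as a separate step from ranking, the sketch below applies a recency filter to an already ranked list of candidate reviewers and measures the share of stale recommendations before and after. It is a minimal sketch under stated assumptions, not the method evaluated in the thesis: the function names, the 180-day idle threshold, and the sample data are all hypothetical, and the exact SRR definition is not given here, so it is not computed.

```python
from datetime import date, timedelta


def is_stale(reviewer, last_activity, today, max_idle_days=180):
    """Return True if the reviewer has no recorded activity within the window.

    The 180-day window is an illustrative assumption, not a value from the thesis.
    """
    cutoff = today - timedelta(days=max_idle_days)
    # Reviewers with no recorded activity are treated as stale.
    return last_activity.get(reviewer, date.min) < cutoff


def filter_stale(ranked_reviewers, last_activity, today, max_idle_days=180):
    """Drop stale candidates while preserving the recommender's rank order."""
    return [
        r for r in ranked_reviewers
        if not is_stale(r, last_activity, today, max_idle_days)
    ]


def stale_share(recommended, last_activity, today, max_idle_days=180):
    """Fraction of recommended reviewers that are stale (0.0 for an empty list)."""
    if not recommended:
        return 0.0
    stale = sum(
        1 for r in recommended
        if is_stale(r, last_activity, today, max_idle_days)
    )
    return stale / len(recommended)


# Hypothetical data: a ranked list from some recommender plus last-activity dates.
ranked = ["alice", "bob", "carol"]
last_activity = {
    "alice": date(2024, 5, 1),
    "bob": date(2022, 1, 10),   # inactive for over two years
    "carol": date(2024, 3, 15),
}
today = date(2024, 6, 1)

filtered = filter_stale(ranked, last_activity, today)
share_before = stale_share(ranked, last_activity, today)
share_after = stale_share(filtered, last_activity, today)
```

Keeping the filter outside the scoring function means the recency threshold can be tuned per project without retraining or re-weighting the recommender itself.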
While the first study assesses the validity of the recommendations, it does not measure their quality or potential unintended impacts. Therefore, we next probe the unintended consequences of assigning recommended reviewers. To this end, we study the impact of assigning recommended reviewers without considering the safety of the submitted changeset. We observe that existing approaches tend to improve one or two quantities of interest while degrading others. We devise an enhanced approach, Risk Aware Recommender (RAR), which increases project safety by predicting changeset bug proneness.
Given the evolving landscape of automation in code review, our final study examines whether human reviewers and, hence, recommendation tools are still beneficial to the review process. To this end, we focus on the behaviour of Review Comment Generators (RCGs), models trained to automate code review tasks, as a potential way to replace humans in the code review process. Our quantitative and qualitative study of RCG-generated interrogative comments shows that RCG-generated and human-submitted comments differ in mood, i.e., whether the comment is declarative or interrogative. Our qualitative analysis of sampled comments demonstrates that RCG-generated interrogative comments suffer from limitations in the RCGs' capacity to communicate. Our observations show that neither task-specific RCGs nor LLM-based ones can fully replace humans in asking questions. Therefore, practitioners can still benefit from using code review tools.
In conclusion, our findings underscore the continued necessity of human assistance in the code review process. Thus, we advocate for the improvement of code review tools and approaches, particularly code review recommendation approaches. Furthermore, tool builders can use our observations and proposed methods to address two critical aspects of existing CRR approaches: the staleness of recommended reviewers and the risk posed by the submitted changeset.