PhD Defence • Empirical Software Engineering • Studying Practical Challenges of Automated Code Review Suggestions

Tuesday, September 17, 2024 2:30 pm - 5:30 pm EDT (GMT -04:00)

Please note: This PhD defence will take place in DC 1304.

Farshad Kazemi, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Shane McIntosh

Code review is a critical step in software development, centred on systematic source code inspection. It identifies potential defects and enhances code quality, maintainability, and knowledge sharing among developers. Despite its benefits, it is time-consuming and error-prone. Therefore, tool builders have developed approaches such as Code Reviewer Recommendation (CRR) to streamline the process. However, when deployed in real-world settings, these approaches often fail to account for the complexities of the process, making them impractical or even harmful. This thesis aims to identify and address challenges at various stages of the code review process: the validity of recommendations, the quality of the recommended reviewers, and the necessity and usefulness of CRR approaches in light of emerging alternative automation. We approach these challenges in three empirical studies, each presented in a chapter of this thesis.

First, we empirically explore the validity of recommended reviewers by measuring the rate of stale reviewers, i.e., those who no longer contribute to the project. We observe that stale reviewers account for a considerable portion of the suggestions produced by CRR approaches: up to 33.33% of recommendations, with a median share of 8.30%. Based on our analysis, we suggest separating reviewer contribution recency from the other factors used by the CRR objective function. The proposed filter reduces the staleness of recommendations, i.e., the Staleness Reduction Ratio (SRR) improves by 21.44%–92.39%. A minimal sketch of such a recency-based filter follows.
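The sketch below illustrates the general idea of filtering recommendations by reviewer recency and measuring the share of stale suggestions; the Recommendation structure, the 180-day window, and the staleness_rate helper are illustrative assumptions rather than the thesis's actual implementation.

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class Recommendation:
        reviewer: str
        last_contribution: datetime  # most recent review or commit by this reviewer

    def filter_stale(recommendations, now, window_days=180):
        # Keep only reviewers whose last contribution falls inside the recency window.
        cutoff = now - timedelta(days=window_days)
        return [r for r in recommendations if r.last_contribution >= cutoff]

    def staleness_rate(recommendations, now, window_days=180):
        # Share of recommended reviewers who no longer contribute to the project.
        if not recommendations:
            return 0.0
        cutoff = now - timedelta(days=window_days)
        stale = [r for r in recommendations if r.last_contribution < cutoff]
        return len(stale) / len(recommendations)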

While the first study assesses the validity of the recommendations, it does not measure their quality or potential unintended impacts. Therefore, we next probe the potential unintended consequences of assigning recommended reviewers. To this end, we study the impact of assigning recommended reviewers without considering the safety of the submitted changeset. We observe that existing approaches tend to improve one or two quantities of interest while degrading others. We devise an enhanced approach, the Risk Aware Recommender (RAR), which increases project safety by predicting changeset bug proneness. A rough illustration of risk-aware assignment is sketched below.
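The sketch below shows one way a predicted changeset risk score could re-weight reviewer ranking, shifting weight from plain CRR relevance toward reviewer experience for risky changes; the scoring scheme and the alpha weight are assumptions for illustration, not the RAR design itself.

    def rank_reviewers(candidates, changeset_risk, alpha=0.5):
        # candidates: list of (reviewer, relevance, experience) tuples, scores in [0, 1].
        # changeset_risk: predicted bug proneness of the changeset, in [0, 1].
        # The riskier the changeset, the more weight shifts from CRR relevance
        # toward reviewer experience.
        weight = alpha * changeset_risk
        def score(candidate):
            _, relevance, experience = candidate
            return (1 - weight) * relevance + weight * experience
        return sorted(candidates, key=score, reverse=True)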

Given the evolving landscape of automation in code review, our final study examines whether human reviewers, and hence recommendation tools, are still beneficial to the review process. To this end, we focus on the behaviour of Review Comment Generators (RCGs), models trained to automate code review tasks, as a potential replacement for humans in the code review process. Our quantitative and qualitative study of RCG-generated interrogative comments shows that RCG-generated and human-submitted comments differ in mood, i.e., whether a comment is declarative or interrogative. Our qualitative analysis of sampled comments demonstrates that RCG-generated interrogative comments suffer from limitations in the RCGs' capacity to communicate. Our observations show that neither task-specific RCGs nor LLM-based ones can fully replace humans in asking questions. Therefore, practitioners can still benefit from using code review tools.
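As a simple illustration of the mood distinction studied here, the sketch below labels a comment as interrogative or declarative using a naive surface heuristic; the thesis's quantitative and qualitative analysis is, of course, more involved than this.

    INTERROGATIVE_OPENERS = ("why", "what", "how", "who", "which",
                             "can", "could", "should", "would",
                             "is", "are", "do", "does", "did")

    def comment_mood(comment):
        # Label a review comment as 'interrogative' or 'declarative'.
        text = comment.strip().lower()
        words = text.split()
        if text.endswith("?") or (words and words[0] in INTERROGATIVE_OPENERS):
            return "interrogative"
        return "declarative"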

In conclusion, our findings underscore the continued necessity of human involvement in the code review process. Thus, we advocate for improving code review tools and approaches, particularly code reviewer recommendation approaches. Furthermore, tool builders can use our observations and proposed methods to address two critical aspects of existing CRR approaches: the validity and the quality of their recommendations.