Please note: This master’s thesis presentation will be given online.
Venkatraman Arumugam, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Mei Nagappan
Bug localization is the process of identifying the source code files associated with a bug report. It is important because it lets developers spend their effort on fixing bugs rather than on finding their root cause in the first place. Many techniques have been developed for bug localization, and recent research has shown that supervised approaches using historical data are more effective than other methods. In practice, however, supervised approaches work only when large, high-quality, label-rich datasets are available. Preparing training data for new projects and retraining bug localization models can be highly expensive. Moreover, as Zimmermann et al. point out, most projects do not have rich historical bug data. This motivates cross-project bug localization, which uses data from one project to learn transferable features for localizing bugs in a new project.
In this thesis, we aim to provide a bug localization model that locates buggy source code files in a new project without retraining, by leveraging the transfer learning capability of deep learning models. Deep learning models can be trained once on a label-rich dataset and then transferred to a new dataset. Building on this, we propose AdaBL and AdaBL+GL, which can be trained once and transferred to a new project. The main idea behind AdaBL is to learn the syntactic and semantic relationships between bug reports and source code separately. The syntactic patterns are transferable features shared across projects. To improve semantic learning, we pair AdaBL with a graph neural network that represents the source code as a graph. We also conducted a detailed survey of bug localization research published since 2016, examining the experimental settings used and the availability of replication packages in deep learning-based bug localization studies.