Please note: This PhD defence will take place online.
Partha Chakraborty, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Mei Nagappan
A significant portion of resources and developers’ time is spent fixing software bugs. Identifying the root causes of these bugs within the codebase is crucial to resolving them. Automated bug localization tools aim to assist in this process, but their effectiveness is often limited, leading to low adoption. This low adoption points to a disparity between research goals and developers’ expectations and underscores the need for better bug localization tools.
This thesis explores and addresses the challenges that developers and tool builders face in creating practical bug localization tools. Our research focuses on understanding developers’ expectations and improving the tools’ overall effectiveness. We first conduct a mixed-method empirical study of those expectations. The study reveals that while developers are willing to use bug localization tools, they have concerns about accuracy and potential leakage of intellectual property, and only 27.5% of developers are familiar with these tools. The study indicates that developers need more reliable performance, better integration, flexibility, transparency, and contextual understanding before adoption and effectiveness can increase.
We also examine performance issues in bug localization tools, particularly in their foundation: the embedding model. We find that key properties of embedding techniques, such as pre-training strategy, data familiarity, and input sequence length, significantly affect performance, and that using project-specific data and pre-training methods like ELECTRA can improve model performance by 25.9%. Additionally, we explore the use of reinforcement learning (RL) in bug localization and propose an RL agent called RLocator. RLocator learns from developer feedback, making it suitable for low-data environments. Finally, we propose BLAZE, an efficient bug localization technique for cross-project and cross-language settings. By combining dynamic chunking, which adjusts the size of the input fed to the model, with hard example learning, BLAZE achieves up to a 144% improvement in Mean Average Precision (MAP) over previous tools.
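For readers unfamiliar with the metric, the sketch below shows how Mean Average Precision is conventionally computed over ranked bug localization output: each bug report yields a ranked list of candidate files, and the files actually changed to fix the bug are the relevant set. This is a generic illustration of the standard metric, not code from the thesis, and all function and variable names are our own.

```python
# Minimal sketch: Mean Average Precision (MAP) for ranked bug localization results.

def average_precision(ranked_files, buggy_files):
    """Average precision for one bug report's ranked candidate files."""
    hits = 0
    precision_sum = 0.0
    for rank, file in enumerate(ranked_files, start=1):
        if file in buggy_files:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant rank
    return precision_sum / len(buggy_files) if buggy_files else 0.0

def mean_average_precision(results):
    """results: list of (ranked_files, buggy_files) pairs, one per bug report."""
    return sum(average_precision(r, b) for r, b in results) / len(results)

# Example: ranking the buggy file first scores higher than ranking it third.
print(mean_average_precision([
    (["a.py", "b.py", "c.py"], {"a.py"}),  # AP = 1.0
    (["a.py", "b.py", "c.py"], {"c.py"}),  # AP = 1/3
]))  # MAP ≈ 0.67
```

A higher MAP therefore means the tool places the truly buggy files nearer the top of its rankings, which is what the reported 144% improvement refers to.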
In conclusion, our findings highlight shortcomings in the adaptability and efficiency of current tools. We advocate for highly adaptable cross-language, cross-project bug localizers to increase adoption among developers. By leveraging our observations, curated datasets, and proposed methods, tool builders can create more developer-friendly bug localization tools and inspire a new wave of innovation in this field.