Personalized Defect Prediction
Academia and industry have invested considerable effort in predicting software defects, and researchers have proposed many defect prediction algorithms and metrics. Although previous defect prediction techniques often take the author of the code into consideration, none of them builds a separate prediction model for each developer. Different developers have different coding styles, commit frequencies, and experience levels, which result in different defect patterns. When the defects of different developers are combined, these differences are obscured, which hurts prediction performance.
This thesis proposes two techniques to improve defect prediction performance: personalized defect prediction and confidence-based hybrid defect prediction. Personalized defect prediction builds a separate prediction model for each developer. Confidence-based hybrid defect prediction combines different models by using, for each change, the prediction of whichever model is most confident. As a proof of concept, we apply both techniques to classify defects at the file change level. We implement the state-of-the-art change classification approach as the baseline and compare it with personalized defect prediction; confidence-based hybrid defect prediction then combines these two models. We evaluate the techniques on six large and popular software projects written in C and Java: the Linux kernel, PostgreSQL, Xorg, Eclipse, Lucene, and Jackrabbit.
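The two ideas can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the underlying classifier or the change features, so a Bernoulli naive Bayes model stands in for the classifier, the two binary features (think "large change" and "touches a core file") and the developers `alice`, `bob`, and `carol` are hypothetical, and "confidence" is assumed to mean the probability the model assigns to its predicted class. The functions `train_nb`, `train_personalized`, and `hybrid_predict` are illustrative names only.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical setup: a model maps a change's binary feature vector to P(buggy).
Model = Callable[[List[int]], float]


def train_nb(rows: List[Tuple[List[int], bool]]) -> Model:
    """Bernoulli naive Bayes with Laplace smoothing (a stand-in classifier)."""
    n_feat = len(rows[0][0])
    ones = {True: [1] * n_feat, False: [1] * n_feat}  # smoothed counts of feature == 1
    n = {True: 0, False: 0}                           # number of changes per class
    for feats, label in rows:
        n[label] += 1
        for i, x in enumerate(feats):
            ones[label][i] += x

    def model(feats: List[int]) -> float:
        log_score = {}
        for c in (True, False):
            s = math.log(n[c] + 1)  # smoothed (unnormalized) class prior
            for i, x in enumerate(feats):
                p = ones[c][i] / (n[c] + 2)
                s += math.log(p if x else 1.0 - p)
            log_score[c] = s
        # Normalize the two class scores into P(buggy).
        m = max(log_score.values())
        t = math.exp(log_score[True] - m)
        f = math.exp(log_score[False] - m)
        return t / (t + f)

    return model


def train_personalized(changes: List[Tuple[str, List[int], bool]]) -> Dict[str, Model]:
    """Personalized prediction: train one model per developer on their changes only."""
    by_dev: Dict[str, List[Tuple[List[int], bool]]] = defaultdict(list)
    for dev, feats, label in changes:
        by_dev[dev].append((feats, label))
    return {dev: train_nb(rows) for dev, rows in by_dev.items()}


@dataclass
class Prediction:
    buggy: bool
    confidence: float  # assumed: probability of the predicted class
    source: str        # which model produced the prediction


def hybrid_predict(baseline: Model, personal: Dict[str, Model],
                   dev: str, feats: List[int]) -> Prediction:
    """Confidence-based hybrid: keep the prediction of the most confident model."""
    candidates = [("baseline", baseline)]
    if dev in personal:  # developers without history fall back to the baseline
        candidates.append(("personalized", personal[dev]))
    preds = []
    for name, model in candidates:
        p = model(feats)
        preds.append(Prediction(p >= 0.5, max(p, 1.0 - p), name))
    return max(preds, key=lambda pr: pr.confidence)


# Toy history: (developer, [large_change, touches_core], buggy?)
changes = [
    ("alice", [1, 0], True), ("alice", [1, 0], True),
    ("alice", [0, 1], False), ("alice", [0, 0], False),
    ("bob", [0, 1], True), ("bob", [0, 1], True),
    ("bob", [1, 0], False), ("bob", [0, 0], False),
]
baseline = train_nb([(f, y) for _, f, y in changes])  # one model for everyone
personal = train_personalized(changes)

print(hybrid_predict(baseline, personal, "alice", [1, 0]))  # personalized, buggy
print(hybrid_predict(baseline, personal, "bob", [1, 0]))    # personalized, not buggy
print(hybrid_predict(baseline, personal, "carol", [1, 0]))  # baseline only
```

In this toy data the same change features are risky for `alice` but not for `bob`. The pooled baseline model blurs the two patterns into a near-coin-flip, while each developer's own model is more decisive. This mirrors, in miniature, the motivation that combining developers obscures their individual defect patterns.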