Ph.D. Defence - Jinqiu Yang

Tuesday, January 9, 2018 9:00 am - 9:00 am EST (GMT -05:00)

Candidate: Jinqiu Yang

Title: Improving Automated Program Repair by Leveraging Human Knowledge

Date: January 9, 2018

Time: 9:00 AM

Place: EIT 3142

Supervisor(s): Tan, Lin

Abstract:

Developers spend much of their time fixing bugs in software programs. Automated generate-and-validate (G&V) program repair techniques have been proposed to relieve developers of some of the burden of bug fixing. G&V techniques search for patches in a hypothesized space of patches and use readily available test suites to validate the quality of the generated patches. Despite promising results (G&V techniques repair 8–17.7% of the collected bugs from mature Java or C open-source projects), two primary limitations hinder the advance of automated program repair. First, G&V may generate incorrect patches (i.e., patches that fail to fix the bug yet pass the test-suite-based validation because the test suite is weak). Second, the repair capability of each G&V technique is limited by its hypothesized space of patches. Therefore, there remain bug fixes that current automated program repair techniques cannot handle well (i.e., fixes that are not in the hypothesized space).
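
The generate-and-validate loop and its first limitation can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy: the candidate patches, the test suites, and the function names are assumptions made for illustration and do not come from the thesis or from any particular G&V tool.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Program = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def passes_all(program: Program, tests: Iterable[TestCase]) -> bool:
    """Validate a candidate: it must return the expected output on every test."""
    for arg, expected in tests:
        try:
            if program(arg) != expected:
                return False
        except Exception:
            return False
    return True


def generate_and_validate(candidates: Iterable[Program],
                          tests: List[TestCase]) -> Optional[Program]:
    """Return the first candidate in the patch space that passes the test suite."""
    for patch in candidates:
        if passes_all(patch, tests):
            return patch
    return None  # the fix is not in the hypothesized space


# Hypothetical patch space for a buggy absolute-value function.
candidate_patches: List[Program] = [
    lambda x: x + 1,                # fails the tests
    lambda x: 1 if x == -1 else x,  # overfits a weak test suite
    lambda x: -x if x < 0 else x,   # the correct fix
]

weak_tests: List[TestCase] = [(3, 3), (-1, 1)]
strong_tests: List[TestCase] = weak_tests + [(-5, 5)]

plausible = generate_and_validate(candidate_patches, weak_tests)
correct = generate_and_validate(candidate_patches, strong_tests)
```

With the weak suite, the loop stops at the second candidate, which passes validation but still returns the wrong value for -5. Adding a single test case is enough to push the search on to the correct fix.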

This thesis advances automated program repair in two phases. First, we investigate whether improving the test-suite-based validation can precisely identify incorrect patches generated by G&V, and whether it can help G&V generate more correct patches. The result of this investigation is Opad, which combines new fuzz-generated test cases with additional oracles (i.e., memory oracles) to identify incorrect patches and help G&V repair more bugs correctly. The evaluation of Opad shows that the improved test-suite-based validation identifies 75.2% of the incorrect patches from G&V techniques. With the integration of Opad, SPR, one of the most promising G&V techniques, repairs one additional bug. Furthermore, inspection of the new automatically generated test cases shows that adding new test cases has great potential for identifying incorrect patches, although this potential is not fully explored because of the limitations of test oracles.
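
The idea of strengthening validation with generated inputs and an additional oracle can be sketched as follows. This is not Opad's implementation. It assumes a raised Python exception stands in for a memory-safety violation, and it assumes a patch is flagged when it fails on an input where the unpatched program did not. All function names and the toy programs are hypothetical.

```python
import random
from typing import Callable, Iterable, Set


def violations(program: Callable[[int], int], inputs: Iterable[int]) -> Set[int]:
    """Inputs on which the program raises (a stand-in for a memory oracle)."""
    bad = set()
    for x in inputs:
        try:
            program(x)
        except Exception:
            bad.add(x)
    return bad


def looks_overfitted(original: Callable[[int], int],
                     patched: Callable[[int], int],
                     inputs: Iterable[int]) -> bool:
    """Flag a patch that introduces violations the original did not have."""
    inputs = list(inputs)
    return bool(violations(patched, inputs) - violations(original, inputs))


def buggy(x: int) -> int:
    return 10 // x  # raises for x == 0


def patch_a(x: int) -> int:
    return 10 // (x - 3)  # hides the failure at 0, adds one at 3


def patch_b(x: int) -> int:
    return 0 if x == 0 else 10 // x  # no new failures


# Fuzz-style inputs: random integers from a fixed seed for repeatability.
rng = random.Random(0)
fuzz_inputs = [rng.randint(-5, 5) for _ in range(200)]

flag_a = looks_overfitted(buggy, patch_a, fuzz_inputs)
flag_b = looks_overfitted(buggy, patch_b, fuzz_inputs)
```

The oracle here needs no expected outputs, which is why it can be applied to generated inputs. It also shows the limitation noted above: a patch that returns wrong values without crashing would go undetected.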

Second, this thesis enriches the hypothesized space of patches of automated program repair by leveraging human intelligence. The problem of weak test suites cannot be fully addressed because of the challenging oracle problem. Meanwhile, improving the hypothesized space of patches, in both quantity and quality, could further advance automated program repair. Human intelligence in bug-fixing activities is recorded in forms such as bug-fix commits, developers’ expertise, and documentation pages. Two techniques (APARE and Priv) are proposed to target two types of defects, respectively: project-specific recurring bugs and vulnerability findings reported by static analysis.

Given a recurring bug with both failing and passing test cases, APARE automatically learns fix patterns from historical bug fixes (i.e., fixes originally crafted by developers), uses a spectrum-based fault localization technique to identify highly likely faulty methods, and applies the learned fix patterns to generate patches for developers to review. The key innovation of APARE is a percentage semantic-aware matching algorithm between fix patterns and faulty locations. For the 20 recurring bugs, APARE generates 34 method fixes, 24 of which (70.6%) are correct; 83.3% (20 out of 24) are identical to the fixes written by developers. In addition, APARE complements current repair systems by generating 20 high-quality method fixes that RSRepair and PAR cannot generate.
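
Spectrum-based fault localization ranks code elements by how strongly their execution correlates with failing tests. The abstract does not say which suspiciousness formula APARE uses. The sketch below assumes the widely used Ochiai formula, and its function name, test names, and coverage data are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def ochiai_ranking(coverage: Dict[str, Set[str]],
                   failing: Set[str]) -> List[Tuple[str, float]]:
    """Rank methods by Ochiai suspiciousness.

    coverage maps each test name to the set of methods it executes;
    failing is the set of test names that fail.
    """
    total_failed = len(failing)
    methods: Set[str] = set()
    for covered in coverage.values():
        methods |= covered

    scores: Dict[str, float] = {}
    for method in methods:
        # ef / ep: failing / passing tests that execute this method
        ef = sum(1 for t, cov in coverage.items() if method in cov and t in failing)
        ep = sum(1 for t, cov in coverage.items() if method in cov and t not in failing)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[method] = ef / denom if denom else 0.0

    # Most suspicious first; ties broken by method name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


coverage = {
    "t1": {"parse", "render"},
    "t2": {"parse"},
    "t3": {"render", "save"},
}
ranking = ochiai_ranking(coverage, failing={"t3"})
```

In this toy data, `save` is executed only by the failing test and ranks first, `render` is shared with a passing test and ranks second, and `parse` is never executed by a failing test and scores zero.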

Priv is a multi-stage remediation system specifically designed for static analysis security testing (SAST) techniques. The prototype is built and evaluated on a commercial SAST product. The first stage of Priv prioritizes the workload of fixing vulnerability findings based on shared likely fix locations. The likely fix locations are suggested by a set of manually defined rules, which were developed in collaboration with two vulnerability experts. The second stage of Priv provides additional essential information to make diagnosis and fixing more efficient. Priv offers two types of additional information: it locates relevant findings to identify actionable vulnerability findings, and it customizes remediation pages by generating a fix suggestion for each vulnerability finding. The evaluation shows that Priv suggests fix locations identical to those suggested by developers for 50–100% of the evaluated vulnerability findings. Priv identifies 2–2170 actionable vulnerability findings for the six evaluated projects. Moreover, our manual examination confirms that Priv can generate high-quality patches for many of the evaluated vulnerability findings.
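
One way to picture prioritization by shared likely fix locations is to group findings by location and handle the largest groups first. The abstract does not describe Priv's actual rules or ordering, so the ordering by group size, the function name, and the finding data below are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def prioritize(findings: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group findings by likely fix location, largest group first.

    Each finding is (finding_id, likely_fix_location). Fixing one location
    that many findings share is assumed to resolve the most findings per fix.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for finding_id, location in findings:
        groups[location].append(finding_id)
    # Sort by descending group size, then by location name for determinism.
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical findings from a static analysis scan.
findings = [
    ("F1", "Db.query:42"),
    ("F2", "View.render:10"),
    ("F3", "Db.query:42"),
    ("F4", "Db.query:42"),
    ("F5", "View.render:10"),
    ("F6", "Auth.login:7"),
]
worklist = prioritize(findings)
```

Here the three findings that share `Db.query:42` come first in the worklist, so a developer can review them together rather than triaging six findings one by one.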