Muhammad Riyad Parvez
Combining Static and Dynamic Symbolic Program Analysis for Scalable Bug-finding in Application Binaries
Vijay Ganesh and Paul Ward
Manual software testing is laborious and prone to human error, yet it remains the most popular method for quality assurance. Automating test-case generation promises better effectiveness, especially for exposing “deep” corner-case bugs. Symbolic execution is an automated program-analysis technique that has recently become practical due to advances in constraint solvers. It stands out as an automated testing technique because it has no false positives, eventually enumerates all feasible program executions, and can prioritize executions of interest. However, “path explosion”, the fact that the number of program executions is typically at least exponential in the size of the program, hinders the adoption of symbolic execution in the real world, where programs commonly reach millions of lines of code.
In this thesis, we present a method that uses symbolic execution to generate test-cases that reach a given, potentially buggy “target” statement. Such target statements can be found by static program analysis or taken from user crash reports, and they serve as input to our technique. The test-case generated by our technique serves as a proof of the bug. Generating crashes at the target statement has many applications, including reproducing crashes, checking warnings generated by static program analysis tools, and analyzing source code patches during code review.
By constantly steering the symbolic execution along the branches that are most likely to lead to the target program statement, and by pruning the parts of the search space that are unlikely to reach the target, we were able to detect deep bugs in real programs. To tackle the exponential growth of program paths, we propose a new scheme for managing program execution paths without exhausting memory. Experiments on real-life programs demonstrate that our tool WatSym, built on the selective symbolic execution engine S2E, can generate crashing inputs in feasible time, performing an order of magnitude better than existing symbolic approaches (as embodied by S2E) and succeeding where they failed.
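The steering-and-pruning idea can be illustrated on a toy control-flow graph. The sketch below is not WatSym's algorithm: it assumes a simple shortest-distance heuristic as the notion of "most likely to reach the target", it ignores path feasibility entirely (a real engine would query a constraint solver at each branch), and the names `distances_to_target`, `directed_search`, and the example graph are hypothetical.

```python
import heapq
from collections import deque


def distances_to_target(cfg, target):
    """Reverse BFS: edge count from each node to the target.

    Nodes missing from the result cannot reach the target at all.
    """
    reverse = {}
    for src, succs in cfg.items():
        for dst in succs:
            reverse.setdefault(dst, []).append(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    return dist


def directed_search(cfg, entry, target):
    """Explore paths from entry, always expanding the one closest to target.

    Successors that cannot reach the target are pruned (never queued).
    Assumption: every CFG edge is feasible; no solver is consulted.
    """
    dist = distances_to_target(cfg, target)
    if entry not in dist:
        return None  # the target is statically unreachable from entry
    frontier = [(dist[entry], [entry])]
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == target:
            return path
        for succ in cfg.get(node, []):
            if succ in dist:  # prune branches that cannot reach the target
                heapq.heappush(frontier, (dist[succ], path + [succ]))
    return None


# Hypothetical CFG: basic-block name -> successor blocks.
cfg = {
    "entry": ["parse", "log"],
    "parse": ["check", "exit"],
    "log": ["exit"],
    "check": ["bug", "exit"],
    "bug": [],
    "exit": [],
}

print(directed_search(cfg, "entry", "bug"))
```

In this toy graph the `log` and `exit` blocks are never explored, because the reverse search shows that they cannot reach `bug`. In a symbolic executor, each queued path would additionally carry its path constraints, and infeasible branches would be dropped as well.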