Candidate: Edmund Wong
Title: Improving Software Dependability through Documentation Analysis
Date: November 27, 2018
Time: 12:30 PM
Place: DC 1331
Supervisor(s): Tan, Lin
Software documentation contains critical information that describes a system's functionality and requirements. Documentation exists in several forms, including code comments, test plans, man pages, and user manuals. The lack of documentation in existing software systems hurts software maintainability and programmer productivity. Since some code bases contain a large amount of documentation, we want to leverage this existing documentation to improve software dependability. Specifically, we use documentation to improve both a system's reliability (e.g., failure-free operation) and its maintainability (e.g., ease of understanding).
In this thesis, we analyze software documentation and propose two branches of work, which focus on three types of documentation: man pages, code comments, and user manuals. The first branch focuses on documentation analysis, because documentation contains valuable information that describes the behavior of a program. We study constraints from documentation and apply them to a structured-file parsing application, and we automatically extract constraints from documentation and apply them to a dynamic symbolic execution tool. The second branch focuses on code comment generation, because documentation can be scarce and outdated in practice.
For documentation analysis, we propose and implement DocRepair and DASE. DocRepair is a case study in understanding and repairing corrupted PDF files. We create the first dataset of 319 corrupted PDF files and conduct an empirical study on 119 real-world corrupted PDF files to identify the common types of file corruption. DocRepair's repair algorithm includes seven repair operators that use constraints manually extracted from documentation to repair corrupted files. We evaluate DocRepair against three common PDF repair tools. Among the 1,827 corrupted files collected from over two corpora of PDF files, DocRepair successfully repairs 354 files, while Mutool, PDFtk, and GhostScript repair 508, 41, and 84 files, respectively. We also propose DocRepair+, a technique that combines multiple repair tools, which successfully repairs 751 files.
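To make the idea of a documentation-derived repair operator concrete, here is a minimal Python sketch, assuming a corruption in which a PDF's cross-reference (xref) table points at wrong byte offsets. The PDF specification documents that each in-use xref entry is a 20-byte line holding the 10-digit byte offset of an object, its 5-digit generation number, and the keyword `n`; the hypothetical `rebuild_xref` function re-derives those entries by scanning the file for object headers. It is an illustration under that assumption, not DocRepair's code, and it does not model DocRepair's seven operators. It also omits the trailer and `startxref` pointer that a complete repair would rewrite.

```python
import re

# Header of an indirect object, e.g. b"12 0 obj": object number, generation number.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> bytes:
    """Regenerate a classic xref table from the object headers found in `data`.

    Hypothetical repair operator: it enforces the documented constraint that each
    in-use entry stores the byte offset of its object's header.
    """
    offsets: dict[int, tuple[int, int]] = {}
    for m in OBJ_HEADER.finditer(data):
        # A later header with the same object number (e.g. an incremental update) wins.
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))

    size = max(offsets, default=0) + 1
    entries = [b"0000000000 65535 f\r\n"]  # object 0 always heads the free list
    for num in range(1, size):
        if num in offsets:
            offset, gen = offsets[num]
            entries.append(b"%010d %05d n\r\n" % (offset, gen))
        else:
            # Simplification: missing numbers are marked free without linking the free list.
            entries.append(b"0000000000 00000 f\r\n")
    return b"xref\n0 %d\n" % size + b"".join(entries)
```

Scanning for object headers rather than trusting the damaged table is what lets the operator recover: the offsets are recomputed from the bytes that are actually present, so the rebuilt table satisfies the documented constraint by construction.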
DASE leverages constraints automatically extracted from documentation to improve a dynamic symbolic execution tool. Using these constraints, DASE guides symbolic execution to focus testing on execution paths that exercise a program's core functionalities. We evaluate DASE on 88 programs from five mature real-world software suites to detect software bugs. DASE detects 12 previously unknown bugs that symbolic execution would fail to detect when given no input constraints, 6 of which have been confirmed by the developers.
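As a rough illustration of how documentation-derived input constraints shrink the space a testing tool must explore, the following Python sketch extracts option flags from a hypothetical man-page excerpt and discards argument vectors that use undocumented flags. It does not perform symbolic execution: brute-force enumeration stands in for the unconstrained input space, and the helper names (`documented_options`, `satisfies_constraints`, `candidate_inputs`) and the man-page text are assumptions made for illustration, not DASE's implementation.

```python
import itertools
import re

# Hypothetical man-page excerpt; real pages vary widely in layout.
MAN_PAGE = """
OPTIONS
    -a, --all        show all entries
    -l               use a long listing format
    -r, --reverse    reverse order while sorting
"""

# An indented line that starts with a short flag, optionally followed by a long flag.
OPTION_LINE = re.compile(r"^[ \t]+(-\w)(?:,[ \t]+(--[\w-]+))?", re.MULTILINE)


def documented_options(text: str) -> set[str]:
    """Collect every short and long flag that the documentation mentions."""
    opts = set()
    for short, long in OPTION_LINE.findall(text):
        opts.add(short)
        if long:  # findall yields "" when the optional long flag is absent
            opts.add(long)
    return opts


def satisfies_constraints(argv: tuple[str, ...], valid: set[str]) -> bool:
    """Input constraint learned from the documentation: only documented flags."""
    return all(arg in valid for arg in argv)


def candidate_inputs(alphabet: list[str], max_len: int):
    """Enumerate every argument vector of length 0..max_len over `alphabet`."""
    for n in range(max_len + 1):
        yield from itertools.product(alphabet, repeat=n)
```

For example, with the five documented flags plus one undocumented flag `-x`, there are 43 argument vectors of length at most two, of which 31 satisfy the constraint. In a real symbolic executor the same constraint would be asserted on the symbolic input instead of filtering concrete inputs, which is the setting the paragraph above describes.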
For automated code comment generation, we propose and implement CloCom and AutoComment. CloCom generates code comments by mining existing software repositories on GitHub, and AutoComment generates code comments by mining Stack Overflow, a question-and-answer site. For 15 Java projects, CloCom and AutoComment generate 181 and 144 comments, respectively.
200 University Avenue West
Waterloo, ON N2L 3G1