PhD Seminar: Improving Software Dependability through Documentation Analysis

Wednesday, November 21, 2018 11:00 am - 11:00 am EST (GMT -05:00)

Candidate: Edmund Wong

Title: Improving Software Dependability through Documentation Analysis

Date: November 21, 2018

Time: 11:00 AM

Place: EIT 3142

Supervisor(s): Tan, Lin

Abstract:

Software documentation contains critical information that describes a system’s functionality and requirements. Documentation exists in several forms, including code comments, test plans, manual pages, and user manuals. The lack of documentation in existing software systems is an issue that impacts software maintainability and programmer productivity. Since some code bases contain a large amount of documentation, we want to leverage this existing documentation to improve software dependability. Specifically, we improve both a system’s reliability (e.g., failure-free operation) and maintainability (e.g., ease of understanding) using documentation.

In this thesis, we analyze software documentation and propose two branches of work, which focus on three types of documentation: manual pages, code comments, and user manuals. The first branch focuses on documentation analysis because documentation contains valuable information that describes the behavior of the program. We automatically extract constraints from documentation and apply them to a dynamic analysis symbolic execution tool, and we manually extract constraints from documentation and apply them to a structured-file parsing application. The second branch focuses on code comment generation because documentation can be scarce and outdated in practice.

For documentation analysis, we propose and implement DASE and DocRepair. DASE leverages automatically extracted constraints from documentation to improve a dynamic analysis symbolic execution tool. DASE guides symbolic execution to focus the testing on execution paths that execute a program’s core functionalities using constraints learned from the documentation. We evaluated DASE on 88 programs from five mature real-world software suites to detect software bugs. DASE detects 12 previously unknown bugs that symbolic execution would fail to detect when given no input constraints, 6 of which have been confirmed by the developers.

DocRepair combines an empirical study of corrupted PDF files with a technique to repair them. We create the first dataset of 319 corrupted PDF files and conduct an empirical study on 119 real-world corrupted PDF files to identify the common types of file corruption.

DocRepair’s repair algorithm includes seven repair operators that use manually extracted constraints from documentation to repair corrupted files. We evaluate DocRepair against three common PDF repair tools. Among the 1,827 corrupted files collected from over two corpora of PDF files, DocRepair successfully repairs 354 files, compared to Mutool, PDFtk, and GhostScript, which repair 508, 41, and 84 files, respectively. We also propose DocRepair+, a technique that combines multiple repair tools and successfully repairs 751 files.