Please note: This master’s research paper presentation will take place in DC 3317.
Muhammad Hassan, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Shane McIntosh
Reproducible Builds are software builds that produce identical outputs on every invocation. They are important for security, quality assurance, and commercial verifiability.
Current approaches for detecting and localizing build unreproducibility defects typically execute builds repeatedly and then localize the root cause by processing build logs. While undoubtedly useful, these approaches can be resource-intensive and imprecise, especially for large software projects.
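As a rough illustration of the repeated-build style of detection described above (not the method studied in this work), the following sketch assumes a project has already been built twice into two separate output directories and compares the SHA-256 digest of every output file. The function names digest_tree and diff_builds are hypothetical.

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 digest of its bytes."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_builds(first: Path, second: Path) -> list[str]:
    """Return relative paths whose contents differ or that exist in only one build output."""
    a, b = digest_tree(first), digest_tree(second)
    # A path missing from one side yields None there, so it is reported as a difference.
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result means the two build outputs are bit-for-bit identical. A non-empty result points only at the differing output files, not at the code change or build step that caused the difference, which is why such approaches still need a separate localization step.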
In this work, we analyze how salient, domain-agnostic, and statically computable (i.e., computable without executing builds) commit features relate to a commit's proneness to inducing build unreproducibility defects. Our goals are to determine whether such features could serve as a more resource-efficient and precise basis for detecting and localizing unreproducibility issues, and to identify which features are most strongly associated with the unreproducibility-proneness of a commit.
We collect unreproducibility-inducing and unreproducibility-fixing commits from Debian and analyze them.
Our work is novel in that it studies how commit features are associated with a commit's proneness to inducing build unreproducibility defects. We find that statically detecting unreproducibility-inducing code changes is a promising avenue that has the potential to serve as a resource-efficient and fine-grained solution for detecting and localizing build unreproducibility issues. Furthermore, we present suggestions for practitioners to reduce the risk of introducing build unreproducibility defects into their software.