Comparing Software Architecture Recovery Techniques Using Accurate Dependencies
Many techniques have been proposed to automatically recover software architectures from software implementations. A thorough comparison among the recovery techniques is needed to understand their effectiveness and applicability. This study improves on previous studies in two ways.
First, we study the impact of leveraging more accurate symbol dependencies on the accuracy of architecture recovery techniques. In addition, we evaluate other parameters of the input dependencies such as the level of granularity, and whether the dependencies are direct or transitive. Second, we study a system (Chromium) that is substantially larger (9.7 million lines of code) than those included in previous studies. Obtaining the ground-truth architecture of Chromium involved two years of collaboration with its developers. As part of this work we developed a new submodule-based technique to recover preliminary versions of ground-truth architectures.
The results of our evaluation of nine variants of architecture recovery techniques suggest that (1) in addition to architecture recovery techniques, the type of dependencies used as their inputs is another factor to consider for high recovery accuracy, and (2) more accurate recovery techniques are needed. Our results show that some of the studied architecture recovery techniques scale to the 10M lines-of-code range, whereas others do not.