Nikita Volodin, Master’s candidate
David R. Cheriton School of Computer Science
Mobile and Web are two rapidly growing platforms. Both are used by a wide range of people for a wide range of tasks, from end users gaming, watching movies, or banking online to businesses communicating and negotiating online. The two frequently complement each other: when no mobile application exists for a task, a Web application often does. To bridge the gap between the Web and Mobile platforms, developers of both created tools that allow applications built for the Web to run on Mobile natively or very close to natively. The first generation of such Mobile applications are called “hybrid” applications, and subsequent generations are called “progressive” Web applications and “truly native” Web applications. These applications use various Web frameworks, which compile existing Web applications into Mobile applications without requiring the developer to learn a new Mobile platform.
While the popularity of both platforms keeps growing, the amount of security research dedicated to each is not comparable. For example, Android uses a strongly typed development language, Java, and strong typing makes static analysis by various tools easier. The software engineering community has produced a great variety of work on Java code, including code clone detection and vulnerability detection. Android has also seen security and privacy research analysing various aspects and attack vectors of the platform, papers proposing improvements and fixes to mitigate the issues found, and work on mitigating and protecting against intrusive Android components such as advertisement libraries. For the Web platform, there is some vulnerability research, but far less work on the automatic search for vulnerabilities or on applying static analysis techniques to detect problems in source code. A wide range of papers analyse problematic components and find vulnerabilities in browsers and server platforms, and suggest mitigations. However, most of these papers examine a specific problem in the platform and analyse whether a specific component is problematic. Few tools apply static analysis techniques to automatically find issues in the various components of the Web platform. There is also little research from the software engineering community dedicated to dynamic languages such as JavaScript, for example on clone detection.
In this work, we address one of the issues mentioned above. Namely, we develop a method that detects libraries known to be vulnerable in hybrid applications. The search applies methods similar to code clone detection for Java to the JavaScript language. We derive a signature from the reference library file and from the unknown application file. We then compare the two signatures to produce a similarity value indicating how close the two files are. From this value, we conclude whether the unknown file is the same as or similar to the known reference file.
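The signature-and-compare idea can be illustrated with a minimal sketch. This is not the method developed in this work: the abstract does not specify the signature format or the similarity measure, so the sketch assumes a signature made of token n-grams and uses the Jaccard index as the similarity value. The crude regular-expression tokenizer and the names `tokenize`, `signature`, and `similarity` are hypothetical.

```python
import re

# Assumption: a crude tokenizer that yields identifiers, numbers, and single
# punctuation characters. A real tool would use a proper JavaScript lexer.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+(?:\.\d+)?|\S")


def tokenize(source: str) -> list[str]:
    """Split JavaScript source into tokens, ignoring comments and whitespace."""
    # Naive comment stripping: it would also cut "//" inside string literals.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    return TOKEN_RE.findall(source)


def signature(source: str, n: int = 3) -> frozenset[tuple[str, ...]]:
    """Assumed signature: the set of all runs of n consecutive tokens."""
    tokens = tokenize(source)
    if len(tokens) < n:
        return frozenset({tuple(tokens)}) if tokens else frozenset()
    return frozenset(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def similarity(sig_a: frozenset, sig_b: frozenset) -> float:
    """Jaccard index of two signatures, a value between 0.0 and 1.0."""
    if not sig_a and not sig_b:
        return 1.0
    return len(sig_a & sig_b) / len(sig_a | sig_b)


# Hypothetical inputs: a reference library file and two application files.
reference = "function add(a, b) { return a + b; }"
reformatted = "function add(a,b){\n  // sum\n  return a+b;\n}"
different = "function mul(a, b) { return a * b; }"

ref_sig = signature(reference)
print(similarity(ref_sig, signature(reformatted)))
print(similarity(ref_sig, signature(different)))
```

In this sketch, the reformatted copy produces the same token sequence as the reference, so its signature is identical, while the different function shares only some n-grams and scores lower. Deciding that an unknown file "is" a reference library would additionally need a threshold on the similarity value, which is a tuning choice not made here.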
We collect libraries that are known to be vulnerable from the npm repository of open source JavaScript libraries, based on vulnerability data provided by Snyk, a project dedicated to tracking vulnerability data. This collection contains 698 distinct libraries with 10 686 distinct versions across all libraries. We also have access to roughly 100 000 carefully collected Android applications, of which we find 5652 to be hybrid applications. With these reference libraries and applications, we performed an analysis using our approach and picked ten random applications to manually verify its performance. The manual verification shows that we find 70.59% of library names and 80% of library names together with their versions. From the global analysis, we find that 2557 (45.24%) of the hybrid applications contain at least one vulnerable library from our reference library set.
Our results show that it is possible to create a tool that performs code clone detection for the dynamic language JavaScript. Our approach still requires refinement, in particular for minified JavaScript files. However, it could serve as a stepping stone towards a very precise code clone detection tool based on tokens extracted from JavaScript source code.