Contact Info
Department of Applied Mathematics
University of Waterloo
Waterloo, Ontario
Canada N2L 3G1
Phone: 519-888-4567, ext. 32700
Fax: 519-746-4319
PDF files require Adobe Acrobat Reader
MC 6496
Junyu Lai, Applied Math, University of Waterloo
Implementations of iterative algorithms in Hadoop and Spark
Facing the challenges of large amounts of data generated by various companies (such as Facebook, Amazon, and Twitter), cloud computing frameworks such as Hadoop are used to store and process the Big Data. Hadoop, an open source cloud computing framework, is popular because of its scalability and fault tolerance. However, by frequently writing and reading data from the Hadoop Distributed File System (HDFS), Hadoop is quite slow in many applications. Apache Spark, a new cloud computing framework developed at AMPLab of UC Berkeley, solves this problem by caching data in memory. Spark develops a new abstraction called resilient distributed dataset (RDD) which is both scalable and fault-tolerant. In this thesis, we describe the architecture of Hadoop and Spark and discuss their differences. Properties of RDDs and how they work in Spark are discussed in detail, which gives a guide on how to use them efficiently. The main contribution of the thesis is to implement the PageRank algorithm and Conjugate Gradient (CG) method in Hadoop and Spark, and show how Spark out-performs Hadoop by taking advantage of memory caching.
Contact Info
Department of Applied Mathematics
University of Waterloo
Waterloo, Ontario
Canada N2L 3G1
Phone: 519-888-4567, ext. 32700
Fax: 519-746-4319
PDF files require Adobe Acrobat Reader
The University of Waterloo acknowledges that much of our work takes place on the traditional territory of the Neutral, Anishinaabeg and Haudenosaunee peoples. Our main campus is situated on the Haldimand Tract, the land granted to the Six Nations that includes six miles on each side of the Grand River. Our active work toward reconciliation takes place across our campuses through research, learning, teaching, and community building, and is co-ordinated within the Office of Indigenous Relations.