ECE 750 Topic 5 - Distributed and Network-Centric Computing
Spring 2014

Instructor

Dr. Wojciech Golab

Contact Information

Lectures: Tuesdays 2:30pm to 5:30pm in room EIT-3141
Office hours: To be announced
Office number: DC-2528
Email

Course Description

This graduate course will cover advanced concepts and techniques in distributed systems with applications to cloud computing and big data analytics. Course material will focus mainly on scalable performance and fault tolerance—technically challenging problems of interest to both researchers and practitioners. The course will combine concepts from distributed computing, database systems, programming languages, and theory by exploring both the foundations and practical applications of distributed storage, cluster computing frameworks, and distributed transaction processing.

Objectives

Upon successful completion of the course, students will be able to:

  1. describe the role of big data tools in solving modern social and business problems
  2. reason about trade-offs among latency, consistency, availability, and partition tolerance in distributed storage systems
  3. design and develop solutions to big data problems using open source tools (e.g., Hadoop, Cassandra)
  4. critique distributed transactional database systems with respect to performance and fault tolerance
  5. judge the benefits and costs of distributed software systems versus centralized solutions

Content

Course material will consist of instructor-provided notes and research publications. Representative publications and software tools are listed below for the three main topics covered in the course.

  1. distributed storage: distributed file systems (GFS [8], HDFS [23]), key-value storage (Dynamo [7], BigTable [4], Cassandra [16]), state machine replication (Paxos [17], Mencius [19], ZooKeeper [11]), geo-replicated storage (MDCC [15], Pileus [25]), consistency (linearizability [10], CAP and PACELC [1], measurement [21, 28])
  2. cluster computing frameworks: BSP model [26], MapReduce [6] (see the word-count sketch after this list), Hadoop, Pig [20], Hadoop workload characterization and scheduling [22, 27], dataflow computation (HaLoop [3], Spark [30], DryadLINQ [29]), large-scale graph processing (Pregel [18], Giraph)
  3. distributed transaction processing: Paxos commit [9], scale-out databases (H-Store/VoltDB [12, 13]), globally-distributed databases (Google Spanner [5], Pileus [25]), transactional middleware (Sinfonia [2]), distributed indexing (Minuet [24])
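
To make the MapReduce programming model in item 2 concrete, the sketch below is a version of the canonical word-count job written against Hadoop's Java MapReduce API: the map function emits a (word, 1) pair for each token in its input split, and the reduce function sums the counts per word. It is offered purely as an illustration of the style of code covered in the course, not as assigned material.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for every token in the input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged as a jar, the job would be launched with a command along the lines of hadoop jar wordcount.jar WordCount <input> <output>, where the jar name is a placeholder.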

Evaluation

Homework assignments: 30%
Paper presentations: 10%
Research project: 30%
Final exam: 30%

Note: Homework assignments will include a combination of paper critiques, programming tasks, and theory questions. Programming tasks will use open source tools such as Cassandra, HBase, Hadoop, Pig, Giraph, and ZooKeeper. You may find the following article useful for preparing paper critiques: [14].
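
As a taste of the coordination code these tools involve, here is a minimal ZooKeeper sketch in Java: it connects to an ensemble and uses an ephemeral znode as a crude lock, which ZooKeeper releases automatically when the holding client's session ends. The connection string localhost:2181, the class name, and the znode path /demo-lock are placeholders; the sketch is illustrative rather than part of any assignment.

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class LockSketch {
      public static void main(String[] args) throws Exception {
        final CountDownLatch connected = new CountDownLatch(1);

        // Connect to the ensemble (localhost:2181 is a placeholder); the
        // watcher fires once the session is established.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, new Watcher() {
          public void process(WatchedEvent event) {
            if (event.getState() == Event.KeeperState.SyncConnected) {
              connected.countDown();
            }
          }
        });
        connected.await();

        // An ephemeral znode acts as a crude lock: it exists only while this
        // session is alive, so a crashed holder releases it automatically.
        try {
          zk.create("/demo-lock", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
          System.out.println("acquired the lock");
        } catch (KeeperException.NodeExistsException e) {
          System.out.println("another client holds the lock");
        }

        zk.close();
      }
    }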

Research Project

Depending on class size, the course project may be completed either individually or in small groups. Several types of projects are possible, including but not limited to the following:

  • Literature survey: Choose a problem related to performance or fault tolerance in distributed systems, and survey recent literature on this topic. The survey will organize known ideas, discuss their strengths and weaknesses, and identify open research problems.
  • Data analytics project: Find a large data set and analyze it using the open-source tools discussed in the course. Explore alternative solutions and analyze their relative merits, for example with respect to scalability and efficiency.
  • Systems building project: Use the techniques and tools discussed in the course to construct a novel distributed software system, or modify an existing open-source system by adding a feature or optimization. Evaluate the prototype with respect to performance or fault tolerance.

Research project deliverables will include a proposal (5%), final report (20%), and project presentation (5%). The project report will be a conference-style manuscript, approximately ten pages in length (not including title page or references) in double-column 11pt format. All project deliverables will be evaluated on the basis of technical depth, originality of ideas, and quality of presentation.

AccessAbility

The University of Waterloo has a longstanding commitment to support the participation and access to university programs, services, and facilities by persons with disabilities. For more information, please contact AccessAbility Services, located in NH-1132.

General Policies

Late assignments and project deliverables will be accepted with a 50% penalty if submitted within 24 hours of the deadline, and will not be accepted at all after 24 hours.

All students are expected to comply with all University of Waterloo policies. In particular, to maintain a culture of academic integrity, all members of the University of Waterloo community are expected to promote honesty, trust, fairness, respect and responsibility. A student is expected to know what constitutes academic integrity, and how to avoid committing an academic offence. Academic offences include cheating on exams as well as plagiarism (e.g., inappropriate copying of code or documentation) in homework, lab or project work. For further information on academic offences and penalties, grievances, and appeals, please refer to Policies 70, 71, and 72 of the University of Waterloo.

References

[1] D. J. Abadi. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. IEEE Computer, 45(2):37–42, 2012.
[2] M. K. Aguilera, A. Merchant, M. A. Shah, A. C. Veitch, and C. T. Karamanolis. Sinfonia: A new paradigm for building scalable distributed systems. ACM Trans. Comput. Syst., 27(3), 2009.
[3] Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient iterative data processing on large clusters. PVLDB, 3(1):285–296, 2010.
[4] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: A distributed storage system for structured data. In Proc. of OSDI, pages 205–218, 2006.
[5] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. C. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google’s globally distributed database. ACM Trans. Comput. Syst., 31(3), 2013.
[6] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proc. of OSDI, pages 137–150, 2004.
[7] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s highly available key-value store. In Proc. of SOSP, pages 205–220, 2007.
[8] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In Proc. of SOSP, pages 29–43, 2003.
[9] J. Gray and L. Lamport. Consensus on transaction commit. ACM Trans. Database Syst., 31(1):133–160, 2006.
[10] M. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990.
[11] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In Proc. of USENIX ATC, 2010.
[12] E. P. Jones, D. J. Abadi, and S. Madden. Low overhead concurrency control for partitioned main memory databases. In Proc. of SIGMOD, pages 603–614, 2010.
[13] R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: a high-performance, distributed main memory transaction processing system. PVLDB, 1(2):1496–1499, 2008.
[14] S. Keshav. How to read a paper. SIGCOMM Comput. Commun. Rev., 37(3):83–84, 2007.
[15] T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete. MDCC: Multi-data center consistency. In Proc. of EuroSys, pages 113–126, 2013.
[16] A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):35–40, 2010.
[17] L. Lamport. Paxos made simple. ACM SIGACT News (Distributed Computing Column), 32(4):51–58, 2001.
[18] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proc. of SIGMOD, pages 135–146, 2010.
[19] Y. Mao, F. P. Junqueira, and K. Marzullo. Mencius: Building efficient replicated state machines for WANs. In Proc. of OSDI, pages 369–384, 2008.
[20] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: A not-so-foreign language for data processing. In Proc. of SIGMOD, pages 1099–1110, 2008.
[21] M. R. Rahman, W. Golab, A. AuYoung, K. Keeton, and J. J. Wylie. Toward a principled framework for benchmarking consistency. In Proc. of HotDep, 2012.
[22] Z. Ren, X. Xu, J. Wan, W. Shi, and M. Zhou. Workload characterization on a production Hadoop cluster: A case study on Taobao. In Proc. of IISWC, pages 3–13, 2012.
[23] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop distributed file system. In Proc. of MSST, pages 1–10, 2010.
[24] B. Sowell, W. M. Golab, and M. A. Shah. Minuet: A scalable distributed multiversion B-tree. PVLDB, 5(9):884–895, 2012.
[25] D. B. Terry, V. Prabhakaran, R. Kotla, M. Balakrishnan, M. K. Aguilera, and H. Abu-Libdeh. Consistency-based service level agreements for cloud storage. In Proc. of SOSP, pages 309–324, 2013.
[26] L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, 1990.
[27] A. Verma, L. Cherkasova, and R. H. Campbell. ARIA: Automatic resource inference and allocation for MapReduce environments. In Proc. of ICAC, pages 235–244, 2011.
[28] H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu. Data consistency properties and the trade-offs in commercial cloud storage: the consumers’ perspective. In Proc. of CIDR, pages 134–143, 2011.
[29] Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proc. of OSDI, pages 1–14, 2008.
[30] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. of NSDI, 2012.