Wednesday, June 9, 2010 — 12:00 PM to 1:00 PM EDT

Institute for Computer Research seminar

Petabyte-scale data warehousing at Facebook

By

Dr. Ning Zhang
Software Engineer
Facebook   

Date

Wednesday, June 9, 2010     

Time

12:00 noon

Place

Davis Centre, DC 1302, University of Waterloo

Abstract

Data warehousing at Facebook faces enormous challenges due to the exponential growth the company has enjoyed over the past several years. The data generated every day exceeds tens of terabytes and is approaching a hundred terabytes. At the same time, the number of users who query the data is growing even faster, thanks to the high-level query language and easy-to-use tools developed on top of this huge amount of data. Together, these make it one of the biggest and busiest data warehouse systems in the world. To make things more challenging, more and more use cases require real-time data analytics, where online data acquisition and online query processing are key.

In this talk, I will introduce the data infrastructure that enables data analytics on top of Facebook's data warehouse system. This includes the ETL tools, the Hadoop MapReduce clusters, the Hive query language, and recent research and development efforts. I will also highlight some of the research challenges that we face.

Biography

Ning Zhang is a software engineer at Facebook (profile: http://www.facebook.com/nzhang). He is currently working on Hive in the Data Infrastructure team. Before joining Facebook, he worked on storage and query processing for XML databases at Oracle. He received his M.Math and Ph.D. from the University of Waterloo, in the areas of spatial databases and XML databases respectively, and his B.S. from Nanjing University, China.

Location 
DC - William G. Davis Computer Research Centre
1302
200 University Avenue West

Waterloo, ON N2L 3G1
Canada
