PhD Seminar • Systems and Networking — Kawkab: A Cloud-based Distributed Filesystem for Realtime Streaming Data

Friday, January 10, 2020, 1:30 pm EST (GMT -05:00)

Sajjad Rizvi, PhD candidate
David R. Cheriton School of Computer Science

Capital markets across the globe produce high volumes of data at a high rate. A single stock exchange can generate millions of messages per second, and the aggregate rate can peak at hundreds of millions of messages per second when the data sources span many financial instruments and institutions. Realtime analytics applications use this data to extract actionable results that remain valid only for a short time window. Ingesting and serving data at this volume and velocity, fast enough for such applications and at a competitive cost, is therefore challenging.

In this talk, I will present the design of Kawkab, a cloud-based high-performance filesystem for streaming data, and the challenges involved in building it. Kawkab is purpose-built for financial market applications that require high-throughput data ingestion, low-latency data reads, and high scalability. Kawkab uses a set of worker nodes both to ingest data and to serve read queries, and it leverages low-cost cloud storage services, such as Amazon S3, for cost-effective data storage. Our preliminary results show that Kawkab can ingest more than 10 million messages per second per node, and that clients can read recent data with small processing overhead.
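The abstract does not describe Kawkab's internals. As a rough illustration of the general pattern it outlines, the following minimal Python sketch shows worker nodes that ingest and serve recent data while older data moves to cheap object storage. It rests entirely on assumptions: the `TieredStream` class, its fixed-size segments, and the plain dict standing in for a service such as Amazon S3 are all hypothetical and are not Kawkab's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStream:
    """Hypothetical per-stream buffer on a worker node (not Kawkab's actual design).

    Recent records stay in worker memory; each full segment is sealed and moved
    to an object store, modeled here as a dict standing in for a service like S3.
    """
    segment_size: int
    object_store: dict = field(default_factory=dict)  # segment index -> sealed records
    active: list = field(default_factory=list)        # records in the unsealed tail segment
    sealed_count: int = 0                              # number of segments already offloaded

    def append(self, record):
        # Ingest path: append to the in-memory tail and seal it once it is full.
        self.active.append(record)
        if len(self.active) == self.segment_size:
            self.object_store[self.sealed_count] = self.active
            self.sealed_count += 1
            self.active = []

    def read(self, start, end):
        # Read path: return the records with offsets in [start, end).
        out = []
        for offset in range(start, end):
            seg, pos = divmod(offset, self.segment_size)
            if seg < self.sealed_count:
                out.append(self.object_store[seg][pos])  # older data: object store
            elif seg == self.sealed_count and pos < len(self.active):
                out.append(self.active[pos])             # recent data: worker memory
            else:
                raise IndexError(f"offset {offset} has not been ingested yet")
        return out


stream = TieredStream(segment_size=4)
for i in range(10):
    stream.append(i)
print(stream.read(6, 10))
```

In this toy version, reads of the most recent offsets are served from the worker's memory without touching the object store, which is one plausible way to keep reads of recent data cheap. A production system would presumably upload sealed segments asynchronously and cache hot segments, both of which this sketch omits.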