Please note: This master’s research paper presentation will be given online.
Harsh Bindra, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Khuzaima Daudjee
Streaming systems process large data sets in one pass while applying operations on the data. Quantiles are one such operation used in streaming systems. Quantiles can outline the cumulative distribution of a data set. In this survey, we study three recent quantile sketch algorithms designed for streaming settings: DDSketch, Moments Sketch, and KLL Sketch. Key aspects of the sketch algorithms, speed, accuracy, and mergeability are examined. Moreover, these algorithms are implemented and evaluated in Apache Flink, a popular open source streaming system. Results show that DDSketch has the most stable accuracy guarantees, particularly with long tailed data distributions. KLL Sketch performs well for distributions with many recurring elements and has the fastest query times, while Moments Sketch has the fastest merge times. Although query and merge times of DDSketch are not the best of the three algorithms, they are low enough for DDSketch to be the recommended algorithm. We also evaluate the sketches in the face of missing data and find that they have little effect on the accuracy delivered by the algorithms.
To join this master’s research paper presentation on Zoom, please go to https://us02web.zoom.us/j/5198884567.
200 University Avenue West
Waterloo, ON N2L 3G1