DB Meeting - Document Size Distribution

Wednesday, February 12, 2014 2:30 PM EST

Speaker:

Andrew Kane

Abstract:

I will present a practice talk for our LSDS-IR 2014 workshop paper on document size distribution in the context of search engines, then give a few related ideas that interested grad students could explore. Workshop paper synopsis: Search engines split large datasets across multiple machines using document distribution. Documents are typically distributed randomly, but we propose that documents be distributed by their size instead. This produces immediate improvements in both index size and query throughput. We show improvements to an in-memory conjunctive list intersection system using Simple16 compression and either skips or bitvectors. We also expect significant performance improvements in ranking-based search systems.
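
As a rough illustration of the contrast drawn in the synopsis, the sketch below assigns documents to machines either at random or by size. It is not the method from the paper: the function names, the use of a single integer as a document's "size", and the equal-count contiguous size ranges are all assumptions made for this example.

```python
import random


def distribute_randomly(doc_sizes, num_shards, seed=0):
    """Assign each document to a shard chosen uniformly at random."""
    rng = random.Random(seed)
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_sizes:
        shards[rng.randrange(num_shards)].append(doc_id)
    return shards


def distribute_by_size(doc_sizes, num_shards):
    """Assign documents to shards so that each shard holds a contiguous
    size range (assumption: equal document counts per shard)."""
    # Sort by size, breaking ties by id so the result is deterministic.
    ordered = sorted(doc_sizes, key=lambda d: (doc_sizes[d], d))
    shards = [[] for _ in range(num_shards)]
    for rank, doc_id in enumerate(ordered):
        shards[rank * num_shards // len(ordered)].append(doc_id)
    return shards


# Hypothetical document sizes (e.g. number of terms per document).
doc_sizes = {"a": 120, "b": 15, "c": 900, "d": 40, "e": 300, "f": 8}
by_size = distribute_by_size(doc_sizes, 3)
print(by_size)
```

With size-based assignment, each machine indexes documents of similar length, whereas random assignment mixes short and long documents on every machine.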

Location

DC - William G. Davis Computer Research Centre

Room 1331
200 University Avenue West
Waterloo, ON N2L 3G1
Canada

Map

https://www.google.ca/maps/place/Davis+Centre+Library/@43.4720375,-80.5457127,17…