DB Meeting - Document Size Distribution

Wednesday, February 12, 2014 2:30 PM EST
Speaker: Andrew Kane
Abstract: I will present a practice talk for our LSDS-IR 2014 workshop paper on document size distribution in the context of search engines, then give a few related ideas that interested grad students could explore. Workshop paper synopsis: Search engines split large datasets across multiple machines by distributing documents among them. Documents are typically distributed randomly, but we propose distributing them by size instead. This produces immediate improvements in both index size and query throughput. We show improvements to an in-memory conjunctive list intersection system using Simple16 compression and either skips or bitvectors. We also expect significant performance improvements in ranking-based search systems.
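
To make the contrast in the synopsis concrete, the following is a minimal sketch of the two assignment strategies, not the paper's implementation. The function names, the toy `doc_sizes` mapping, and the choice to cut the size-sorted order into groups of near-equal document count are all assumptions made for illustration; the paper may choose group boundaries differently.

```python
import random


def partition_randomly(doc_sizes, num_shards, seed=0):
    """Baseline: assign each document ID to a shard uniformly at random."""
    rng = random.Random(seed)
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_sizes:
        shards[rng.randrange(num_shards)].append(doc_id)
    return shards


def partition_by_size(doc_sizes, num_shards):
    """Sort documents by size, then cut the sorted order into contiguous
    groups so that each shard holds documents of similar size.

    Assumption: groups have near-equal document counts.
    """
    # Break ties on the document ID so the result is deterministic.
    ordered = sorted(doc_sizes, key=lambda d: (doc_sizes[d], d))
    base, extra = divmod(len(ordered), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + base + (1 if i < extra else 0)
        shards.append(ordered[start:end])
        start = end
    return shards


# Hypothetical document sizes (e.g. token counts), keyed by document ID.
doc_sizes = {"a": 120, "b": 15, "c": 900, "d": 40, "e": 300, "f": 22}
size_shards = partition_by_size(doc_sizes, 3)
random_shards = partition_randomly(doc_sizes, 3)
```

With the toy data, `partition_by_size` places the two smallest documents on the first shard and the two largest on the last, whereas the random baseline mixes sizes on every shard.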
Location 
DC - William G. Davis Computer Research Centre
Room 1331
200 University Avenue West

Waterloo, ON N2L 3G1
Canada
