PhD Seminar: Towards a Search Engine for Big Data using Compact Codes

Wednesday, April 24, 2019 11:00 am - 11:00 am EDT (GMT -04:00)

Candidate: Sepehr Eghbali

Title: Towards a Search Engine for Big Data using Compact Codes

Date: April 24, 2019

Time: 11:00 AM

Place: EIT 3151-3153

Supervisor(s): Tahvildari, Ladan

Abstract:

Modern machine intelligence and data analytics increasingly rely on learning from massive amounts of data. At the current scale of data, however, even the most common and simple computational tasks become non-trivial. A prominent example is nearest neighbor search, which is a fundamental subproblem in many tasks and also has direct applications, such as nearest neighbor classification and image retrieval. The proposed research concerns the design of algorithms and machine learning tools for faster and more accurate similarity search. To this end, it advocates the use of short discrete codes that represent the similarity structure of the data compactly. Transforming high-dimensional items, such as raw images, into compact codes has both computational and storage advantages: the codes can be stored using only a few bits per data item and, more importantly, they can be compared extremely fast using bitwise or lookup-table operations. The present work follows two main research directions in compact coding: 1) finding mappings that better preserve the given notion of similarity while keeping the codes as short as possible, and 2) building efficient data structures that support sublinear search among the compact codes.
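
To make the idea of comparing compact codes with bitwise operations concrete, here is a minimal sketch in Python. It is not the candidate's method: it assumes a simple random-hyperplane sign encoding as a stand-in for a learned mapping, and all names (make_hyperplanes, encode, hamming, nearest) are hypothetical. Each item becomes a short binary code stored in one integer, and two codes are compared with an XOR followed by a bit count.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane per code bit (illustrative assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def encode(vector, hyperplanes):
    """Map a real-valued vector to a binary code packed into a single integer."""
    code = 0
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, vector))
        # One bit per hyperplane: which side of the plane the vector falls on.
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance between two codes: XOR, then count the set bits."""
    return bin(a ^ b).count("1")


def nearest(query_code, codes):
    """Index of the stored code closest to the query (plain linear scan)."""
    return min(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=8)
    rng = random.Random(1)
    items = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]
    codes = [encode(item, planes) for item in items]
    query = encode(items[42], planes)
    print(nearest(query, codes), hamming(query, codes[42]))
```

The scan in nearest is linear in the number of stored codes. The second research direction in the abstract concerns data structures that avoid exactly this full scan, which the sketch does not attempt.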