PhD Seminar • Data Systems • Efficient Dense Representation Learning for Information Retrieval

Wednesday, January 24, 2024 12:30 pm - 1:30 pm EST (GMT -05:00)

Please note: This PhD seminar will take place in DC 1304.

Sheng-Chieh (Jack) Lin, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

Contrastive learning is a commonly used technique for training effective neural retrieval models; however, it demands substantial computational resources (e.g., multiple GPUs or TPUs).
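As background, here is a minimal sketch of the contrastive objective with in-batch negatives commonly used for dense retrieval: each query's paired passage is the positive, and the other passages in the batch act as negatives. The function name, the temperature value, and the NumPy implementation are illustrative assumptions, not details from the talk.

```python
import numpy as np

def in_batch_contrastive_loss(q_emb, p_emb, temperature=0.05):
    """Contrastive (InfoNCE-style) loss with in-batch negatives.

    q_emb, p_emb: (batch, dim) query and passage embeddings, where
    p_emb[i] is the positive passage for q_emb[i]; all other rows of
    p_emb serve as negatives for query i.
    """
    # scores[i, j] = similarity of query i with passage j
    scores = (q_emb @ p_emb.T) / temperature

    # Numerically stable log-softmax over each query's candidate passages
    m = scores.max(axis=1, keepdims=True)
    log_probs = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the diagonal (the true positives)
    return float(-np.mean(np.diag(log_probs)))
```

Because every passage in the batch doubles as a negative for every other query, the effective number of negatives grows with batch size, which is why large batches (and hence many GPUs/TPUs) help.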

In this talk, I will present two of our previous works on efficiently training neural models for dense retrieval in web search: (1) in-batch negatives for knowledge distillation with tightly coupled teachers for dense retrieval (TCT-ColBERT), and (2) contextualized query embeddings for conversational search (CQE). First, we show how to use ColBERT as a teacher to efficiently train a single-vector dense retrieval model that reaches competitive effectiveness with limited training resources. Second, we discuss how to quickly adapt the fine-tuned dense retriever to conversational search without human relevance labels. Both works advanced the state of the art in retrieval effectiveness at the time of publication and were trained on free Colab with a single TPUv2.
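A hedged sketch of the distillation idea behind TCT-ColBERT: the ColBERT teacher scores query–passage pairs with late-interaction MaxSim, and the single-vector student is trained to match the teacher's softened score distribution over the in-batch candidates. The function names and the exact loss form below are illustrative assumptions, not the paper's verbatim implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis
    m = x.max(axis=-1, keepdims=True)
    e = np.exp(x - m)
    return e / e.sum(axis=-1, keepdims=True)

def colbert_maxsim(q_tokens, p_tokens):
    """ColBERT-style late interaction score for one query-passage pair.

    q_tokens: (num_query_tokens, dim), p_tokens: (num_passage_tokens, dim).
    Each query token takes its maximum similarity over passage tokens,
    and the per-token maxima are summed.
    """
    sim = q_tokens @ p_tokens.T
    return float(sim.max(axis=1).sum())

def distill_loss(teacher_scores, student_scores):
    """KL(teacher || student) over each query's in-batch candidates.

    teacher_scores, student_scores: (batch, batch) score matrices, so the
    single-vector student learns to mimic the teacher's soft ranking.
    """
    t = softmax(teacher_scores)
    # Stable log-softmax for the student scores
    m = student_scores.max(axis=-1, keepdims=True)
    log_s = student_scores - m - np.log(
        np.exp(student_scores - m).sum(axis=-1, keepdims=True))
    return float(np.mean((t * (np.log(t) - log_s)).sum(axis=-1)))
```

The appeal of this setup is that the teacher's soft labels over in-batch candidates carry much more signal per example than binary relevance labels, which is one reason the student can be trained effectively with a modest compute budget.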