Please note: This master’s thesis presentation will be given online.
Chengcheng Hu, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Jimmy Lin
The emergence of BERT in 2018 greatly improved retrieval effectiveness in many tasks across various domains and shifted the research landscape of information retrieval (IR) toward transformer-based technologies. While researchers are fascinated by the power of BERT and related transformer models, the substantial computational costs these models incur have become an unavoidable problem. Meanwhile, in the shadow of BERT, some “out-of-date” but fairly effective techniques have been largely forgotten. For example, learning to rank was one of the most popular technologies a decade ago.
In this work, we aim to answer two research questions. RQ1 asks whether using learning to rank as a filtering stage in a multi-stage reranking pipeline can improve the efficiency of transformer-based reranking without sacrificing effectiveness. RQ2 asks whether using transformer-based features in the traditional learning to rank framework can increase effectiveness.
To answer RQ1, we implement a multi-stage reranking pipeline that places learning to rank as a filter in the middle stage. In this configuration, a cheap learning to rank module selects the most promising candidates, and only those are sent to the expensive neural rerankers, so the overall latency of transformer-based reranking drops without a degradation in effectiveness. Applying the pipeline to the MS MARCO passage and document ranking tasks, we achieve up to an 18× increase in efficiency while maintaining the same level of effectiveness. Moreover, our method is orthogonal to techniques that accelerate inference by modifying the neural models themselves. Hence, it can be combined with such acceleration techniques to further reduce computational costs and latency.
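As a rough illustration of the filtering idea, the following Python sketch chains three stages: a first-stage retriever, a cheap learning to rank filter, and an expensive reranker. Every name here (`multi_stage_rerank`, `term_overlap`, `toy_ltr_score`, `CountingReranker`) and every toy scoring rule is a hypothetical stand-in chosen for this example, not the thesis implementation; the point is only that the expensive scorer sees the few candidates that survive the filter.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def rank(query: str, docs: List[str], scorer: Scorer, k: int) -> List[str]:
    """Score every doc once with `scorer` and keep the k best."""
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)[:k]


def multi_stage_rerank(query: str, corpus: List[str], first_stage: Scorer,
                       ltr_filter: Scorer, neural_reranker: Scorer,
                       k_retrieve: int, k_filter: int) -> List[str]:
    """Retrieve, filter with a cheap model, then rerank the survivors."""
    candidates = rank(query, corpus, first_stage, k_retrieve)
    promising = rank(query, candidates, ltr_filter, k_filter)
    return rank(query, promising, neural_reranker, len(promising))


def term_overlap(query: str, doc: str) -> float:
    """Toy first-stage score: number of shared terms (assumption)."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_ltr_score(query: str, doc: str) -> float:
    """Toy stand-in for a learned ranker: fixed mix of two cheap features."""
    overlap = term_overlap(query, doc)
    return 0.5 * overlap + 0.5 * overlap / max(len(doc.split()), 1)


class CountingReranker:
    """Stand-in for a transformer reranker that counts its invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, query: str, doc: str) -> float:
        self.calls += 1
        q_terms = set(query.lower().split())
        return len(q_terms & set(doc.lower().split())) / max(len(q_terms), 1)


corpus = [
    "learning to rank with gradient boosted trees",
    "bert reranking for passage retrieval",
    "cooking pasta at home",
    "learning to rank as a filter before bert reranking",
    "weather forecast for waterloo",
    "transformer models for ranking passages",
]
query = "learning to rank bert reranking"
reranker = CountingReranker()
results = multi_stage_rerank(query, corpus, term_overlap, toy_ltr_score,
                             reranker, k_retrieve=4, k_filter=2)
print(results, reranker.calls)
```

In this sketch the expensive scorer is invoked `k_filter` times per query rather than `k_retrieve` times, which is where a latency saving would come from, assuming the filter itself is cheap relative to the reranker.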
For RQ2, since transformers score each query-document pair independently, transformer-based scores can be used as learning to rank features, allowing learning to rank to take advantage of transformers to increase retrieval effectiveness. On the MS MARCO passage and document ranking tasks, adding the BERT-based feature yields up to a 52% increase in effectiveness over “traditional” learning to rank. In addition, combining transformer-based features with traditional features in learning to rank yields slightly higher effectiveness than the standard retrieve-and-rerank design with transformers.
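To make the feature idea concrete, here is a minimal Python sketch that appends a transformer-style relevance score to a vector of traditional features. The feature set, the `fake_transformer_score` stand-in, and the hand-set `weights` are all assumptions made for illustration; in practice the score would come from a model such as BERT and the ranker would be learned from labeled data rather than fixed by hand.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def traditional_features(query: str, doc: str) -> List[float]:
    """Two toy 'traditional' features: term overlap and document length."""
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    overlap = len(set(q_terms) & set(d_terms))
    return [float(overlap), float(len(d_terms))]


def features_with_transformer(query: str, doc: str,
                              transformer_score: Scorer) -> List[float]:
    """Append the per-pair transformer score as one more feature."""
    return traditional_features(query, doc) + [transformer_score(query, doc)]


def linear_ltr_score(features: List[float], weights: List[float]) -> float:
    """Score a feature vector with a linear model (weights are placeholders)."""
    return sum(f * w for f, w in zip(features, weights))


def fake_transformer_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a BERT-style relevance score in [0, 1]."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / max(len(q_terms), 1)


query = "learning to rank"
docs = ["learning to rank with trees", "cooking pasta at home"]
weights = [0.2, -0.01, 1.0]  # hand-set for illustration, not learned

vectors = [features_with_transformer(query, d, fake_transformer_score)
           for d in docs]
scores = [linear_ltr_score(v, weights) for v in vectors]
print(vectors, scores)
```

Because each pair is scored independently, the transformer feature can be computed per document and stored alongside the other features before any ranker is trained.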
This work explores potential roles of learning to rank in the age of muppets. In a broader sense, it illustrates that we should stand on the shoulders of giants, building on what has been learned and discovered in the past, to explore the next unknowns.