Master’s Thesis Presentation • Data Systems • Determining the Utility of Key-term Highlighting for High Recall Information Retrieval Systems

Tuesday, September 14, 2021 12:00 pm - 12:00 pm EDT (GMT -04:00)

Please note: This master’s thesis presentation will be given online.

Xue Jun Wang, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Maura Grossman

High-recall information retrieval (HRIR) is an important tool used in tasks such as electronic discovery (“eDiscovery”) and systematic review of medical research. Applications of HRIR often use a human as their oracle to determine the relevance of immense numbers of documents, which is expensive in both time and money. Various methods for reducing the amount of time spent per assessment and improving the quality of assessors have been proposed to improve these systems.

For this thesis, we examine the method of presenting documents with key terms highlighted instead of as plain text. This is commonly accepted as a positive feature that achieves both of the previously mentioned improvements, but there is currently a lack of empirical evidence to support its effectiveness. We describe a user study in which participants are assigned to one of two variations of an HRIR system (key-term highlighting vs. plain text), followed by a post-task questionnaire. Our results indicate that labelling documents with key-term highlighting yielded no statistically significant improvement over plain text in recall, precision, or F1, but may negatively affect retention of concepts.

Our study provides empirical evidence for how the use of key-term highlighting affects an assessor’s ability to label documents and provides insight into when including this feature may be harmful rather than helpful.

To join this master’s thesis presentation on Zoom, please go to