Master’s Thesis Presentation • Data Systems — Total Relation Recall: High-Recall Relation Extraction

Friday, April 16, 2021 2:00 pm - 2:00 pm EDT (GMT -04:00)

Please note: This master’s thesis presentation will be given online.

Xinyu Liu, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Jimmy Lin

As Knowledge Graphs (KGs) become important in a wide range of applications, including question answering and recommender systems, more and more enterprises have recognized the value of constructing KGs from their own data. Enterprise data include both structured and unstructured data, but companies mostly focus on the structured portion, which is easier to exploit. However, most enterprise data are unstructured, such as news, blogs, and emails, and this is where plenty of business insights live. Companies would therefore like to utilize unstructured data as well, and KGs are an excellent way to collect and organize information from it.

In this thesis, we introduce a novel task, Total Relation Recall (TRR), that leverages an enterprise’s unstructured documents to build KGs through high-recall relation extraction. Given a target relation and its relevant information, TRR aims to extract all instances of that relation from the given documents. We also propose a Python-based system to address the task. To evaluate the effectiveness of our system, we conduct experiments on 12 different relations using two large-scale news article corpora. Moreover, we conduct an ablation study to investigate the impact of natural language processing (NLP) features.


To join this master’s thesis presentation on Zoom, please go to https://zoom.us/j/92678731308?pwd=U3pqbDI2RGRpYzNqdG10NWlPWmN2Zz09.