Master’s Thesis Presentation • Data Science • Misinformation RetrievalExport this event to calendar

Monday, September 13, 2021 — 10:00 AM EDT

Please note: This master’s thesis presentation will be given online.

Saira Rizvi, Master’s candidate
David R. Cheriton School of Computer Science | Department of Statistics and Actuarial Science

Supervisor: Professor Charles Clarke

This work introduces the task of misinformation retrieval, identifying all documents containing misinformation for a given topic, and proposes a pipeline for misinformation retrieval on tweets. As part of the work, I curated 50 COVID-19 misinformation topics used in the TREC 2020 Health Misinformation track. In addition, I annotated a test set of tweets using the TREC COVID-19 misinformation on social media. Misinformation on social media has proven highly detrimental to communities by encouraging harmful and often life-threatening behavior. The chaos caused by COVID-19 misinformation has created an urgent need for misinformation detection methods to moderate social media platforms.

Drawing upon previous work in misinformation detection and the TREC 2020 Health Misinformation Track, I focused on the task of misinformation retrieval on social media. I extended the COVID-Lies data set created to detect COVID-19 misinformation in tweets by rephrasing the misconceptions accompanying each tweet. I also created 50 COVID-19 related topics for the TREC 2020 Health Misinformation track used for evaluation purposes. I propose a natural language inference (NLI) based approach using CT-BERT to identify tweets that contradict a given fact, used to score documents utilizing the model’s classification probability. The model was trained using a combination of NLI data sets to find the best approach. Tweets were labeled for the TREC 2020 Health Misinformation Track topics to create a test set on which the best model achieves an AUC of 0.81. I conducted several experiments which show that domain adaptation significantly improved the ability to detect misinformation. A combination of a large NLI corpus, such as SNLI, and an in-domain, such as the COVID-Lies, data set achieves the best performance on our test set. The pipelines retrieved and ranked tweets based on misinformation for 7 TREC topics from the COVID-19 Twitter stream. The top 20 unique tweets were analyzed using Precision@20 to evaluate the pipeline.


To join this master’s thesis presentation on Zoom, please go to https://us06web.zoom.us/j/87871625995?pwd=MXFTd2tJdmZKVlRRamw3WmJXRVBrZz09.

Location 
Online master’s thesis presentation
200 University Avenue West

Waterloo, ON N2L 3G1
Canada
Event tags 

S M T W T F S
28
29
30
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
1
  1. 2021 (205)
    1. December (16)
    2. November (13)
    3. October (12)
    4. September (21)
    5. August (20)
    6. July (17)
    7. June (11)
    8. May (16)
    9. April (27)
    10. March (20)
    11. February (13)
    12. January (19)
  2. 2020 (217)
    1. December (18)
    2. November (12)
    3. October (7)
    4. September (21)
    5. August (28)
    6. July (14)
    7. June (18)
    8. May (16)
    9. April (20)
    10. March (16)
    11. February (25)
    12. January (22)
  3. 2019 (255)
  4. 2018 (217)
  5. 2017 (36)
  6. 2016 (21)
  7. 2015 (36)
  8. 2014 (33)
  9. 2013 (23)
  10. 2012 (4)
  11. 2011 (1)
  12. 2010 (1)
  13. 2009 (1)
  14. 2008 (1)