Master’s Thesis Presentation • Data Systems — Differentially Private Learning with Noisy Labels

Wednesday, May 20, 2020 4:00 pm - 4:00 pm EDT (GMT -04:00)

Please note: This master’s thesis presentation will be given online

Shubhankar Mohapatra, Master’s candidate
David R. Cheriton School of Computer Science

Supervised machine learning tasks require large labelled datasets. However, obtaining such datasets is difficult, and the labels are often noisy because of human error or adversarial perturbation. Recent studies have proposed multiple methods to tackle this problem in the non-private scenario, yet it remains unsolved when the dataset is private. In this work, we aim to train a model on a sensitive dataset that contains noisy labels such that (i) the model has high test accuracy and (ii) the training process satisfies (ε,δ)-differential privacy. Noisy labels, as studied in our work, are generated by flipping labels in the training set from the true source label(s) to other target label(s).
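The label-flipping noise described above can be illustrated with a minimal sketch. The function name `flip_labels`, the `flip_fraction` parameter, and the toy label array are assumptions made for illustration; the abstract does not state how many labels are flipped or how they are chosen.

```python
import numpy as np

def flip_labels(labels, source, target, flip_fraction, seed=0):
    """Return a noisy copy of `labels` plus the indices that were flipped.

    A random `flip_fraction` of the points whose true label is `source`
    are relabelled as `target` (hypothetical noise model for illustration).
    """
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    source_idx = np.flatnonzero(labels == source)       # points with the source label
    n_flip = int(round(flip_fraction * source_idx.size))
    flipped_idx = rng.choice(source_idx, size=n_flip, replace=False)
    noisy[flipped_idx] = target                          # source -> target flip
    return noisy, flipped_idx

# Toy example: flip half of the class-0 labels to class 1.
labels = np.array([0, 1, 0, 2, 0, 1, 0, 2])
noisy, flipped = flip_labels(labels, source=0, target=1, flip_fraction=0.5)
```

Returning the flipped indices makes it possible to measure afterwards what share of the noisy points a filtering method removes.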

Our approach, Diffindo, constructs a differentially private stochastic gradient descent algorithm that removes suspicious points based on their noisy gradients. We report experiments on datasets from multiple domains with different class balance properties. Our results show that the proposed algorithm can remove up to 100% of the points with noisy labels in the private scenario while restoring the precision of the targeted label and the test accuracy to their no-noise counterparts.
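For readers unfamiliar with differentially private stochastic gradient descent, the following is a minimal sketch of one generic update step: per-example gradient clipping followed by Gaussian noise. It is not Diffindo itself; the abstract does not specify how suspicious points are identified or removed, so that step is omitted. The function name `dp_sgd_step` and the parameters `clip_norm`, `noise_multiplier`, and `lr` are assumptions, and privacy accounting for (ε,δ) is not shown.

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One generic DP-SGD update (illustrative sketch, not the thesis algorithm).

    per_example_grads has shape (batch_size, n_params): one gradient per example.
    """
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

    # Ordinary gradient descent step on the noisy averaged gradient.
    return weights - lr * noisy_mean

rng = np.random.default_rng(0)
weights = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.0, 0.0]])
new_weights = dp_sgd_step(weights, grads, clip_norm=1.0,
                          noise_multiplier=1.1, lr=0.1, rng=rng)
```

Clipping bounds the influence of any single training point on the update, which is what lets the added noise provide a privacy guarantee.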

To join this master’s thesis presentation virtually, please go to https://meetingsamer18.webex.com/meetingsamer18/j.php?MTID=mbb22338551a7fbf9066a1895fe976ec5.

Meeting number: 297 630 947
Password: M2paAZigw86 (62722944 from phones and video systems)

Join by video system
Dial 297630947@meetingsamer18.webex.com
You can also dial 173.243.2.68 and enter your meeting number.

Join by phone
+1-408-418-9388 United States Toll
Access code: 297 630 947