Please note: This master’s thesis presentation will be given online.
Shubhankar Mohapatra, Master’s candidate
David R. Cheriton School of Computer Science
Supervised machine learning tasks require large labelled datasets. However, obtaining such datasets is difficult, and the resulting labels are often noisy because of human error or adversarial perturbation. Recent studies have proposed multiple methods to tackle this problem in the non-private scenario, yet it remains unsolved when the dataset is private. In this work, we aim to train a model on a sensitive dataset that contains noisy labels such that (i) the model has high test accuracy and (ii) the training process satisfies (ε,δ)-differential privacy. Noisy labels, as studied in our work, are generated by flipping labels in the training set from the true source label(s) to other target label(s).
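As an illustration of the label-flipping noise described above, the following is a minimal sketch and not code from the thesis. The function name flip_labels, the flip_fraction parameter, and the toy label list are assumptions made for this example only.

```python
import random

def flip_labels(labels, source, target, flip_fraction, seed=0):
    """Return a noisy copy of `labels` and the indices that were flipped.

    A fraction `flip_fraction` of the points whose true label is `source`
    is relabelled as `target`. All names here are hypothetical.
    """
    rng = random.Random(seed)
    # Indices of all points that carry the source label.
    source_idx = [i for i, y in enumerate(labels) if y == source]
    n_flip = int(round(flip_fraction * len(source_idx)))
    flipped_idx = set(rng.sample(source_idx, n_flip))
    # Flip only the sampled source points; leave everything else untouched.
    noisy = [target if i in flipped_idx else y for i, y in enumerate(labels)]
    return noisy, sorted(flipped_idx)

# Toy example: flip half of the class-0 points to class 1.
labels = [0, 1, 0, 2, 0, 1, 0, 2]
noisy, flipped = flip_labels(labels, source=0, target=1, flip_fraction=0.5)
```

The returned index list plays the role of ground truth when measuring how many noisy points a cleaning method removes.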
Our approach, Diffindo, constructs a differentially private stochastic gradient descent algorithm that removes suspicious points based on their noisy gradients. We report experiments on datasets from multiple domains with different class-balance properties. Our results show that the proposed algorithm can remove up to 100% of the points with noisy labels in the private scenario while restoring the precision of the targeted label and the test accuracy to their no-noise counterparts.
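For readers unfamiliar with differentially private stochastic gradient descent, the sketch below shows the generic building block it relies on: per-example gradient clipping followed by Gaussian noise. It is not Diffindo itself; the rule Diffindo uses to flag suspicious points is not given in this abstract and is not shown. The function names, the clip_norm and noise_multiplier values, and the toy gradients are assumptions.

```python
import math
import random

def clip(grad, clip_norm):
    """Scale `grad` so that its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Generic DP-SGD aggregation: clip each gradient, sum, add noise, average."""
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Gaussian noise with standard deviation noise_multiplier * clip_norm.
    std = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [v / len(per_example_grads) for v in noisy]

# Toy per-example gradients (hypothetical values).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
update = noisy_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds the influence of any single training point on the update, which is what allows the added noise to yield an (ε,δ)-differential privacy guarantee.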
To join this master’s thesis presentation virtually, please go to https://meetingsamer18.webex.com/meetingsamer18/j.php?MTID=mbb22338551a7fbf9066a1895fe976ec5.
Meeting number: 297 630 947
Password: M2paAZigw86 (62722944 from phones and video systems)
Join by video system
You can also dial 126.96.36.199 and enter your meeting number.
200 University Avenue West
Waterloo, ON N2L 3G1