Please note: This master’s thesis presentation will be given online
Shubhankar Mohapatra, Master’s candidate
David R. Cheriton School of Computer Science
Supervised machine learning tasks require large labelled datasets. However, obtaining such datasets is difficult, and the labels are often noisy because of human error or adversarial perturbation. Recent studies have proposed multiple methods to tackle this problem in the non-private scenario, yet it remains unsolved when the dataset is private. In this work, we aim to train a model on a sensitive dataset that contains noisy labels such that (i) the model has high test accuracy and (ii) the training process satisfies (ε,δ)-differential privacy. Noisy labels, as studied in our work, are generated by flipping labels in the training set from the true source label(s) to other target label(s).
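As an illustration of the label-flipping noise described above, the following is a minimal sketch, not the code used in the thesis. The function name `flip_labels` and the parameters `source`, `target`, and `flip_fraction` are assumptions made for this example; the abstract does not state how many labels are flipped or how they are selected.

```python
import numpy as np

def flip_labels(labels, source, target, flip_fraction, seed=0):
    """Flip a fraction of the `source`-class labels to `target` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    # Indices of all training points whose true label is the source class.
    source_idx = np.flatnonzero(labels == source)
    n_flip = int(round(flip_fraction * source_idx.size))
    # Choose which source-class points receive the wrong (target) label.
    flipped_idx = rng.choice(source_idx, size=n_flip, replace=False)
    noisy[flipped_idx] = target
    return noisy, flipped_idx
```

For example, `flip_labels([0] * 10 + [1] * 10, source=0, target=1, flip_fraction=0.5)` relabels half of the class-0 points as class 1 and returns their indices, which can later be compared against the points a cleaning method removes.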
Our approach, Diffindo, constructs a differentially private stochastic gradient descent algorithm that removes suspicious points based on their noisy gradients. We report experiments on datasets from multiple domains with different class-balance properties. Our results show that the proposed algorithm can remove up to 100% of the points with noisy labels in the private scenario while restoring the precision of the targeted label and the test accuracy to their no-noise counterparts.
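For readers unfamiliar with differentially private stochastic gradient descent, the sketch below shows one step of the standard DP-SGD update: per-example gradient clipping followed by Gaussian noise. It is not Diffindo itself. The abstract does not specify the rule used to flag suspicious points, so that step is deliberately left out. The function name `dp_sgd_step` and all parameter names are assumptions made for this example.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One standard DP-SGD update (illustrative; not the thesis implementation)."""
    grads = np.asarray(per_example_grads, dtype=float)  # shape: (batch, dim)
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Add Gaussian noise calibrated to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / grads.shape[0]
    return params - lr * noisy_mean
```

The privacy guarantee (ε,δ) obtained from such a step depends on the noise multiplier, the sampling rate, and the number of steps, and would be computed with a separate privacy accountant that is not shown here.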
To join this master’s thesis presentation virtually, please go to https://meetingsamer18.webex.com/meetingsamer18/j.php?MTID=mbb22338551a7fbf9066a1895fe976ec5.
Meeting number: 297 630 947
Password: M2paAZigw86 (62722944 from phones and video systems)
Join by video system
Dial 297630947@meetingsamer18.webex.com
You can also dial 173.243.2.68 and enter your meeting number.
Join by phone
+1-408-418-9388 United States Toll
Access code: 297 630 947