Please note: This PhD seminar will take place in DC 3317
Nils Lukas, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Florian Kerschbaum
Watermarking controls misuse of deep neural networks by secretly embedding a hidden message in any generated output. A key property of watermarking is robustness: an attacker should be unable to remove a watermark without also substantially degrading the model’s accuracy. In this seminar, I present two projects that (i) defend against misuse and (ii) evaluate the robustness of existing watermarking methods.
The first project shows that it is possible to extract an identifying mark from a model that remains robust even against attackers who re-train models from scratch on the outputs of the marked model. However, we show that our method is not robust against an adaptive attacker who can evade detection.
The second project proposes taxonomies for the robustness of watermarking and evaluates existing methods empirically. We find that the claimed robustness of existing watermarking methods is often overstated, which makes it difficult to trust watermarking in practice.