Please note: This PhD seminar will take place in DC 3317.
Nils Lukas, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Florian Kerschbaum
Watermarking helps control the misuse of deep neural networks by secretly marking any generated output with a hidden message. Robustness is a key property of a watermark: an attacker should be unable to remove it without also substantially degrading the model's accuracy. In this seminar, I present a novel approach that makes the watermarking key learnable, which increases the watermark's effectiveness and robustness. I then discuss our proposed method's (un)detectability and robustness.
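For readers unfamiliar with keyed watermarking of generated outputs, the toy Python sketch below illustrates one common flavour of the general idea (it is a simplified illustration, not the method presented in this seminar, and all names and parameters are hypothetical): a secret key pseudo-randomly partitions the vocabulary into a "green" and a "red" half, generation is biased toward the keyed green half, and anyone holding the key can detect the bias with a simple statistical test.

```python
import hashlib
import random

# Toy illustration of keyed output watermarking (not the speaker's method):
# a secret key splits the vocabulary in half; a watermarked generator
# prefers the keyed "green" half, and a detector holding the same key
# counts green tokens and computes a one-sided z-score.

VOCAB = [f"tok{i}" for i in range(1000)]  # hypothetical toy vocabulary

def green_set(key: str) -> set:
    """Derive the keyed 'green' half of the vocabulary from the secret key."""
    rng = random.Random(hashlib.sha256(key.encode()).digest())
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: len(shuffled) // 2])

def generate(key: str, length: int, strength: float = 0.9) -> list:
    """Sample tokens, preferring the keyed green set with probability `strength`."""
    green = list(green_set(key))
    red = [t for t in VOCAB if t not in set(green)]
    rng = random.Random()
    return [
        rng.choice(green) if rng.random() < strength else rng.choice(red)
        for _ in range(length)
    ]

def detect(key: str, tokens: list) -> float:
    """Z-score of the green-token count; under H0 (no watermark) it is ~0."""
    green = green_set(key)
    hits = sum(t in green for t in tokens)
    n = len(tokens)
    return (hits - 0.5 * n) / (0.25 * n) ** 0.5  # Binomial(n, 1/2) null

text = generate("secret-key", length=200)
print(f"z-score with correct key: {detect('secret-key', text):.1f}")  # large
print(f"z-score with wrong key:   {detect('wrong-key', text):.1f}")   # near 0
```

In this simplified scheme the key is a fixed secret; the approach presented in the seminar instead makes the watermarking key itself learnable.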
We show that our watermark substantially outperforms existing watermarks on all measured metrics and that it is robust and undetectable against attackers limited to black-box API access. However, we present attacks showing that watermarking is not robust against an attacker with access to the model's parameters, meaning that watermarking open-source models is likely infeasible. Finally, we discuss whether watermarking is a promising solution for controlling misuse.