Watermarks offer no defense against deepfakes

New research from the University of Waterloo’s Cybersecurity and Privacy Institute demonstrates that any artificial intelligence (AI) image watermark can be removed, without the attacker needing to know the design of the watermark, or even whether an image is watermarked to begin with.

As AI-generated images and videos become more realistic, citizens and legislators are increasingly concerned about the potential impact of “deepfakes” on politics, the legal system and everyday life.

“People want a way to verify what’s real and what’s not because the damages will be huge if we can’t,” said Andre Kassis, a PhD candidate in computer science and the lead author on the research. “From political smear campaigns to non-consensual pornography, this technology could have terrible and wide-reaching consequences.”

AI companies, including OpenAI, Meta and Google, have proposed invisible encoded “watermarks” as a solution. They suggest that these secret signatures will let them build publicly available tools that consistently and accurately distinguish AI-generated content from real photos and videos, without revealing how the watermarks work.

The Waterloo team, however, has created UnMarker, a tool that destroys watermarks without needing to know how they were encoded. UnMarker is the first practical and universal tool for removing watermarks in real-world settings. What sets it apart is that it requires no knowledge of the watermarking algorithm, no access to internal parameters, and no interaction with the detector at all. It strips both traditional and semantic watermarks without any customization.

“While watermarking schemes are typically kept secret by AI companies, they must satisfy two essential properties: they need to be invisible to human users to preserve image quality, and they must be robust, that is, resistant to image manipulations such as cropping or reducing resolution,” said Dr. Urs Hengartner, associate professor in the David R. Cheriton School of Computer Science at the University of Waterloo.

“These requirements constrain the possible designs for watermarks significantly. Our key insight is that to meet both criteria, watermarks must operate in the image’s spectral domain, meaning they subtly manipulate how pixel intensities vary across the image.”

Using a statistical attack, UnMarker looks for unusual patterns in the image’s frequency content and then distorts those frequencies, so that the watermark detector no longer recognizes the image while the change remains invisible to the naked eye. In tests, the method succeeded more than 50 per cent of the time against watermarks from several AI systems, including Google’s SynthID and Meta’s Stable Signature, without any prior knowledge of the images’ origins or the watermarking methods.
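To make the spectral-domain idea concrete, here is a deliberately simplified Python sketch. It is not UnMarker, and it is not how SynthID or Stable Signature work: the single-frequency “watermark”, the 64-pixel row, the function names embed, detect and strip_unusual_frequency, and the threshold and cutoff values are all hypothetical choices made for illustration. The sketch hides a faint cosine pattern in one row of pixel intensities, shows that a detector checking that frequency can find it, and then removes it blindly by suppressing the strongest unexpected high-frequency component, without knowing the secret frequency and without querying the detector.

```python
import cmath
import math

# Hypothetical toy setup: one row of 64 pixel intensities (0-255 scale).
N = 64
WM_FREQ = 13   # hypothetical secret frequency known only to the "watermarker"
WM_AMP = 2.0   # tiny amplitude: shifts any pixel by at most 2 intensity levels


def dft(xs):
    """Naive discrete Fourier transform: how intensities vary along the row."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
            for k in range(n)]


def idft(coeffs):
    """Inverse DFT, returning real-valued pixel intensities."""
    n = len(coeffs)
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n).real
            for t in range(n)]


def embed(row):
    """Toy watermark: add a faint cosine at the secret frequency."""
    return [p + WM_AMP * math.cos(2 * math.pi * WM_FREQ * t / len(row))
            for t, p in enumerate(row)]


def detect(row, threshold=20.0):
    """Toy detector: knows WM_FREQ and checks the energy in that frequency bin."""
    return abs(dft(row)[WM_FREQ]) > threshold


def strip_unusual_frequency(row, low_cutoff=4):
    """Toy blind attack: never uses WM_FREQ and never calls detect().

    In this toy, the clean row contains only low frequencies, so the strongest
    component above low_cutoff is treated as suspicious and removed, together
    with its mirror-image bin so the result stays real-valued.
    """
    coeffs = dft(row)
    n = len(coeffs)
    k = max(range(low_cutoff, n // 2), key=lambda i: abs(coeffs[i]))
    coeffs[k] = 0
    coeffs[n - k] = 0
    return idft(coeffs)


row = [100 + 50 * math.sin(2 * math.pi * t / N) for t in range(N)]  # smooth "image" row
marked = embed(row)
attacked = strip_unusual_frequency(marked)
print(detect(row), detect(marked), detect(attacked))  # False True False
print(round(max(abs(a - b) for a, b in zip(marked, attacked)), 3))  # 2.0
```

In this toy case the attack changes no pixel by more than 2 intensity levels out of 255, which is why such edits can be invisible. Real watermarks are designed to survive far more than a single-spike removal, so a practical attack of the kind described above has to be considerably more sophisticated.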

“If we can figure this out, so can malicious actors,” Kassis said. “Watermarking is being promoted as this perfect solution, but we’ve shown that this technology is breakable. Deepfakes are still a huge threat. We live in an era where you can’t really trust what you see anymore.”

The research, “UnMarker: A Universal Attack on Defensive Image Watermarking,” appears in the proceedings of the 46th IEEE Symposium on Security and Privacy.