Leveraging Optimization for Adaptive Attacks on Image Watermarks

Authors: Nils Lukas, Abdulrahman Diaa, Lucas Fenaux, Florian Kerschbaum

Published: 2023-09-29 03:36:42+00:00

Comment: ICLR'24

AI Summary

This paper introduces a method for designing adaptive attacks against image watermarking algorithms by formulating the attack as an optimization problem. The core idea is to replicate secret watermarking keys locally with differentiable surrogate keys, which can then be used to optimize the attack's parameters. These attacks evade detection by all five surveyed watermarking methods for Stable Diffusion models with no perceptible degradation in image quality, emphasizing the need for more robust watermarking.

Abstract

Untrustworthy users can misuse image generators to synthesize high-quality deepfakes and engage in unethical activities. Watermarking deters misuse by marking generated content with a hidden message, enabling its detection using a secret watermarking key. A core security property of watermarking is robustness, which states that an attacker can only evade detection by substantially degrading image quality. Assessing robustness requires designing an adaptive attack for the specific watermarking algorithm. When evaluating watermarking algorithms and their (adaptive) attacks, it is challenging to determine whether an adaptive attack is optimal, i.e., the best possible attack. We solve this problem by defining an objective function and then approach adaptive attacks as an optimization problem. The core idea of our adaptive attacks is to replicate secret watermarking keys locally by creating surrogate keys that are differentiable and can be used to optimize the attack's parameters. We demonstrate for Stable Diffusion models that such an attacker can break all five surveyed watermarking methods at no visible degradation in image quality. Optimizing our attacks is efficient and requires less than 1 GPU hour to reduce the detection accuracy to 6.3% or less. Our findings emphasize the need for more rigorous robustness testing against adaptive, learnable attackers.
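The abstract's framing of adaptive attacks as an optimization problem can be sketched as a single objective; the notation below is illustrative and not taken verbatim from the paper:

```latex
\min_{\theta} \;
\mathbb{E}_{x \sim \mathcal{D}}
\Big[
  \mathcal{L}_{\mathrm{detect}}\big(f_{\tilde{\tau}}(\mathcal{A}_{\theta}(x))\big)
  \;+\; \lambda \, \mathcal{L}_{\mathrm{perc}}\big(\mathcal{A}_{\theta}(x),\, x\big)
\Big]
```

Here $\mathcal{A}_{\theta}$ is the attack with learnable parameters $\theta$, $f_{\tilde{\tau}}$ is a watermark detector instantiated with a locally created, differentiable surrogate key $\tilde{\tau}$ (the attacker does not know the true secret key), $\mathcal{L}_{\mathrm{detect}}$ penalizes successful detection, and $\mathcal{L}_{\mathrm{perc}}$ is a perceptual distance that keeps image quality intact, weighted by $\lambda$.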


Key findings
Adaptive, learnable attacks can evade detection by all five surveyed image watermarking methods (TRW, WDM, DWT, DWT-SVD, RivaGAN) for Stable Diffusion models. The attacks cause no perceptible degradation in image quality and reduce detection accuracy to 6.3% or less in under 1 GPU hour of optimization. Adversarial compression proved effective against all five methods, highlighting the insufficient robustness of current watermarking techniques against adaptive attackers.
Approach
The authors define an objective function that treats adaptive attacks as an optimization problem, allowing efficient tuning of the attack's parameters. To handle watermarking schemes that are not differentiable, they use a surrogate generator to create differentiable surrogate watermarking keys that stand in for the unknown secret key. Two learnable attack strategies, Adversarial Noising and Adversarial Compression, are developed to minimize watermark detection while preserving perceptual image quality.
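The Adversarial Noising strategy described above can be illustrated with a toy sketch: a projected-gradient attack that perturbs an image, within a small L-infinity budget, to minimize the watermark score of a differentiable surrogate detector. The surrogate detector here is a made-up logistic model for illustration only; the paper's actual surrogate keys, losses, and hyperparameters are not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def surrogate_detect(x, w, b):
    # Hypothetical differentiable surrogate detector: probability that the
    # (flattened) image x carries the watermark, under a surrogate key (w, b).
    return sigmoid(w @ x + b)

def adversarial_noising(x, w, b, eps=0.05, steps=50, lr=0.01):
    """PGD-style sketch: perturb x inside an L-infinity ball of radius eps
    to minimize the surrogate detector's watermark score."""
    x_adv = x.copy()
    for _ in range(steps):
        p = surrogate_detect(x_adv, w, b)
        # Analytic gradient of the detection score for this toy detector.
        grad = p * (1.0 - p) * w
        x_adv = x_adv - lr * np.sign(grad)        # step against detection
        x_adv = np.clip(x_adv, x - eps, x + eps)  # quality constraint: stay near x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv

# Toy data: a random "image" and a random surrogate key (illustrative only).
rng = np.random.default_rng(0)
d = 64
w = rng.normal(size=d)
b = 0.0
x = rng.uniform(0.3, 0.7, size=d)

before = surrogate_detect(x, w, b)
x_adv = adversarial_noising(x, w, b)
after = surrogate_detect(x_adv, w, b)
```

The key idea mirrors the paper's setup: because the surrogate key is differentiable, gradients through the detector can steer the perturbation, while the epsilon ball bounds the quality loss.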
Datasets
LAION-5B, LAION-2B, LAION-HD, MS-COCO-2017
Model(s)
Stable Diffusion
Author countries
Canada