Imperceptible Adversarial Examples for Fake Image Detection

Authors: Quanyu Liao, Yuezun Li, Xin Wang, Bin Kong, Bin Zhu, Siwei Lyu, Youbing Yin, Qi Song, Xi Wu

Published: 2021-06-03 06:25:04+00:00

AI Summary

This paper proposes a novel method, Key Region Attack (KRA), to generate imperceptible adversarial examples for fooling fake image detectors. KRA focuses on attacking only key pixels identified through a multi-layer semantic key region selection process, resulting in significantly lower L0 and L2 norms of adversarial perturbations compared to existing methods.

Abstract

Fooling people with highly realistic fake images generated with Deepfake or GANs causes great disturbance to our society. Many methods have been proposed to detect fake images, but they are vulnerable to adversarial perturbations: intentionally designed noise that leads to wrong predictions. Existing methods for attacking fake image detectors usually perturb almost the entire image, which is redundant and increases the perceptibility of the perturbations. In this paper, we propose a novel method to disrupt fake image detection by determining the pixels that are key to a fake image detector and attacking only those key pixels, which yields $L_0$ and $L_2$ norms of the adversarial perturbations that are much smaller than those of existing works. Experiments on two public datasets with three fake image detectors indicate that our proposed method achieves state-of-the-art performance in both white-box and black-box attacks.
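
To make the $L_0$/$L_2$ comparison concrete, the sketch below (not the paper's code) contrasts a full-image perturbation with the same perturbation restricted to a small key-pixel mask. The image size, perturbation budget, and mask density are arbitrary assumptions, and the mask here is random purely for illustration; in the paper the mask would come from the detector-driven key region selection.

```python
import numpy as np

# Toy comparison: L0 and L2 norms of a full-image perturbation vs. the same
# perturbation kept only on a small set of "key" pixels.
rng = np.random.default_rng(0)

h, w, c = 256, 256, 3
eps = 4.0 / 255.0

# A conventional attack perturbs nearly every pixel.
full_perturbation = rng.uniform(-eps, eps, size=(h, w, c))

# A key-region attack perturbs only masked pixels; the mask is random here
# (about 2% of pixels kept) purely to illustrate the effect on the norms.
key_mask = (rng.random((h, w, 1)) < 0.02).astype(np.float64)
key_perturbation = full_perturbation * key_mask

def l0_norm(delta):
    """Number of pixels changed at all (any channel non-zero)."""
    return int(np.count_nonzero(np.abs(delta).sum(axis=-1)))

def l2_norm(delta):
    """Euclidean magnitude of the perturbation."""
    return float(np.sqrt((delta ** 2).sum()))

print("full attack: L0 =", l0_norm(full_perturbation), " L2 =", l2_norm(full_perturbation))
print("key  attack: L0 =", l0_norm(key_perturbation), " L2 =", l2_norm(key_perturbation))
```

Restricting the perturbation to a sparse mask drives the $L_0$ norm down in direct proportion to the mask density and shrinks the $L_2$ norm as well, which is the imperceptibility argument the abstract makes.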


Key findings
KRA achieves state-of-the-art performance in both white-box and black-box attacks on fake image detectors. The generated adversarial perturbations are significantly less perceptible (lower L0 and L2 norms) than those produced by existing methods. The attack successfully fools the detectors even with small, targeted modifications to the images.
Approach
Using a multi-layer heatmap analysis, KRA identifies the key pixels in a fake image that are most influential on the detector's decision. It then applies adversarial perturbations only to these key pixels, minimizing the size and perceptibility of the perturbation while maintaining a high attack success rate.
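
The following PyTorch sketch illustrates the general shape of such a pipeline; it is not the authors' KRA implementation. It builds Grad-CAM-style heatmaps from three intermediate layers of a stand-in ResNet50 detector, thresholds their average into a key-pixel mask, and takes a single gradient-sign step only inside that mask. The layer choices, threshold, epsilon, and the use of a randomly initialized detector are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

detector = models.resnet50(weights=None)  # stands in for a trained fake-image detector
detector.eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # a fake image in [0, 1]
fake_label = torch.tensor([1])                           # assume class 1 = "fake"

# Collect activations and their gradients from a few intermediate layers.
activations, gradients = {}, {}

def save_activation(name):
    def hook(_module, _inputs, output):
        activations[name] = output
        output.register_hook(lambda g: gradients.__setitem__(name, g))
    return hook

layers = {"layer2": detector.layer2, "layer3": detector.layer3, "layer4": detector.layer4}
handles = [m.register_forward_hook(save_activation(n)) for n, m in layers.items()]

# One backward pass gives both the input gradient (for the attack step)
# and the per-layer activation gradients (for the heatmaps).
loss = F.cross_entropy(detector(image), fake_label)
loss.backward()
for h in handles:
    h.remove()

# Average the per-layer class-activation maps into a single heatmap.
heatmap = torch.zeros(224, 224)
for name in layers:
    act, grad = activations[name], gradients[name]          # (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)            # channel importance
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    heatmap += cam[0, 0].detach()
heatmap /= len(layers)

# Keep only the strongest regions as "key pixels", then perturb only those.
key_mask = (heatmap > 0.5).float()
eps = 8.0 / 255.0
adv_image = (image + eps * image.grad.sign() * key_mask).clamp(0.0, 1.0).detach()

print("fraction of pixels perturbed:", key_mask.mean().item())
```

In practice the step would be iterated and the mask refined, but the sketch shows where the savings come from: only the pixels flagged by the detector's own heatmaps are ever modified.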
Datasets
FaceForensics++, CNN-Synthesis
Model(s)
ResNet50, ResNet101, Inceptionv3, Xception (as both detectors and attack targets)
Author countries
China, USA