Feature Extraction Matters More: Universal Deepfake Disruption through Attacking Ensemble Feature Extractors

Authors: Long Tang, Dengpan Ye, Zhenhao Lu, Yunming Zhang, Shengshan Hu, Yue Xu, Chuanxi Chen

Published: 2023-03-01 03:08:40+00:00

AI Summary

This paper proposes FOUND, a Feature-Output ensemble UNiversal Disruptor, to combat deepfakes by attacking the feature extraction modules of deepfake models. This two-stage approach first disrupts multi-model feature extractors and then employs a gradient-ensemble algorithm for enhanced disruption.

Abstract

Adversarial example is a rising way of protecting facial privacy security from deepfake modification. To prevent massive facial images from being illegally modified by various deepfake models, it is essential to design a universal deepfake disruptor. However, existing works treat deepfake disruption as an End-to-End process, ignoring the functional difference between feature extraction and image reconstruction, which makes it difficult to generate a cross-model universal disruptor. In this work, we propose a novel Feature-Output ensemble UNiversal Disruptor (FOUND) against deepfake networks, which explores a new opinion that considers attacking feature extractors as the more critical and general task in deepfake disruption. We conduct an effective two-stage disruption process. We first disrupt multi-model feature extractors through multi-feature aggregation and individual-feature maintenance, and then develop a gradient-ensemble algorithm to enhance the disruption effect by simplifying the complex optimization problem of disrupting multiple End-to-End models. Extensive experiments demonstrate that FOUND can significantly boost the disruption effect against ensemble deepfake benchmark models. Besides, our method can fast obtain a cross-attribute, cross-image, and cross-model universal deepfake disruptor with only a few training images, surpassing state-of-the-art universal disruptors in both success rate and efficiency.

Key findings

FOUND significantly improves disruption effectiveness against ensemble deepfake models compared to state-of-the-art methods. It achieves high success rates and efficiency, generating a universal disruptor with only a few training images. Attacking feature extractors is shown to be crucial for effective deepfake disruption.

Approach

FOUND uses a two-stage process. First, it disrupts multiple deepfake models' feature extractors using multi-feature aggregation and individual-feature maintenance. Then, it employs a gradient-ensemble algorithm to enhance disruption by simplifying the complex optimization problem of attacking multiple models end-to-end.
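
The idea of attacking several feature extractors with one shared perturbation can be illustrated with a toy sketch. This is not the FOUND algorithm itself: the extractors here are hypothetical random `tanh` layers rather than deepfake encoders, and the ensemble rule (L1-normalize each extractor's gradient, sum the results, then take a signed step) is a generic stand-in for the paper's gradient-ensemble algorithm. The names `universal_disruptor`, `eps`, `step`, and `iters` are assumptions made for illustration.

```python
import math
import random

def features(W, x):
    """Toy feature extractor (assumption): tanh(W @ x)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

def distortion_and_grad(W, x, delta):
    """Return ||f(x + delta) - f(x)||^2 and its gradient w.r.t. delta."""
    clean = features(W, x)
    pert = features(W, [a + b for a, b in zip(x, delta)])
    diff = [p - c for p, c in zip(pert, clean)]
    loss = sum(d * d for d in diff)
    # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    back = [2.0 * d * (1.0 - p * p) for d, p in zip(diff, pert)]
    grad = [sum(W[i][j] * back[i] for i in range(len(W)))
            for j in range(len(x))]
    return loss, grad

def total_distortion(extractors, images, delta):
    """Feature distortion summed over every extractor and every image."""
    return sum(distortion_and_grad(W, x, delta)[0]
               for W in extractors for x in images)

def universal_disruptor(extractors, images, eps=0.05, step=0.01,
                        iters=50, seed=0):
    """Search for one perturbation, bounded by eps in the L-infinity norm,
    that distorts the features of all extractors on all images."""
    rng = random.Random(seed)
    dim = len(images[0])
    # Random start: the gradient is exactly zero at delta = 0.
    delta = [rng.uniform(-eps, eps) for _ in range(dim)]
    best = list(delta)
    best_score = total_distortion(extractors, images, delta)
    for _ in range(iters):
        ensemble = [0.0] * dim
        for W in extractors:
            g = [0.0] * dim
            for x in images:
                _, gx = distortion_and_grad(W, x, delta)
                g = [a + b for a, b in zip(g, gx)]
            # Normalize so that no single extractor dominates the step.
            norm = sum(abs(v) for v in g) or 1.0
            ensemble = [e + v / norm for e, v in zip(ensemble, g)]
        # Signed ascent step, then clip back into the eps ball.
        delta = [max(-eps, min(eps, d + step * ((e > 0) - (e < 0))))
                 for d, e in zip(delta, ensemble)]
        score = total_distortion(extractors, images, delta)
        if score > best_score:
            best, best_score = list(delta), score
    return best, best_score

# Hypothetical data: 3 extractors (4 features each), 5 "images" of 8 pixels.
data_rng = random.Random(1)
extractors = [[[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
              for _ in range(3)]
images = [[data_rng.random() for _ in range(8)] for _ in range(5)]
delta, score = universal_disruptor(extractors, images)
```

The same `delta` is added to every image, which is what makes the perturbation cross-image, and its gradient is pooled over all extractors, which is what makes it cross-model in this toy setting. A real implementation would use the encoders of the target deepfake networks, compute gradients by automatic differentiation, and also clip the perturbed image to the valid pixel range.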

Datasets

CelebA, LFW, FF++O

Model(s)

StarGAN, AGGAN, AttGAN, HiSD, StarGAN-v2 (for black-box testing)

Author countries

China
