Untraceable DeepFakes via Traceable Fingerprint Elimination

Authors: Jiewei Lai, Lan Zhang, Chen Tang, Pengcheng Sun, Xinming Wang, Yunhao Wang

Published: 2025-08-05 04:27:57+00:00

AI Summary

This paper introduces a novel multiplicative attack that generates untraceable deepfakes by fundamentally eliminating generative model fingerprints. The proposed black-box attack trains an adversarial model using only real data, achieving an average attack success rate of 97.08% against six advanced attribution models on DeepFakes produced by nine generative models.

Abstract

Recent advancements in DeepFakes attribution technologies have significantly enhanced forensic capabilities, enabling the extraction of traces left by generative models (GMs) in images and making DeepFakes traceable back to their source GMs. Meanwhile, several attacks have attempted to evade attribution models (AMs) to explore their limitations, calling for more robust AMs. However, existing attacks fail to eliminate GMs' traces and can thus be mitigated by defensive measures. In this paper, we identify that untraceable DeepFakes can be achieved through a multiplicative attack, which can fundamentally eliminate GMs' traces, thereby evading AMs even when they are enhanced with defensive measures. We design a universal and black-box attack method that trains an adversarial model solely using real data, applicable to various GMs and agnostic to AMs. Experimental results demonstrate the outstanding attack capability and universal applicability of our method, achieving an average attack success rate (ASR) of 97.08% against 6 advanced AMs on DeepFakes generated by 9 GMs. Even in the presence of defensive mechanisms, our method maintains an ASR exceeding 72.39%. Our work underscores the potential challenges posed by multiplicative attacks and highlights the need for more robust AMs.


Key findings
The proposed multiplicative attack achieves a 97.08% average attack success rate against six advanced attribution models on DeepFakes generated by nine generative models. Even with defensive mechanisms in place, the success rate remains above 72.39%. The attack is shown to be effective against both black-box and white-box defensive strategies.
Approach
The authors propose a multiplicative attack that applies an adversarial matrix to an image to eliminate generative model fingerprints. The approach uses a two-module framework: a data-synthesis module that creates synthetic data mimicking DeepFake characteristics from real images only, and a model-construction module that trains an adversarial model whose output acts as the multiplicative matrix.
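To make the multiplicative formulation concrete, here is a minimal sketch of the attack step, assuming PyTorch. The function name `multiplicative_attack` and the pre-trained `adv_model` are placeholders for illustration, not identifiers from the paper; the key point is that the perturbation is an element-wise product with the image rather than an additive offset.

```python
# Hypothetical sketch (not the authors' code) of a multiplicative attack step.
# An adversarial model predicts a per-pixel matrix M from a DeepFake image x,
# and the untraceable output is the element-wise product M * x, in contrast to
# the additive perturbation x + delta used by prior evasion attacks.
import torch

def multiplicative_attack(adv_model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """x: DeepFake image batch in [0, 1], shape (N, 3, H, W)."""
    with torch.no_grad():
        m = adv_model(x)            # adversarial matrix, same shape as x
    x_attacked = m * x              # multiplicative (element-wise) perturbation
    return x_attacked.clamp(0.0, 1.0)
```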
Datasets
UNKNOWN
Model(s)
Encoder-decoder architecture with convolutional and residual layers, VGG-16 for perceptual loss.
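The model description above suggests a standard PyTorch-style implementation; the following sketch is an assumed reconstruction of such an encoder-decoder with convolutional and residual layers plus a VGG-16 perceptual loss. Channel widths, layer depths, and the choice of VGG-16 feature layers are illustrative guesses, not the paper's configuration.

```python
# A minimal sketch, assuming PyTorch/torchvision; hyperparameters are guesses.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class ResidualBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)     # residual connection

class AdversarialModel(nn.Module):
    """Encoder-decoder that outputs the multiplicative adversarial matrix."""
    def __init__(self, ch: int = 64, n_res: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_res)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.res(self.encoder(x)))

# Frozen VGG-16 features for a perceptual loss between attacked and input images.
_vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(x_attacked: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    return nn.functional.l1_loss(_vgg(x_attacked), _vgg(x))
```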
Author countries
China