Embedding Space Selection for Detecting Memorization and Fingerprinting in Generative Models

Authors: Jack He, Jianxing Zhao, Andrew Bai, Cho-Jui Hsieh

Published: 2024-07-30 19:52:49+00:00

AI Summary

This research introduces a method for fingerprinting generative models using Vision Transformer (ViT) embeddings. It leverages the distinctive distribution of memorization scores across ViT layers to identify which model produced a given deepfake, achieving a 30% improvement in identification accuracy over baseline methods.

Abstract

In the rapidly evolving landscape of artificial intelligence, generative models such as Generative Adversarial Networks (GANs) and Diffusion Models have become cornerstone technologies, driving innovation in diverse fields from art creation to healthcare. Despite their potential, these models face the significant challenge of data memorization, which poses risks to privacy and the integrity of generated content. Among various metrics of memorization detection, our study delves into the memorization scores calculated from encoder layer embeddings, which involves measuring distances between samples in the embedding spaces. In particular, we find that the memorization scores calculated from layer embeddings of Vision Transformers (ViTs) show a notable trend: the later (deeper) the layer, the less memorization is measured. Memorization scores from the early layers' embeddings are more sensitive to low-level memorization (e.g., colors and simple patterns of an image), while those from the later layers are more sensitive to high-level memorization (e.g., the semantic meaning of an image). We also observe that, for a specific model architecture, the degree of memorization at different levels of information is unique and can be viewed as an inherent property of the architecture. Building upon this insight, we introduce a fingerprinting methodology that capitalizes on the distinct distributions of memorization scores across different layers of ViTs, providing a novel approach to identifying the models used to generate deepfakes and malicious content. Our approach demonstrates a marked 30% improvement in identification accuracy over existing baseline methods, offering a more effective tool for combating digital misinformation.
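
The abstract describes measuring distances between samples in the per-layer embedding spaces of a ViT encoder. As a minimal sketch of that extraction step, the snippet below collects the [CLS] embedding from every encoder layer of a pretrained HuggingFace ViT; the checkpoint name and the use of the [CLS] token as the per-layer representation are assumptions, since the summary does not specify them.

```python
# Sketch: per-layer [CLS] embeddings from a pretrained ViT (HuggingFace).
# Assumptions: the google/vit-base-patch16-224 checkpoint and the [CLS]
# token as the per-layer sample representation; the paper may differ.
import torch
from transformers import ViTModel

model = ViTModel.from_pretrained("google/vit-base-patch16-224")
model.eval()

@torch.no_grad()
def layer_embeddings(pixel_values: torch.Tensor) -> list[torch.Tensor]:
    """Return one (batch, hidden_dim) embedding matrix per encoder layer."""
    out = model(pixel_values=pixel_values, output_hidden_states=True)
    # hidden_states[0] is the patch-embedding output; [1:] are the encoder layers.
    return [h[:, 0] for h in out.hidden_states[1:]]

# Dummy batch of 224x224 RGB images standing in for real or generated data.
images = torch.randn(4, 3, 224, 224)
embs = layer_embeddings(images)
print(len(embs), embs[0].shape)  # 12 layers, each (4, 768) for ViT-base
```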


Key findings
Early ViT layers are more sensitive to low-level memorization (e.g., colors and simple patterns), while later layers capture high-level, semantic memorization. The distribution of memorization scores (CT-scores) across ViT layers serves as a unique fingerprint for identifying generative models. The proposed fingerprinting method improves identification accuracy by 30% over baseline methods.
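
The CT-score itself is not defined in this summary. Assuming it refers to the data-copying statistic C_T of Meehan et al. (2020), a simplified single-cell version compares nearest-training-neighbor distances of generated samples against those of held-out samples with a Mann-Whitney U test, as sketched below; strongly negative values indicate generated samples lying unusually close to the training data, i.e., memorization.

```python
# Sketch: a simplified, single-cell CT-style memorization score in one
# embedding space. Assumption: "CT-score" refers to the data-copying
# statistic C_T of Meehan et al. (2020); the paper's exact variant may differ.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import mannwhitneyu

def ct_score(train: np.ndarray, test: np.ndarray, gen: np.ndarray) -> float:
    """Mann-Whitney z-score comparing generated-to-train vs. held-out-to-train
    nearest-neighbor distances; strongly negative values mean generated
    samples sit suspiciously close to the training data (memorization)."""
    d_gen = cdist(gen, train).min(axis=1)    # NN distance per generated sample
    d_test = cdist(test, train).min(axis=1)  # NN distance per held-out sample
    u, _ = mannwhitneyu(d_gen, d_test, alternative="two-sided")
    n, m = len(d_gen), len(d_test)
    # Normal approximation of the U statistic.
    mu, sigma = n * m / 2.0, np.sqrt(n * m * (n + m + 1) / 12.0)
    return (u - mu) / sigma

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 64))
test = rng.normal(size=(200, 64))
memorized = train[:200] + 0.01 * rng.normal(size=(200, 64))  # near-copies
print(ct_score(train, test, memorized))  # strongly negative => memorization
```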
Approach
The approach computes layer-wise memorization scores (CT-scores) from the embeddings of a ViT encoder. The distribution of these scores across layers acts as a fingerprint for the generative model used to create a given deepfake; identification is performed by comparing this fingerprint against a database of known model fingerprints.
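
A minimal sketch of that matching step, treating the vector of layer-wise CT-scores as the fingerprint and returning the closest entry in a database of known models. Euclidean nearest-neighbor matching is an assumption, as the summary does not specify the comparison rule, and the fingerprint values below are illustrative.

```python
# Sketch: matching a query fingerprint (vector of layer-wise CT-scores)
# against a database of known generative-model fingerprints.
# Assumption: Euclidean nearest-neighbor matching; the paper's actual
# comparison rule is not given in this summary.
import numpy as np

def identify(query: np.ndarray, database: dict[str, np.ndarray]) -> str:
    """Return the known model whose fingerprint is closest to the query."""
    return min(database, key=lambda name: np.linalg.norm(query - database[name]))

# Hypothetical 12-layer fingerprints for models named in the paper.
rng = np.random.default_rng(1)
db = {name: rng.normal(size=12) for name in ["BigGAN-deep", "DDPM", "DDIM"]}
query = db["DDPM"] + 0.05 * rng.normal(size=12)  # noisy observation
print(identify(query, db))  # -> "DDPM"
```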
Datasets
CIFAR-10 (with various augmentations: rotated, downsampled, Gaussian segmentation, real segmentation, shuffled), BigGAN-deep generated images, DDPM and DDIM generated images.
Model(s)
Vision Transformers (ViT-base, ViT-large-patch16, ViT-huge-patch14), ResNet-18 (fine-tuned for segmentation), ResNet-50 (fine-tuned for classification), vanilla CNN (trained from scratch), BigGAN-deep, ContraGAN, SNGAN, DDPM, DDIM, PNDM.
Author countries
USA