RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection

Authors: Zhiyuan He, Pin-Yu Chen, Tsung-Yi Ho

Published: 2024-05-30 14:49:54+00:00

AI Summary

This paper introduces RIGID, a training-free and model-agnostic method for AI-generated image detection. RIGID leverages the observation that real images are more robust to noise perturbations than AI-generated images: it classifies an image by comparing the representation similarity between the original image and a noise-perturbed copy.

Abstract

The rapid advances in generative AI models have empowered the creation of highly realistic images with arbitrary content, raising concerns about potential misuse and harm, such as Deepfakes. Current research focuses on training detectors using large datasets of generated images. However, these training-based solutions are often computationally expensive and show limited generalization to unseen generated images. In this paper, we propose a training-free method to distinguish between real and AI-generated images. We first observe that real images are more robust to tiny noise perturbations than AI-generated images in the representation space of vision foundation models. Based on this observation, we propose RIGID, a training-free and model-agnostic method for robust AI-generated image detection. RIGID is a simple yet effective approach that identifies whether an image is AI-generated by comparing the representation similarity between the original and the noise-perturbed counterpart. Our evaluation on a diverse set of AI-generated images and benchmarks shows that RIGID significantly outperforms existing training-based and training-free detectors. In particular, the average performance of RIGID exceeds the current best training-free method by more than 25%. Importantly, RIGID exhibits strong generalization across different image generation methods and robustness to image corruptions.


Key findings
RIGID significantly outperforms existing training-based and training-free methods, achieving over 25% higher average precision than the state-of-the-art training-free method. It demonstrates strong generalization across various generative models and robustness to common image corruptions.
Approach
RIGID adds a small noise perturbation to an image and feeds both the original and the perturbed version to a frozen, pre-trained feature extractor (e.g., DINOv2). It then computes the cosine similarity between the two embeddings; low similarity indicates the image is likely AI-generated, as shown in the sketch below.
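A minimal sketch of this scoring procedure, assuming a DINOv2 backbone loaded via torch.hub; the noise scale `sigma` and decision threshold `tau` are illustrative placeholders, not the paper's tuned hyperparameters, and the helper `rigid_score` is named here for exposition only.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Any frozen vision foundation model can serve as the extractor
# (RIGID is model-agnostic); DINOv2 is the one highlighted in the paper.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model = model.to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def rigid_score(image: Image.Image, sigma: float = 0.05) -> float:
    """Cosine similarity between the embeddings of an image and its
    noise-perturbed copy; real images tend to score higher."""
    x = preprocess(image).unsqueeze(0).to(device)
    x_noisy = x + sigma * torch.randn_like(x)  # tiny Gaussian perturbation
    z, z_noisy = model(x), model(x_noisy)
    return F.cosine_similarity(z, z_noisy, dim=-1).item()

# Flag an image as AI-generated when similarity falls below a threshold
# chosen on held-out data (tau = 0.95 is an assumed placeholder).
tau = 0.95
img = Image.open("example.jpg").convert("RGB")
is_generated = rigid_score(img) < tau
```

Because the method needs no training, the only design choices are the feature extractor, the noise scale, and the threshold, which is what makes it model-agnostic in practice.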
Datasets
ImageNet, LSUN-Bedroom, GenImage (images from Stable Diffusion 1.4 & 1.5, Midjourney, and Wukong)
Model(s)
ResNet50, CLIP, DINOv2, SAM
Author countries
Hong Kong, USA