Towards the Detection of AI-Synthesized Human Face Images

Authors: Yuhang Lu, Touradj Ebrahimi

Published: 2024-02-13 19:37:44+00:00

AI Summary

This paper introduces a benchmark for detecting AI-synthesized human face images generated by GANs and Diffusion Models. It analyzes forgery traces in the frequency domain and shows that detectors trained with frequency representations generalize better to unseen generative models.

Abstract

Over the past years, image generation and manipulation have achieved remarkable progress due to the rapid development of generative AI based on deep learning. Recent studies have devoted significant efforts to address the problem of face image manipulation caused by deepfake techniques. However, the problem of detecting purely synthesized face images has been explored to a lesser extent. In particular, the recent popular Diffusion Models (DMs) have shown remarkable success in image synthesis. Existing detectors struggle to generalize between synthesized images created by different generative models. In this work, a comprehensive benchmark including human face images produced by Generative Adversarial Networks (GANs) and a variety of DMs has been established to evaluate both the generalization ability and robustness of state-of-the-art detectors. Then, the forgery traces introduced by different generative models have been analyzed in the frequency domain to draw various insights. The paper further demonstrates that a detector trained with frequency representation can generalize well to other unseen generative models.


Key findings

Detectors trained on general synthetic images struggle to generalize to human face images. Training with frequency representations significantly improves generalization across different generative models. The Mandelli2022 detector showed the best robustness against image perturbations.

Approach

The authors created a benchmark dataset of face images from various generative models (GANs and Diffusion Models). They evaluated existing detectors on this benchmark, analyzing their generalization and robustness. They found that training detectors on frequency representations improved performance and generalization.
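
To make the idea of a frequency representation concrete, the sketch below converts an image into a centered log-magnitude 2D FFT spectrum, which could then be fed to a detector in place of the pixel values. This is an illustration under assumptions, not the authors' pipeline: the summary does not state which transform, normalization, or color handling the paper uses, and the function name `log_magnitude_spectrum` is hypothetical.

```python
import numpy as np


def log_magnitude_spectrum(image: np.ndarray) -> np.ndarray:
    """Return a centered log-magnitude FFT spectrum scaled to [0, 1].

    Assumption: `image` is an (H, W) grayscale or (H, W, C) color array.
    Color images are averaged over channels; the paper may handle
    channels differently.
    """
    if image.ndim == 3:
        image = image.mean(axis=2)  # collapse color channels to one plane
    image = image.astype(np.float64)

    # 2D FFT, then shift the zero-frequency (DC) term to the array center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # log1p compresses the large dynamic range of the magnitudes.
    log_mag = np.log1p(np.abs(spectrum))

    # Min-max scale so every image yields inputs in the same range.
    span = log_mag.max() - log_mag.min()
    if span == 0:
        return np.zeros_like(log_mag)
    return (log_mag - log_mag.min()) / span


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_face = rng.random((64, 64, 3))  # stand-in for a real or synthetic face
    features = log_magnitude_spectrum(fake_face)
    print(features.shape)
```

The resulting array has the same height and width as the input, so it can be passed to a standard image classifier such as ResNet-50 with only the number of input channels adjusted.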

Datasets

CelebA-HQ (real images); synthetic face images generated by ProGAN, StyleGAN2, VQGAN, DDPM, DDIM, PNDM, and LDM.

Model(s)

ResNet-50, XceptionNet, EfficientNetB4, Wang2020, Grag2021, Mandelli2022, Ojha2023.

Author countries

Switzerland