Fusing Global and Local Features for Generalized AI-Synthesized Image Detection

Authors: Yan Ju, Shan Jia, Lipeng Ke, Hongfei Xue, Koki Nagano, Siwei Lyu

Published: 2022-03-26 01:55:37+00:00

AI Summary

This paper proposes a two-branch model for AI-synthesized image detection that fuses global spatial features with local features from informative patches selected by a novel patch selection module. The model improves generalization ability by combining these features using a multi-head attention mechanism, achieving high accuracy and robustness.

Abstract

With the development of Generative Adversarial Networks (GANs) and DeepFakes, AI-synthesized images are now of such high quality that humans can hardly distinguish them from real images. It is imperative for media forensics to develop detectors to expose them accurately. Existing detection methods have shown high performance in detecting generated images, but they tend to generalize poorly in real-world scenarios, where synthetic images are usually generated with unseen models using unknown source data. In this work, we emphasize the importance of combining information from the whole image and informative patches to improve the generalization ability of AI-synthesized image detection. Specifically, we design a two-branch model that combines global spatial information from the whole image with local informative features from multiple patches selected by a novel patch selection module. A multi-head attention mechanism is further utilized to fuse the global and local features. We collect a highly diverse dataset synthesized by 19 models with various objects and resolutions to evaluate our model. Experimental results demonstrate the high accuracy and good generalization ability of our method in detecting generated images. Our code is available at https://github.com/littlejuyan/FusingGlobalandLocal.


Key findings

The proposed method outperforms baselines in terms of mean Average Precision (mAP) and global AP on a diverse dataset of AI-synthesized images. The model remains robust under post-processing operations such as blurring and JPEG compression, highlighting its improved generalization ability.
Approach

The approach uses a two-branch model: one branch extracts global features from the entire image, and the other extracts local features from patches selected by a patch selection module. A multi-head attention mechanism fuses these features for final classification.
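To make the patch-selection idea concrete, here is a minimal NumPy sketch that ranks non-overlapping patches by a simple informativeness score and keeps the top-k. The variance heuristic and the function name are assumptions for illustration; the paper's Patch Selection Module (PSM) is a learned component, not this hand-crafted rule.

```python
import numpy as np

def select_informative_patches(image, patch_size=32, k=4):
    """Rank non-overlapping patches by a simple informativeness
    score (local pixel variance) and return the top-k patches
    with their (row, col) coordinates.

    Simplified stand-in for a learned patch selection module:
    the variance score is an illustrative assumption only.
    """
    h, w = image.shape[:2]
    patches, scores, coords = [], [], []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            patches.append(patch)
            scores.append(patch.var())  # informativeness proxy
            coords.append((y, x))
    order = np.argsort(scores)[::-1][:k]  # highest score first
    return [patches[i] for i in order], [coords[i] for i in order]
```

In the full model, the selected patches would then be fed to the local branch's feature extractor, while the whole image goes through the global branch.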
Datasets

A highly diverse evaluation dataset synthesized by 19 models with various objects and resolutions; the training set comprises 362K real images from LSUN and 362K images generated by ProGAN.
Model(s)

ResNet-50 (pre-trained on ImageNet) as the backbone for both global and local feature extraction; a custom two-branch model with a patch selection module (PSM) and an attention-based feature fusion module (AFFM).
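The fusion step can be sketched in NumPy: one global feature vector and k local patch features are stacked as tokens and mixed with multi-head attention, then pooled into a single fused representation. The projection matrices, mean pooling, and function names are assumptions for illustration, not the paper's actual AFFM design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_fusion(global_feat, local_feats, w_q, w_k, w_v, num_heads=4):
    """Fuse one global feature vector (d,) with k local patch
    features (k, d) via multi-head self-attention, then mean-pool
    the attended tokens into a single fused vector.

    Minimal sketch of attention-based fusion; projections and
    pooling are illustrative assumptions, not the paper's AFFM.
    """
    tokens = np.vstack([global_feat[None, :], local_feats])  # (k+1, d)
    n, d = tokens.shape
    dh = d // num_heads
    # Project and split into heads: (heads, k+1, dh)
    q = (tokens @ w_q).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k_ = (tokens @ w_k).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (tokens @ w_v).reshape(n, num_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product attention per head
    attn = softmax(q @ k_.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)  # back to (k+1, d)
    return out.mean(axis=0)  # pooled fused representation
```

In the actual model, the fused vector would feed a binary real-vs-synthetic classification head.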
Author countries

USA