Loupe: A Generalizable and Adaptive Framework for Image Forgery Detection

Authors: Yuchu Jiang, Jiaming Chu, Jian Zhao, Xin Zhang, Xu Yang, Lei Jin, Chi Zhang, Xuelong Li

Published: 2025-06-20 08:18:44+00:00

AI Summary

Loupe is a lightweight framework for image deepfake detection and localization that integrates a patch-aware classifier and a segmentation module with conditional queries. It achieves state-of-the-art performance by using a pseudo-label-guided test-time adaptation mechanism to enhance robustness against distribution shifts.

Abstract

The proliferation of generative models has raised serious concerns about visual content forgery. Existing deepfake detection methods primarily target either image-level classification or pixel-wise localization. While some achieve high accuracy, they often suffer from limited generalization across manipulation types or rely on complex architectures. In this paper, we propose Loupe, a lightweight yet effective framework for joint deepfake detection and localization. Loupe integrates a patch-aware classifier and a segmentation module with conditional queries, allowing simultaneous global authenticity classification and fine-grained mask prediction. To enhance robustness against distribution shifts in the test set, Loupe introduces a pseudo-label-guided test-time adaptation mechanism that leverages patch-level predictions to supervise the segmentation head. Extensive experiments on the DDL dataset demonstrate that Loupe achieves state-of-the-art performance, securing first place in the IJCAI 2025 Deepfake Detection and Localization Challenge with an overall score of 0.846. Our results validate the effectiveness of the proposed patch-level fusion and conditional query design in improving both classification accuracy and spatial localization under diverse forgery patterns. The code is available at https://github.com/Kamichanw/Loupe.

Key findings

Loupe achieved first place in the IJCAI 2025 Deepfake Detection and Localization Challenge with an overall score of 0.846. The results support the effectiveness of the patch-level fusion and conditional query design for both classification accuracy and spatial localization. The test-time adaptation mechanism is intended to improve robustness against distribution shifts in the test set.

Approach

Loupe is trained in two stages. The first stage trains a patch-aware classifier for global authenticity classification. The second stage trains a segmentation module for pixel-level localization. At test time, patch-level predictions from the classifier serve as pseudo-labels that supervise the segmentation head, adapting it to distribution shifts in the test set.
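
The pseudo-label idea can be illustrated with a minimal sketch. It is not taken from the Loupe code base: the function names, the confidence thresholds (`low`, `high`), the nearest-neighbor upsampling, and the choice of a masked binary cross-entropy loss are all assumptions made for illustration. The sketch turns a grid of patch-level forgery probabilities into a pixel-level pseudo-mask and scores a predicted mask against it, skipping pixels whose patch was uncertain.

```python
import math


def patch_probs_to_pseudo_mask(patch_probs, patch_size, low=0.3, high=0.7):
    """Expand a grid of patch-level forgery probabilities into a pixel mask.

    Hypothetical helper: 1 = forged, 0 = authentic, None = uncertain (ignored).
    The thresholds `low` and `high` are illustrative assumptions.
    """
    mask = []
    for row in patch_probs:
        labels = [1 if p >= high else 0 if p <= low else None for p in row]
        # Repeat each patch label patch_size times horizontally...
        pixel_row = [label for label in labels for _ in range(patch_size)]
        # ...and repeat the resulting pixel row patch_size times vertically.
        mask.extend(list(pixel_row) for _ in range(patch_size))
    return mask


def masked_bce(pred_mask, pseudo_mask, eps=1e-7):
    """Mean binary cross-entropy over pixels that have a pseudo-label."""
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred_mask, pseudo_mask):
        for p, y in zip(pred_row, label_row):
            if y is None:
                continue  # uncertain patch: contributes no supervision
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# A 2x2 patch grid with 2x2-pixel patches yields a 4x4 pseudo-mask.
pseudo = patch_probs_to_pseudo_mask([[0.9, 0.5], [0.1, 0.2]], patch_size=2)
uniform_pred = [[0.5] * 4 for _ in range(4)]
loss = masked_bce(uniform_pred, pseudo)
```

In an actual adaptation loop, a loss of this kind would be backpropagated through the segmentation head only; which parameters Loupe updates at test time is not stated in this summary.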

Datasets

DDL dataset

Model(s)

Perception Encoder (image encoder), patch-aware classifier (MLP), Mask2Former architecture (segmenter with conditional pixel decoder and mask decoder)
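
How the image-level and patch-level outputs of the classifier are combined is not specified in this summary. The sketch below is therefore only one plausible reading of "patch-level fusion": it assumes the classifier emits one global logit plus one logit per patch, and it blends the global probability with the mean patch probability. The function names and the `weight` parameter are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(global_logit, patch_logits, weight=0.5):
    """Blend a global forgery probability with the mean patch probability.

    Assumed fusion rule for illustration only; `weight` is the share given
    to the global prediction and must lie in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    global_prob = sigmoid(global_logit)
    if not patch_logits:
        return global_prob  # no patch evidence: fall back to the global score
    patch_prob = sum(sigmoid(z) for z in patch_logits) / len(patch_logits)
    return weight * global_prob + (1.0 - weight) * patch_prob


# Neutral logits everywhere give a neutral fused score.
neutral = fuse_scores(0.0, [0.0, 0.0])
# Confident patch evidence raises the score even when the global logit is neutral.
raised = fuse_scores(0.0, [4.0, 4.0])
```

A weighted average keeps the fused score in [0, 1] and lets locally forged regions influence the image-level decision, which is the motivation usually given for patch-aware classification.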

Author countries

China
