Attack-Aware Deepfake Detection under Counter-Forensic Manipulations

Authors: Noor Fatima, Hasan Faraz Khan, Muzammil Behzad

Published: 2025-12-26 04:05:52+00:00

AI Summary

This work introduces an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment conditions. It achieves this by combining red-team training with randomized test-time defense in a two-stream architecture that produces both classification and weakly supervised tamper heatmaps. The method demonstrates near-perfect ranking and low calibration error even under various counter-forensic attacks.

Abstract

This work presents an attack-aware deepfake and image-forensics detector designed for robustness, well-calibrated probabilities, and transparent evidence under realistic deployment conditions. The method combines red-team training with a randomized test-time defense in a two-stream architecture: one stream encodes semantic content using a pretrained backbone and the other extracts forensic residuals, fused via a lightweight residual adapter for classification, while a shallow Feature Pyramid Network (FPN)-style head produces tamper heatmaps under weak supervision. Red-team training applies worst-of-K counter-forensics per batch, including JPEG realign-and-recompress, resampling warps, denoise-to-regrain operations, seam smoothing, small color and gamma shifts, and social-app transcodes. The test-time defense injects low-cost jitters, such as resize and crop phase changes, mild gamma variation, and JPEG phase shifts, and aggregates the resulting predictions. Heatmaps are guided to concentrate within face regions using face-box masks, without strict pixel-level annotations. Evaluation on existing benchmarks, including standard deepfake datasets and a surveillance-style split with low light and heavy compression, reports clean and attacked performance: AUC, worst-case accuracy, reliability, abstention quality, and weak-localization scores. Results demonstrate near-perfect ranking across attacks, low calibration error, minimal abstention risk, and controlled degradation under regrain, establishing a modular, data-efficient, and practically deployable baseline for attack-aware detection with calibrated probabilities and actionable heatmaps.
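The worst-of-K red-team step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the attack operations (`gamma_shift`, `quantize`, `resample`) are crude stand-ins for the actual counter-forensic suite (JPEG realign-and-recompress, regrain, social-app transcodes), and `loss_fn` is a hypothetical detector loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder counter-forensic ops; the paper's real attack suite is richer.
def gamma_shift(x, rng):
    # Small gamma perturbation on an image in [0, 1].
    return np.clip(x, 0.0, 1.0) ** rng.uniform(0.9, 1.1)

def quantize(x, rng):
    # Coarse intensity quantization as a crude recompression proxy.
    levels = int(rng.integers(16, 64))
    return np.round(x * levels) / levels

def resample(x, rng):
    # Nearest-neighbour index remapping as a resampling-warp proxy.
    h, w = x.shape
    s = rng.uniform(0.8, 0.95)
    ys = (np.arange(h) * s).astype(int) % h
    xs = (np.arange(w) * s).astype(int) % w
    return x[ys][:, xs]

OPS = [gamma_shift, quantize, resample]

def worst_of_k(x, label, loss_fn, k=4, rng=rng):
    """Try k random attacks and keep the one maximizing the detector loss;
    the clean sample is kept if no attack increases the loss."""
    worst_x, worst_loss = x, loss_fn(x, label)
    for _ in range(k):
        op = OPS[rng.integers(len(OPS))]
        attacked = op(x, rng)
        attacked_loss = loss_fn(attacked, label)
        if attacked_loss > worst_loss:
            worst_x, worst_loss = attacked, attacked_loss
    return worst_x, worst_loss
```

In training, the returned `worst_x` would replace the clean sample in the batch, so the detector always sees the hardest of the K sampled manipulations.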


Key findings
The detector achieved near-perfect ranking (AUC=1.00) and a worst-case accuracy of 0.9917 across various attacks and clean splits when a global operating point was applied. Results showed low calibration error and minimal abstention risk, even with regrain identified as the most challenging stressor. Weak localization successfully concentrated evidence within plausible face regions using coarse supervision, providing interpretable heatmaps for audit.
Approach
The approach uses a two-stream architecture: one stream encodes semantic content via a pretrained backbone, and the other extracts forensic residuals, fused by a lightweight adapter for classification. A shallow FPN-style head generates weakly supervised tamper heatmaps. Robustness is achieved through red-team training, applying worst-of-K counter-forensics per batch, and randomized test-time defense with prediction aggregation.
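The randomized test-time defense can be illustrated with a short sketch. The jitter functions here (a one-to-two-pixel crop-phase shift and a mild gamma change) are assumed stand-ins for the paper's resize/crop-phase and JPEG-phase transforms, and `model` is any callable returning a scalar score.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_phase(x, rng):
    # Shift the image by a pixel or two to perturb crop/block alignment.
    dy, dx = rng.integers(0, 3, size=2)
    return np.roll(x, (int(dy), int(dx)), axis=(0, 1))

def mild_gamma(x, rng):
    # Very small gamma jitter, cheap and nearly imperceptible.
    return np.clip(x, 0.0, 1.0) ** rng.uniform(0.97, 1.03)

JITTERS = [crop_phase, mild_gamma]

def defended_predict(model, x, n=8, rng=rng):
    """Score n independently jittered views of x and average the results,
    so a counter-forensic perturbation tuned to one exact pixel/phase
    alignment loses its advantage."""
    scores = []
    for _ in range(n):
        view = x
        for jitter in JITTERS:
            view = jitter(view, rng)
        scores.append(model(view))
    return float(np.mean(scores))
```

Averaging over jittered views trades a small constant-factor inference cost for stability against attacks that exploit a fixed preprocessing alignment.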
Datasets
DeepFakeFace (DFF), CelebA (for auxiliary checks), and a constructed surveillance-style split.
Model(s)
A two-stream architecture comprising a pretrained vision backbone for semantic content and a forensic-residual stream, fused by a lightweight adapter. A shallow Feature Pyramid Network (FPN)-style head produces the tamper heatmaps. InsightFace buffalo_l is used to obtain face-region priors.
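The two-stream fusion can be sketched minimally. Everything here is a placeholder: the semantic embedding stands in for the pretrained backbone's output, the Laplacian high-pass filter is a crude SRM-like residual extractor, and the adapter weights are random rather than trained.

```python
import numpy as np

def highpass_residual(img):
    """3x3 Laplacian high-pass filter: a simple stand-in for the
    forensic-residual stream's noise-residual extraction."""
    k = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i, j] = np.sum(img[i - 1:i + 2, j - 1:j + 2] * k)
    return out

def residual_adapter(sem_feat, res_feat, W, b):
    """Lightweight residual adapter: project the concatenated streams and
    add the result back onto the semantic features (a skip connection)."""
    z = np.concatenate([sem_feat, res_feat])
    return sem_feat + np.tanh(W @ z + b)

rng = np.random.default_rng(0)
img = rng.uniform(size=(16, 16))
sem = rng.normal(size=8)                       # placeholder backbone embedding
res = np.full(8, highpass_residual(img).std()) # pooled residual statistics
W = rng.normal(size=(8, 16)) * 0.1             # random, untrained adapter weights
fused = residual_adapter(sem, res, W, np.zeros(8))
```

The skip connection keeps the semantic stream dominant while letting the residual stream nudge the representation, which matches the "lightweight adapter" framing: the fusion adds little capacity on top of the frozen backbone.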
Author countries
Saudi Arabia