A Noise and Edge extraction-based dual-branch method for Shallowfake and Deepfake Localization

Authors: Deepak Dagar, Dinesh Kumar Vishwakarma

Published: 2024-09-02 02:18:34+00:00

AI Summary

This paper proposes a dual-branch model for image manipulation localization that combines handcrafted noise features with CNN features extracted by a ConvNext module, trained with an edge supervision loss. The authors report an AUC score of 99% and better localization accuracy than state-of-the-art models.

Abstract

The trustworthiness of multimedia is increasingly evaluated with advanced Image Manipulation Localization (IML) techniques, which has led to the emergence of the IML field. An effective manipulation localization model must extract non-semantic differential features between manipulated and legitimate regions in order to exploit manipulation artifacts, which requires direct comparison between the two regions. Current models rely on handcrafted features, on convolutional neural networks (CNNs), or on a hybrid of both. Handcrafted feature approaches presuppose the type of tampering in advance, which restricts their effectiveness across varied tampering procedures, while CNNs capture semantic information, which is insufficient for addressing manipulation artifacts. To address these constraints, we have developed a dual-branch model that integrates handcrafted noise features with conventional CNN features: one branch integrates noise characteristics and the other integrates RGB features using the hierarchical ConvNext module. In addition, the model uses an edge supervision loss to learn manipulation boundary information, resulting in accurate localization at the edges. Furthermore, the architecture uses a feature augmentation module to optimize and refine the feature representation. The model was tested thoroughly on shallowfake datasets (CASIA, COVERAGE, COLUMBIA, NIST16) and on the deepfake dataset FaceForensics++ (FF++), demonstrating its ability to extract features and its superior performance compared to baseline models. It achieved an AUC score of 99% and outperforms existing state-of-the-art (SoTA) models.


Key findings
The proposed model achieves a 99% AUC score on shallowfake datasets and significantly outperforms state-of-the-art models on deepfake datasets (FaceForensics++). Ablation studies confirm the importance of all model components, particularly edge supervision and the noise branch.
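For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC score can be computed for a localization map. It assumes the score is taken over pixels, with each pixel's predicted tampering probability compared against its ground-truth label; the summary does not state the exact evaluation protocol, and the function name `pixel_auc` is hypothetical.

```python
def pixel_auc(scores, labels):
    """Rank-based AUC over flattened pixels.

    scores: predicted tampering probabilities, one per pixel.
    labels: ground-truth labels, 1 = tampered, 0 = pristine.
    Equals the probability that a random tampered pixel scores higher
    than a random pristine pixel, with ties counted as one half.
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both tampered and pristine pixels")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # Mann-Whitney U statistic normalised by the number of (pos, neg) pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

A score of 1.0 means every tampered pixel is ranked above every pristine pixel, and 0.5 corresponds to chance-level ranking.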
Approach
The model uses a dual-branch architecture: one branch extracts noise inconsistencies using Bayer and SRM filters, while the other branch processes RGB features using a ConvNext module. Edge supervision loss is incorporated to improve boundary localization, and a feature augmentation module refines feature representation.
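The noise branch can be illustrated with a small sketch of SRM-style high-pass filtering, which suppresses image content and leaves a noise residual. The three kernels below are the ones widely used in earlier manipulation-detection work; whether the paper uses exactly this set and scaling is an assumption, and the names `SRM_KERNELS`, `filter2d`, and `srm_residuals` are hypothetical. Plain Python lists keep the example self-contained; a real pipeline would apply the kernels as fixed convolution weights in a tensor library.

```python
# Assumed SRM kernel set: (5x5 kernel, divisor). Each kernel sums to zero,
# so flat image regions produce a zero residual.
SRM_KERNELS = [
    ([[0, 0, 0, 0, 0],
      [0, -1, 2, -1, 0],
      [0, 2, -4, 2, 0],
      [0, -1, 2, -1, 0],
      [0, 0, 0, 0, 0]], 4.0),
    ([[-1, 2, -2, 2, -1],
      [2, -6, 8, -6, 2],
      [-2, 8, -12, 8, -2],
      [2, -6, 8, -6, 2],
      [-1, 2, -2, 2, -1]], 12.0),
    ([[0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 1, -2, 1, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0]], 2.0),
]


def filter2d(image, kernel, divisor):
    """Cross-correlate a 2-D image (list of rows) with a square kernel.

    Pixels outside the image are treated as zero, so the output has the
    same size as the input.
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc / divisor
    return out


def srm_residuals(gray):
    """Return one noise-residual map per SRM kernel for a grayscale image."""
    return [filter2d(gray, kernel, divisor) for kernel, divisor in SRM_KERNELS]
```

On a uniform patch the interior residual is zero, while an isolated pixel that deviates from its neighbours produces a strong response. This is the kind of local inconsistency a noise branch is meant to expose.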
Datasets
CASIA (v1.0 and v2.0), NIST16, COLUMBIA, COVERAGE, FaceForensics++ (FF++)
Model(s)
Dual-branch model with ConvNext module, Bayer and SRM filters, feature augmentation module, and edge supervision loss.
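Edge supervision can be sketched as follows: derive a boundary target from the ground-truth mask, then penalise the predicted edge map against it alongside the usual mask loss. This is a sketch under stated assumptions, not the paper's formulation: the edge target here is a 4-neighbourhood boundary band, both terms use binary cross-entropy, and `mask_edges`, `binary_cross_entropy`, `total_loss`, and `edge_weight` are hypothetical names.

```python
import math


def mask_edges(mask):
    """Mark every pixel that has a 4-neighbour of the other class.

    The result is a band covering both sides of the tampered-region boundary.
    """
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and mask[yy][xx] != mask[y][x]:
                    edges[y][x] = 1
                    break
    return edges


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probability map and 0/1 target."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def total_loss(pred_mask, pred_edge, gt_mask, edge_weight=1.0):
    """Mask loss plus a weighted edge loss (the weighting is an assumption)."""
    mask_term = binary_cross_entropy(pred_mask, gt_mask)
    edge_term = binary_cross_entropy(pred_edge, mask_edges(gt_mask))
    return mask_term + edge_weight * edge_term
```

Because the edge target is derived from the mask itself, no extra annotation is needed; the edge term only adds pressure on the pixels near the manipulation boundary.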
Author countries
India, India