LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection

Authors: Dat Nguyen, Nesryne Mejri, Inder Pal Singh, Polina Kuleshova, Marcella Astrid, Anis Kacem, Enjie Ghorbel, Djamila Aouada

Published: 2024-01-24 23:42:08+00:00

AI Summary

LAA-Net is a deepfake detection approach that uses an explicit attention mechanism within a multi-task learning framework to focus the model on artifact-prone regions. It also incorporates an Enhanced Feature Pyramid Network (E-FPN) to exploit multi-scale features while limiting redundancy, improving both generalization and detection performance.

Abstract

This paper introduces a novel approach for high-quality deepfake detection called Localized Artifact Attention Network (LAA-Net). Existing methods for high-quality deepfake detection are mainly based on a supervised binary classifier coupled with an implicit attention mechanism. As a result, they do not generalize well to unseen manipulations. To handle this issue, two main contributions are made. First, an explicit attention mechanism within a multi-task learning framework is proposed. By combining heatmap-based and self-consistency attention strategies, LAA-Net is forced to focus on a few small artifact-prone vulnerable regions. Second, an Enhanced Feature Pyramid Network (E-FPN) is proposed as a simple and effective mechanism for spreading discriminative low-level features into the final feature output, with the advantage of limiting redundancy. Experiments performed on several benchmarks show the superiority of our approach in terms of Area Under the Curve (AUC) and Average Precision (AP). The code is available at https://github.com/10Ring/LAA-Net.
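The abstract describes E-FPN only at a high level: it spreads discriminative low-level features into the final output while limiting redundancy. The snippet below is a minimal, hedged sketch of one plausible FPN-style fusion block in that spirit; the channel sizes and the sigmoid gating used here as a "redundancy filter" are illustrative assumptions, not the paper's exact E-FPN design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EFPNLikeBlock(nn.Module):
    """Illustrative fusion of a deeper (coarse) feature map into a shallower (fine) one.

    This is a sketch of the general idea only; consult the official code at
    https://github.com/10Ring/LAA-Net for the actual E-FPN implementation.
    """

    def __init__(self, deep_ch: int, shallow_ch: int, out_ch: int):
        super().__init__()
        self.reduce_deep = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        self.reduce_shallow = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        # Assumed gating conv used to suppress redundant upsampled information.
        self.gate = nn.Conv2d(out_ch, out_ch, kernel_size=1)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        # Upsample the deeper features to the shallower spatial resolution.
        deep_up = F.interpolate(
            self.reduce_deep(deep),
            size=shallow.shape[-2:],
            mode="bilinear",
            align_corners=False,
        )
        shallow_r = self.reduce_shallow(shallow)
        # Inject only gated (non-redundant) deep information into the fine features.
        return shallow_r + torch.sigmoid(self.gate(deep_up)) * deep_up
```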


Key findings
LAA-Net outperforms state-of-the-art methods on several deepfake benchmarks in terms of AUC and AP, demonstrating superior generalization. The ablation study validates the contribution of each component, particularly the heatmap branch. E-FPN improves performance over traditional FPN by reducing feature redundancy.
Approach
LAA-Net uses a multi-task learning framework with three branches: a binary classifier, a heatmap branch, and a self-consistency branch. The heatmap and self-consistency branches steer the network toward vulnerable pixels (those most likely to contain blending artifacts); their ground-truth targets are generated via blending-based data synthesis. An E-FPN propagates multi-scale features to the prediction heads while limiting redundancy. A sketch of such a multi-branch head is given below.
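To make the three-branch design concrete, here is a minimal, hedged sketch of a multi-task head on top of a shared feature map (e.g., the E-FPN output). Channel sizes, layer depths, and the cosine-similarity form of the self-consistency map are assumptions for illustration; the paper's exact branch architectures and losses may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskHead(nn.Module):
    """Sketch of a three-branch head: classification, heatmap, self-consistency."""

    def __init__(self, in_ch: int = 256, consistency_dim: int = 64):
        super().__init__()
        # Binary real/fake classifier on globally pooled features.
        self.cls = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 1))
        # Heatmap branch: per-pixel score for artifact-prone (vulnerable) regions.
        self.heatmap = nn.Conv2d(in_ch, 1, kernel_size=1)
        # Self-consistency branch: per-pixel embedding whose pairwise similarity
        # is supervised to separate blended from pristine regions.
        self.consistency = nn.Conv2d(in_ch, consistency_dim, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        logits = self.cls(feats)        # (B, 1) real/fake logit
        heat = self.heatmap(feats)      # (B, 1, H, W) vulnerability heatmap
        emb = self.consistency(feats)   # (B, D, H, W) per-pixel embeddings
        # Pairwise cosine similarity between pixel embeddings -> consistency map.
        emb_flat = F.normalize(emb.flatten(2), dim=1)               # (B, D, H*W)
        consistency_map = torch.einsum("bdi,bdj->bij", emb_flat, emb_flat)
        return logits, heat, consistency_map
```

In training, the heatmap and consistency outputs would be supervised with targets derived from the blending-based pseudo-fakes, while the classification logit carries the main real/fake loss; at test time only the classifier output is typically needed.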
Datasets
FaceForensics++ (FF++), Celeb-DF v2, DeepFake Detection (DFD), DeepFake Detection Challenge (DFDC), WildDeepfake
Model(s)
EfficientNet-B4 (backbone) with custom attention and E-FPN modules.
Author countries
Luxembourg