ADD: Frequency Attention and Multi-View based Knowledge Distillation to Detect Low-Quality Compressed Deepfake Images

Authors: Binh M. Le, Simon S. Woo

Published: 2021-12-07 07:58:28+00:00

AI Summary

This paper proposes ADD, an Attention-based Deepfake detection Distiller, to improve deepfake detection on low-quality compressed images. ADD uses two novel distillation methods: frequency attention distillation, which recovers high-frequency components lost to compression in the student network, and multi-view attention distillation, which transfers the teacher's feature distribution to the student more efficiently.
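The frequency attention distillation can be pictured as comparing teacher and student feature maps in the DCT domain while weighting the high-frequency bands. The sketch below is a minimal PyTorch illustration, not the authors' implementation: the DCT construction, the `low_freq_cut` mask, and the channel-attention weighting are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = torch.arange(n).unsqueeze(1).float()          # frequency index
    i = torch.arange(n).unsqueeze(0).float()          # spatial index
    mat = torch.cos(torch.pi * (2 * i + 1) * k / (2 * n)) * (2.0 / n) ** 0.5
    mat[0, :] = mat[0, :] / 2 ** 0.5                  # DC row scaling
    return mat


def frequency_attention_loss(f_student: torch.Tensor,
                             f_teacher: torch.Tensor,
                             low_freq_cut: int = 4) -> torch.Tensor:
    """Hypothetical frequency attention distillation loss.

    f_student, f_teacher: feature maps of shape (B, C, H, W) with H == W.
    The low-frequency corner of the DCT spectrum is masked out so the
    loss focuses on high-frequency components removed by compression.
    """
    b, c, h, w = f_teacher.shape
    d = dct_matrix(h).to(f_teacher.device)

    # 2D DCT applied channel-wise: D @ X @ D^T
    dct_s = d @ f_student @ d.T
    dct_t = d @ f_teacher @ d.T

    # Keep only high-frequency coefficients (assumed masking scheme).
    mask = torch.ones(h, w, device=f_teacher.device)
    mask[:low_freq_cut, :low_freq_cut] = 0.0
    hf_s = dct_s * mask
    hf_t = dct_t * mask

    # Channel attention from the teacher's high-frequency energy, so
    # channels rich in high frequencies dominate the distillation loss.
    attn = hf_t.abs().mean(dim=(2, 3), keepdim=True)      # (B, C, 1, 1)
    attn = F.softmax(attn.flatten(1), dim=1).view(b, c, 1, 1)

    return (attn * (hf_s - hf_t) ** 2).sum(dim=(1, 2, 3)).mean()
```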

Abstract

Despite significant advancements of deep learning-based forgery detectors for distinguishing manipulated deepfake images, most detection approaches suffer from moderate to significant performance degradation with low-quality compressed deepfake images. Because of the limited information in low-quality images, detecting low-quality deepfakes remains an important challenge. In this work, we apply frequency domain learning and optimal transport theory in knowledge distillation (KD) to specifically improve the detection of low-quality compressed deepfake images. We explore transfer learning capability in KD to enable a student network to learn discriminative features from low-quality images effectively. In particular, we propose the Attention-based Deepfake detection Distiller (ADD), which consists of two novel distillations: 1) frequency attention distillation that effectively retrieves the removed high-frequency components in the student network, and 2) multi-view attention distillation that creates multiple attention vectors by slicing the teacher's and student's tensors under different views to transfer the teacher tensor's distribution to the student more efficiently. Our extensive experimental results demonstrate that our approach outperforms state-of-the-art baselines in detecting low-quality compressed deepfake images.
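The multi-view attention distillation described above slices the teacher's and student's tensors along different axes into attention vectors and matches their distributions, which the abstract frames via optimal transport theory. The sketch below is an assumed simplification: the channel/height/width views follow the abstract's description of slicing under different views, but a plain L2 distance between normalized attention vectors stands in for the paper's optimal transport matching.

```python
import torch


def view_attention(feat: torch.Tensor, dim: int) -> torch.Tensor:
    """Attention vector obtained by collapsing all axes except `dim`.

    feat: feature tensor of shape (B, C, H, W); dim in {1, 2, 3}.
    Returns one normalized vector per sample, shape (B, feat.size(dim)).
    """
    reduce_dims = [d for d in (1, 2, 3) if d != dim]
    attn = feat.abs().mean(dim=reduce_dims)               # (B, size of dim)
    return attn / (attn.norm(dim=1, keepdim=True) + 1e-8)


def multi_view_attention_loss(f_student: torch.Tensor,
                              f_teacher: torch.Tensor) -> torch.Tensor:
    """Assumed stand-in for multi-view attention distillation.

    Slices both tensors under channel, height, and width views and
    penalizes the discrepancy between the resulting attention vectors.
    (The paper matches these distributions with optimal transport; an
    L2 distance is used here only to keep the sketch short.)
    """
    loss = f_student.new_zeros(())
    for dim in (1, 2, 3):                                 # C, H, W views
        a_s = view_attention(f_student, dim)
        a_t = view_attention(f_teacher, dim)
        loss = loss + (a_s - a_t).pow(2).sum(dim=1).mean()
    return loss
```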


Key findings
The proposed ADD method outperforms state-of-the-art baselines in detecting low-quality compressed deepfake images. The use of frequency and multi-view attention distillation significantly improves the student network's ability to learn discriminative features from compressed images. Results demonstrate that the approach effectively addresses the challenges posed by the loss of high-frequency and correlated information in compressed images.
Approach
The authors leverage knowledge distillation (KD) to train a student network on low-quality compressed deepfakes using a teacher network trained on high-quality images. They introduce frequency attention distillation to recover high-frequency information and multi-view attention distillation to improve the transfer of information from teacher to student.
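Putting the pieces together, one training iteration would freeze the teacher (trained on high-quality frames), feed the compressed counterparts of the same frames to the student, and combine a classification loss with the two distillation terms. The loop below is a hedged sketch reusing the loss sketches above: the backbones, the distilled feature layer, the loss weights, and the data pipeline are placeholders rather than the authors' setup.

```python
import torch
import torch.nn as nn

# Hypothetical backbones: any CNN exposing intermediate feature maps could
# stand in for the teacher/student pair used in the paper.
teacher = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(8))
student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(8))
classifier = nn.Linear(16 * 8 * 8, 1)                 # real-vs-fake head

teacher.eval()                                        # teacher stays frozen
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(
    list(student.parameters()) + list(classifier.parameters()), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
lambda_freq, lambda_view = 1.0, 1.0                   # assumed loss weights


def train_step(hq_images, lq_images, labels):
    """One KD step: the teacher sees high-quality frames, the student the
    compressed versions of the same frames."""
    with torch.no_grad():
        f_t = teacher(hq_images)                      # (B, 16, 8, 8)
    f_s = student(lq_images)

    logits = classifier(f_s.flatten(1)).squeeze(1)
    loss = (bce(logits, labels.float())
            + lambda_freq * frequency_attention_loss(f_s, f_t)
            + lambda_view * multi_view_attention_loss(f_s, f_t))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```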
Datasets
UNKNOWN
Model(s)
UNKNOWN (Specific model architectures are not detailed in the provided abstract)
Author countries
South Korea