Use of a Capsule Network to Detect Fake Images and Videos

Authors: Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Published: 2019-10-28 07:01:49+00:00

AI Summary

This paper introduces a capsule network for detecting various types of image and video forgeries, including deepfakes and presentation attacks. The proposed Capsule-Forensics model uses significantly fewer parameters than traditional CNNs while achieving comparable performance, and it provides the first theoretical explanation of applying capsule networks to the forensics problem, supported by detailed analysis and visualization.

Abstract

The revolution in computer hardware, especially in graphics processing units and tensor processing units, has enabled significant advances in computer graphics and artificial intelligence algorithms. In addition to their many beneficial applications in daily life and business, computer-generated/manipulated images and videos can be used for malicious purposes that violate security systems, privacy, and social trust. The deepfake phenomenon and its variations enable a normal user to use his or her personal computer to easily create fake videos of anybody from a short real online video. Several countermeasures have been introduced to deal with attacks using such videos. However, most of them are targeted at certain domains and are ineffective when applied to other domains or new attacks. In this paper, we introduce a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created using deep learning. It uses many fewer parameters than traditional convolutional neural networks with similar performance. Moreover, we explain, for the first time ever in the literature, the theory behind the application of capsule networks to the forensics problem through detailed analysis and visualization.


Key findings
Capsule-Forensics achieves performance comparable to state-of-the-art methods like XceptionNet but with significantly fewer parameters. The model effectively detects computer-manipulated images and videos, as well as presentation attacks, demonstrating its generalizability. Detailed analysis reveals that the network focuses on key facial features (eyes, nose, mouth) to identify forgeries.
Approach
The authors propose Capsule-Forensics, a capsule network that leverages a pre-trained VGG-19 network as a feature extractor. The core approach uses dynamic routing between primary and output capsules to detect forgeries, incorporating novel regularizations (noise and dropout) to improve performance. Final results are obtained by averaging scores from multiple frames (for videos) or patches (for images).
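The routing and score-averaging steps described above can be sketched in a minimal NumPy example. This is an illustrative simplification, not the authors' implementation: the real Capsule-Forensics model feeds VGG-19 features through learned primary capsules with noise and dropout regularization, whereas here the prediction vectors `u_hat` are random stand-ins and only the dynamic-routing loop and the per-frame score averaging are shown.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linear squashing: short vectors shrink toward 0,
    # long vectors approach unit length.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Route predictions u_hat of shape (n_primary, n_output, dim)
    to output capsules of shape (n_output, dim)."""
    n_primary, n_output, _ = u_hat.shape
    b = np.zeros((n_primary, n_output))            # routing logits
    for _ in range(n_iters):
        # Softmax over output capsules gives coupling coefficients.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # output capsule activations
        b += np.einsum('ijd,jd->ij', u_hat, v)     # agreement update
    return v

# Two output capsules (real vs. fake); the class score is derived
# from each output capsule's vector length.
rng = np.random.default_rng(0)
per_frame_scores = []
for _ in range(5):                                 # e.g. 5 frames of one video
    u_hat = rng.normal(size=(10, 2, 4))            # 10 primary capsules, dim 4
    v = dynamic_routing(u_hat)
    lengths = np.linalg.norm(v, axis=-1)
    probs = np.exp(lengths) / np.exp(lengths).sum()  # softmax over classes
    per_frame_scores.append(probs)

video_score = np.mean(per_frame_scores, axis=0)    # average over frames
print(video_score)
```

The same averaging applies to patches of a still image: each patch is scored independently and the scores are averaged into a single decision.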
Datasets
FaceForensics++, DeepFakeTIMIT, Rahmouni et al.'s dataset for detecting fully computer-generated images, Idiap's Replay-Attack database, RAISE dataset
Model(s)
Capsule Network, VGG-19 (used as a feature extractor), XceptionNet (used as a baseline for comparison)
Author countries
Japan, UK