DefakeHop++: An Enhanced Lightweight Deepfake Detector

Authors: Hong-Shuo Chen, Shuowen Hu, Suya You, C.-C. Jay Kuo

Published: 2022-04-30 08:50:25+00:00

AI Summary

This work introduces DefakeHop++, an enhanced lightweight Deepfake detector that improves on DefakeHop in two ways: it expands facial landmark coverage, and it uses a supervised Discriminant Feature Test (DFT) for feature selection. The method automatically derives rich spatial and spectral features from facial regions and landmarks, then uses DFT to select the discriminant features used to train the classifier. In a weakly-supervised setting, DefakeHop++ outperforms lightweight CNNs such as MobileNet v3 (1.5M parameters) in Deepfake image detection while using a much smaller model (238K parameters).

Abstract

On the basis of DefakeHop, an enhanced lightweight Deepfake detector called DefakeHop++ is proposed in this work. The improvements lie in two areas. First, DefakeHop examines three facial regions (i.e., two eyes and mouth) while DefakeHop++ includes eight more landmarks for broader coverage. Second, for discriminant feature selection, DefakeHop uses an unsupervised approach while DefakeHop++ adopts a more effective approach with supervision, called the Discriminant Feature Test (DFT). In DefakeHop++, rich spatial and spectral features are first derived from facial regions and landmarks automatically. Then, DFT is used to select a subset of discriminant features for classifier training. As compared with MobileNet v3 (a lightweight CNN model of 1.5M parameters targeting mobile applications), DefakeHop++ has a model size of 238K parameters, about 16% of that of MobileNet v3. Furthermore, DefakeHop++ outperforms MobileNet v3 in Deepfake image detection performance in a weakly-supervised setting.
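The supervised feature selection step can be illustrated with a minimal NumPy sketch. It assumes, as a simplification, that each feature dimension is scored by splitting its value range at uniformly spaced candidate thresholds and taking the lowest weighted binary entropy of the real/fake labels on the two sides; a lower loss marks a more discriminant dimension, and the k lowest-loss dimensions are kept. The bin count (16), the entropy criterion, and the helper names `binary_entropy`, `dft_loss`, and `select_features` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def binary_entropy(labels: np.ndarray) -> float:
    """Entropy (in bits) of a 0/1 label array; 0.0 for an empty or pure array."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))


def dft_loss(feature: np.ndarray, labels: np.ndarray, num_bins: int = 16) -> float:
    """Lowest weighted entropy over candidate split points of one feature (assumed criterion)."""
    lo, hi = feature.min(), feature.max()
    if lo == hi:
        return binary_entropy(labels)  # constant feature: no useful split exists
    # Candidate thresholds: interior edges of num_bins equal-width bins.
    thresholds = np.linspace(lo, hi, num_bins + 1)[1:-1]
    n = labels.size
    best = np.inf
    for t in thresholds:
        left, right = labels[feature <= t], labels[feature > t]
        loss = (left.size * binary_entropy(left) + right.size * binary_entropy(right)) / n
        best = min(best, loss)
    return float(best)


def select_features(X: np.ndarray, y: np.ndarray, k: int, num_bins: int = 16):
    """Rank every column of X by its DFT-style loss and return the k best indices."""
    losses = np.array([dft_loss(X[:, j], y, num_bins) for j in range(X.shape[1])])
    order = np.argsort(losses, kind="stable")  # ascending: most discriminant first
    return order[:k], losses
```

On a synthetic check with three pure-noise columns plus one column equal to the label with small Gaussian noise, the label-correlated column should receive the lowest loss and be selected first. Because the score depends on the labels, this selection is supervised, which is the contrast the abstract draws with DefakeHop's unsupervised selection.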


Key findings
DefakeHop++ achieves high Deepfake detection performance, often comparable to or exceeding that of state-of-the-art deep learning models, especially in cross-domain and weakly-supervised scenarios. Its model is much smaller (238K parameters) than lightweight CNNs such as MobileNet v3, making it suitable for mobile and edge deployment. The method also requires less training time on larger datasets while maintaining strong performance.
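The relative model size follows directly from the two parameter counts given in the abstract (238K for DefakeHop++, 1.5M for MobileNet v3):

```latex
\frac{238\,\text{K}}{1.5\,\text{M}} = \frac{2.38 \times 10^{5}}{1.5 \times 10^{6}} \approx 0.159 \approx 16\%
```

so DefakeHop++ uses roughly one sixth as many parameters as MobileNet v3.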
Approach
DefakeHop++ extracts spatial and spectral features from multiple facial regions and landmarks using a one-stage PixelHop unit followed by Spatial PCA. A supervised Discriminant Feature Test (DFT) then selects a subset of the most discriminant features. Finally, a LightGBM classifier is trained on these selected features to classify images as real or fake.
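The feature-extraction stage can be sketched in NumPy under several stated assumptions. The one-stage PixelHop unit is approximated by a Saab-style transform: one constant DC kernel plus PCA kernels learned on the mean-removed (AC) part of each patch, with the bias term omitted. Patches are taken with stride 1 and no padding, and Spatial PCA is applied separately to each channel's flattened spatial response map. The input array stands in for crops around one facial region or landmark. Landmark detection, cropping, and the final LightGBM classifier are not shown, and all function names and sizes are hypothetical.

```python
import numpy as np


def extract_patches(images: np.ndarray, k: int) -> np.ndarray:
    """(n, H, W, C) -> (n, H-k+1, W-k+1, C*k*k) sliding-window patches, stride 1."""
    n, h, w, c = images.shape
    windows = np.lib.stride_tricks.sliding_window_view(images, (k, k), axis=(1, 2))
    # windows has shape (n, H-k+1, W-k+1, C, k, k); flatten each window.
    return windows.reshape(n, h - k + 1, w - k + 1, c * k * k)


def fit_saab(patches: np.ndarray, num_ac: int) -> np.ndarray:
    """Saab-style kernels: one DC kernel plus num_ac PCA kernels on the AC part."""
    flat = patches.reshape(-1, patches.shape[-1])
    dim = flat.shape[1]
    dc_kernel = np.ones(dim) / np.sqrt(dim)
    ac = flat - flat.mean(axis=1, keepdims=True)  # remove each patch's DC component
    ac = ac - ac.mean(axis=0)                      # center across patches before PCA
    _, _, vt = np.linalg.svd(ac, full_matrices=False)
    return np.vstack([dc_kernel, vt[:num_ac]])     # AC kernels are orthogonal to DC


def apply_saab(patches: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Spectral responses: one channel per kernel."""
    return patches @ kernels.T


def fit_spatial_pca(responses: np.ndarray, num_components: int):
    """Per-channel PCA over the flattened spatial response map."""
    n, h, w, c = responses.shape
    bases, means = [], []
    for ch in range(c):
        maps = responses[..., ch].reshape(n, h * w)
        mean = maps.mean(axis=0)
        _, _, vt = np.linalg.svd(maps - mean, full_matrices=False)
        bases.append(vt[:num_components])
        means.append(mean)
    return bases, means


def apply_spatial_pca(responses: np.ndarray, bases, means) -> np.ndarray:
    """Project each channel's spatial map and concatenate into one feature vector."""
    n, h, w, c = responses.shape
    feats = [(responses[..., ch].reshape(n, h * w) - means[ch]) @ bases[ch].T
             for ch in range(c)]
    return np.concatenate(feats, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    crops = rng.standard_normal((50, 8, 8, 3))   # stand-in for 50 region crops
    patches = extract_patches(crops, k=3)
    kernels = fit_saab(patches, num_ac=5)
    responses = apply_saab(patches, kernels)
    bases, means = fit_spatial_pca(responses, num_components=4)
    print(apply_spatial_pca(responses, bases, means).shape)  # (50, 24)
```

In DefakeHop++, features like these from every region and landmark would be concatenated, reduced with DFT, and passed to the LightGBM classifier.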
Datasets
UADFV, FaceForensics++ (FF++), Celeb-DF (v1, v2), DFDC
Model(s)
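DefakeHop++ (one-stage PixelHop unit, Spatial PCA, DFT feature selection, LightGBM classifier); MobileNet v3 as a lightweight CNN baseline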
Author countries
USA