DefakeHop++: An Enhanced Lightweight Deepfake Detector


Authors: Hong-Shuo Chen, Shuowen Hu, Suya You, C.-C. Jay Kuo

Published: 2022-04-30 08:50:25+00:00

AI Summary

DefakeHop++, an enhanced lightweight deepfake detector, improves on its predecessor DefakeHop in two ways: it adds eight facial landmarks to the three facial regions already examined, giving broader coverage, and it replaces unsupervised feature selection with a supervised Discriminant Feature Test (DFT). The resulting model has 238K parameters, about 16% of MobileNet v3's 1.5M, and outperforms MobileNet v3 at deepfake image detection in a weakly-supervised setting.

Abstract

On the basis of DefakeHop, an enhanced lightweight Deepfake detector called DefakeHop++ is proposed in this work. The improvements lie in two areas. First, DefakeHop examines three facial regions (i.e., two eyes and mouth), while DefakeHop++ includes eight more landmarks for broader coverage. Second, for discriminant feature selection, DefakeHop uses an unsupervised approach, while DefakeHop++ adopts a more effective supervised approach called the Discriminant Feature Test (DFT). In DefakeHop++, rich spatial and spectral features are first derived from facial regions and landmarks automatically. Then, DFT is used to select a subset of discriminant features for classifier training. Compared with MobileNet v3 (a lightweight CNN model of 1.5M parameters targeting mobile applications), DefakeHop++ has a model of 238K parameters, which is 16% of the size of MobileNet v3. Furthermore, DefakeHop++ outperforms MobileNet v3 in Deepfake image detection performance in a weakly-supervised setting.

Key findings

DefakeHop++ achieves high AUC scores across the evaluated datasets and outperforms the larger MobileNet v3 in some cases, particularly when training data is limited. Its small size (238K parameters) makes it suitable for deployment on mobile and edge devices. Landmark analysis shows that the eye regions yield the most discriminant features.

Approach

DefakeHop++ extracts spatial and spectral features from eleven facial areas: three larger regions (the two eyes and the mouth) and eight landmarks. A Discriminant Feature Test (DFT) then selects a subset of discriminant features, and a LightGBM classifier performs the final classification. The model is lightweight and designed for mobile applications.
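The feature selection step can be illustrated with a minimal sketch. This summary does not spell out how DFT scores a feature, so the code below assumes one common form of the idea: each feature is scored on its own by the lowest weighted label entropy that any single threshold split achieves, and the features with the lowest scores are kept. The function names (`binary_entropy`, `dft_loss`, `select_features`), the number of candidate thresholds, and the toy data are all hypothetical. The LightGBM classifier that would be trained on the selected features is omitted.

```python
import math
import random


def binary_entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels; 0.0 for an empty list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def dft_loss(values, labels, num_bins=16):
    """Lowest weighted entropy over evenly spaced thresholds of one feature.

    Assumption: a feature is judged by how well a single split separates the
    two classes. `num_bins` is an illustrative choice, not a published value.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return binary_entropy(labels)  # constant feature: no split can help
    n = len(values)
    best = float("inf")
    for b in range(1, num_bins):
        threshold = lo + (hi - lo) * b / num_bins
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        loss = (len(left) * binary_entropy(left)
                + len(right) * binary_entropy(right)) / n
        best = min(best, loss)
    return best


def select_features(rows, labels, keep):
    """Rank features by ascending loss and return the top `keep` indices."""
    num_features = len(rows[0])
    losses = [dft_loss([r[j] for r in rows], labels) for j in range(num_features)]
    ranked = sorted(range(num_features), key=lambda j: losses[j])
    return ranked[:keep], losses


# Toy data (hypothetical): feature 0 tracks the label, features 1 and 2 are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [[y + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]

selected, losses = select_features(rows, labels, keep=1)
print("selected feature indices:", selected)
print("per-feature losses:", [round(v, 3) for v in losses])
```

Because each feature is scored independently, the cost of this kind of selection grows linearly with the number of candidate features, which suits a pipeline that first generates many spatial and spectral features and then prunes them before classifier training.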

Datasets

UADFV, FaceForensics++, Celeb-DF v1 and v2, DFDC

Model(s)

PixelHop++, Spatial PCA, Discriminant Feature Test (DFT), LightGBM

Author countries

USA
