Visual Realism Assessment for Face-swap Videos

Authors: Xianyun Sun, Beibei Dong, Caiyong Wang, Bo Peng, Jing Dong

Published: 2023-02-02 07:34:27+00:00

AI Summary

This paper introduces a benchmark for visual realism assessment (VRA) of face-swap videos, a previously unexplored problem. It evaluates models ranging from traditional handcrafted features to deep-learning features on the DFGC 2022 dataset and demonstrates that effective VRA models are feasible.

Abstract

Deep-learning based face-swap videos, also known as deepfakes, are becoming more and more realistic and deceptive. The malicious use of these face-swap videos has caused wide concern. The research community has been focusing on the automatic detection of these fake videos, but the assessment of their visual realism, as perceived by human eyes, is still an unexplored dimension. Visual realism assessment, or VRA, is essential for assessing the potential impact a specific face-swap video may have, and it is also important as a quality assessment metric for comparing different face-swap methods. In this paper, we take a small step towards this new VRA direction by building a benchmark for evaluating the effectiveness of different automatic VRA models, ranging from traditional hand-crafted features to different kinds of deep-learning features. The evaluations are based on a recent competition dataset named DFGC 2022, which contains 1400 diverse face-swap videos annotated with Mean Opinion Scores (MOS) on visual realism. Comprehensive experimental results using 11 models and 3 protocols are shown and discussed. We demonstrate the feasibility of devising effective VRA models for assessing face-swap videos and methods. The particular usefulness of existing deepfake detection features for VRA is also noted. The code can be found at https://github.com/XianyunSun/VRA.git.


Key findings
Deepfake detection features, specifically those from the DFGC-1st model, significantly outperformed handcrafted features for VRA. Method-level evaluation (scoring each face-swap method by the average over its videos) proved more accurate than video-level evaluation; a sketch of this distinction follows below. Generalization to unseen datasets remains a challenge.
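As a rough illustration of the method-level vs. video-level distinction, the sketch below aggregates per-video predictions by the face-swap method that produced each video before computing rank correlation. This is not the paper's evaluation code: the data, method labels, and all variable names are hypothetical placeholders.

```python
# Minimal sketch: video-level vs. method-level evaluation of a VRA model.
# All data and names here are synthetic placeholders.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_videos, n_methods = 1400, 20
method_ids = rng.integers(0, n_methods, size=n_videos)  # which method made each video
mos_true = rng.uniform(1.0, 5.0, size=n_videos)          # ground-truth realism MOS
mos_pred = mos_true + rng.normal(scale=0.5, size=n_videos)  # toy model predictions

# Video-level: correlate per-video predictions with per-video MOS.
video_srocc, _ = spearmanr(mos_pred, mos_true)

# Method-level: average scores over each method's videos, then correlate.
true_by_method = [mos_true[method_ids == m].mean() for m in range(n_methods)]
pred_by_method = [mos_pred[method_ids == m].mean() for m in range(n_methods)]
method_srocc, _ = spearmanr(pred_by_method, true_by_method)

print(f"video-level SROCC:  {video_srocc:.3f}")
print(f"method-level SROCC: {method_srocc:.3f}")
```

Averaging over a method's videos smooths out per-video noise, which is one intuition for why method-level correlations come out higher.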
Approach
The approach uses existing handcrafted and deep-learning models to extract features from face-swap videos. The extracted features are fused (concatenated) and fed into a Support Vector Regression (SVR) model that predicts the visual realism score (MOS), with feature selection over the candidate feature groups to improve performance; see the pipeline sketch below.
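A minimal sketch of this fuse-then-regress pipeline, assuming per-video features have already been extracted. The feature arrays, their dimensions, and names like features_brisque and features_dfgc1st are illustrative placeholders, not the paper's actual interfaces; scikit-learn's SVR stands in for the regressor described above.

```python
# Minimal sketch: fuse pre-extracted feature groups and regress MOS with SVR.
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_videos = 1400

# Placeholder per-video feature groups (real ones would come from
# BRISQUE, DFGC-1st, etc.); dimensions are illustrative only.
features_brisque = rng.normal(size=(n_videos, 36))
features_dfgc1st = rng.normal(size=(n_videos, 512))
mos = rng.uniform(1.0, 5.0, size=n_videos)  # ground-truth realism MOS

# Feature fusion: simple concatenation of the selected feature groups.
X = np.concatenate([features_brisque, features_dfgc1st], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, mos, test_size=0.2, random_state=0
)

# Standardize features, then regress MOS with an RBF-kernel SVR.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Rank correlation between predicted and human realism scores.
srocc, _ = spearmanr(pred, y_test)
print(f"SROCC: {srocc:.3f}")
```

Feature selection (choosing which feature groups enter the concatenation) can be layered on top of this by comparing held-out SROCC across candidate group subsets.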
Datasets
DFGC 2022 dataset (1400 face-swap videos with Mean Opinion Scores on visual realism)
Model(s)
BRISQUE, GM-LOG, FRIQUEE, TLVQM, V-BLIINDS, VIDEVAL, ensemble model, ResNet50, VGG-Face, DFDC-ispl, DFGC-1st (a deepfake detection model). Support Vector Regression (SVR) is used as the regressor.
Author countries
China