Introducing Explicit Gaze Constraints to Face Swapping

Authors: Ethan Wilson, Frederick Shic, Eakta Jain

Published: 2023-05-25 15:12:08+00:00

AI Summary

This paper proposes a novel loss function for face swapping that explicitly incorporates gaze prediction to improve the realism of generated faces. By leveraging a pretrained gaze estimation network, the authors enhance the accuracy of reconstructed gaze in face swaps, benefiting applications like entertainment and deepfake detection.

Abstract

Face swapping combines one face's identity with another face's non-appearance attributes (expression, head pose, lighting) to generate a synthetic face. This technology is rapidly improving, but falls flat when reconstructing some attributes, particularly gaze. Image-based loss metrics that consider the full face do not effectively capture the perceptually important, yet spatially small, eye regions. Improving gaze in face swaps can improve naturalness and realism, benefiting applications in entertainment, human-computer interaction, and more. Improved gaze will also directly benefit deepfake detection efforts, serving as ideal training data for classifiers that rely on gaze for classification. We propose a novel loss function that leverages gaze prediction to inform the face swap model during training and compare against existing methods. We find all methods to significantly benefit gaze in resulting face swaps.


Key findings
All methods incorporating gaze constraints (DFL+em, DFL+Gaze, DFL+Gaze(finetuning), and DFL+em+Gaze) significantly reduced gaze error relative to the baseline DFL model. The proposed gaze loss alone decreased gaze error by 19.7%, and combining it with DFL's native 'eyes and mouth priority' loss decreased it by 20.32%. These gains in gaze accuracy are particularly relevant for gaze-based deepfake detection systems, which depend on realistic eye behavior in their training data.
Approach
The authors introduce a new loss function that incorporates gaze prediction from a pretrained network into the training of a face-swapping model. This loss function penalizes discrepancies between the predicted gaze angles of the original and reconstructed faces, particularly in the eye region, improving gaze accuracy without sacrificing visual fidelity.
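The paper does not include an implementation here, but the idea above can be sketched as follows. Assume a frozen, pretrained gaze estimator (e.g. L2CS-Net) that maps a face crop to (pitch, yaw) angles; the gaze term then penalizes the difference between the angles predicted for the original face and for the swapped reconstruction, added to the usual full-face reconstruction loss. All function names, the L1 form of the penalty, and the weighting scheme are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def gaze_loss(gaze_orig, gaze_swap):
    """Mean absolute error between gaze angles in degrees.

    gaze_orig / gaze_swap: arrays of shape (N, 2) holding the (pitch, yaw)
    a frozen pretrained estimator predicts for the original faces and for
    the swapped reconstructions. The estimator itself is not trained.
    """
    gaze_orig = np.asarray(gaze_orig, dtype=float)
    gaze_swap = np.asarray(gaze_swap, dtype=float)
    return float(np.mean(np.abs(gaze_orig - gaze_swap)))

def total_loss(recon_loss, gaze_orig, gaze_swap, lam=1.0):
    """Full-face reconstruction loss plus the weighted gaze penalty.

    lam balances gaze accuracy against overall visual fidelity; its value
    here is a placeholder, not one reported in the paper.
    """
    return recon_loss + lam * gaze_loss(gaze_orig, gaze_swap)
```

In a real training loop the reconstruction loss and the gaze penalty would both be differentiable tensors so gradients flow back into the swap model while the gaze network stays frozen; this NumPy sketch only shows the shape of the computation.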
Datasets
FaceForensics++ Deep Fake Detection Dataset (for source videos); CelebA dataset (for model pretraining). A custom dataset of face swaps was created from the FaceForensics++ data.
Model(s)
DeepFaceLab (DFL) with a LIAE architecture and L2CS-Net (pretrained gaze estimation network).
Author countries
USA