LDFaceNet: Latent Diffusion-based Network for High-Fidelity Deepfake Generation

Authors: Dwij Mehta, Aditya Mehta, Pratik Narang

Published: 2024-08-04 16:09:04+00:00

AI Summary

LDFaceNet introduces a novel facial swapping method using a guided latent diffusion model. It leverages facial segmentation and recognition modules for conditioned denoising, achieving high-fidelity and diverse results without retraining.

Abstract

Over the past decade, there has been tremendous progress in the domain of synthetic media generation, driven largely by powerful methods based on generative adversarial networks (GANs). Very recently, diffusion probabilistic models, which are inspired by non-equilibrium thermodynamics, have taken the spotlight. In the realm of image generation, diffusion models (DMs) have exhibited remarkable proficiency in producing both realistic and heterogeneous imagery through their stochastic sampling procedure. This paper proposes a novel face-swapping module, termed LDFaceNet (Latent Diffusion based Face Swapping Network), which is based on a guided latent diffusion model that utilizes facial segmentation and facial recognition modules for a conditioned denoising process. The model employs a unique loss function to offer directional guidance to the diffusion process. Notably, LDFaceNet can incorporate supplementary facial guidance for desired outcomes without any retraining. To the best of our knowledge, this represents the first application of the latent diffusion model to the face-swapping task without prior training. The results of this study demonstrate that the proposed method can generate extremely realistic and coherent images by leveraging the potential of the diffusion model for facial swapping, thereby yielding superior visual outcomes and greater diversity.
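
As a rough illustration of what "directional guidance" means in a diffusion sampler, a classifier-guidance-style correction of the predicted noise can be written as follows (the specific losses and weights $\lambda_{\mathrm{id}}$, $\lambda_{\mathrm{seg}}$ are illustrative assumptions, not the paper's exact formulation):

$$\hat{\epsilon}_\theta(z_t, t) = \epsilon_\theta(z_t, t) + \sqrt{1-\bar{\alpha}_t}\,\nabla_{z_t}\!\left(\lambda_{\mathrm{id}}\,\mathcal{L}_{\mathrm{id}} + \lambda_{\mathrm{seg}}\,\mathcal{L}_{\mathrm{seg}}\right)$$

where $z_t$ is the noisy latent at step $t$, $\mathcal{L}_{\mathrm{id}}$ scores identity similarity to the source face, and $\mathcal{L}_{\mathrm{seg}}$ penalises deviation from the target's face parsing. The gradient steers each denoising step toward the desired swap without retraining $\epsilon_\theta$.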


Key findings
LDFaceNet outperforms existing state-of-the-art face swapping methods in both qualitative and quantitative evaluations. Ablation studies confirm the importance of both identity and segmentation guidance modules. The method demonstrates robustness in handling occlusions and diverse facial attributes.
Approach
LDFaceNet uses a pre-trained latent diffusion model and conditions the denoising process with a facial guidance module. This module combines identity and segmentation guidance losses to ensure realistic facial attribute transfer while preserving the target's expression and background.
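A minimal sketch of how such a guidance module can plug into the latent denoising loop is given below. The model handles (unet, decoder, arcface, bisenet), the loss weights, and the classifier-guidance-style update are assumptions for illustration; they are not LDFaceNet's released code.

```python
import torch
import torch.nn.functional as F

def guided_noise_prediction(latent, t, alpha_bar_t, unet, decoder,
                            arcface, bisenet, src_id_embed, tgt_seg_mask,
                            w_id=1.0, w_seg=1.0):
    """Adjust the LDM noise prediction with identity + segmentation guidance.

    All model handles (unet, decoder, arcface, bisenet) stand in for
    pre-trained components (LDM UNet, VAE decoder, ArcFace, BiSeNet);
    the weights and this exact update only illustrate the conditioning idea.
    """
    latent = latent.detach().requires_grad_(True)
    eps = unet(latent, t)  # frozen LDM noise estimate

    # Estimate the clean latent from the noisy one (standard DDPM relation),
    # then decode to pixel space so the face models can score it.
    x0_latent = (latent - ((1 - alpha_bar_t) ** 0.5) * eps) / (alpha_bar_t ** 0.5)
    x0_image = decoder(x0_latent)

    # Identity guidance: cosine distance to the source ArcFace embedding.
    id_embed = arcface(F.interpolate(x0_image, size=(112, 112)))
    loss_id = 1.0 - F.cosine_similarity(id_embed, src_id_embed, dim=-1).mean()

    # Segmentation guidance: match the target's face-parsing mask so the
    # expression, pose, and background layout are preserved.
    loss_seg = F.cross_entropy(bisenet(x0_image), tgt_seg_mask)

    total = w_id * loss_id + w_seg * loss_seg
    grad = torch.autograd.grad(total, latent)[0]

    # Classifier-guidance-style correction of the noise estimate; the caller's
    # sampler (e.g. DDIM) uses the returned value in its usual update step.
    return eps + ((1 - alpha_bar_t) ** 0.5) * grad
```

Because the guidance enters only through gradients of losses computed on the decoded estimate, both pre-trained face models stay frozen, which is consistent with the paper's claim that no retraining is required.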
Datasets
CelebA dataset
Model(s)
Pre-trained Latent Diffusion Model (LDM), ResNet-50 (for ArcFace identity extraction), BiSeNet (for face segmentation)
Author countries
India