A Controllable 3D Deepfake Generation Framework with Gaussian Splatting

Authors: Wending Liu, Siyun Liang, Huy H. Nguyen, Isao Echizen

Published: 2025-09-15 06:34:17+00:00

Journal Ref: Proc. International Joint Conference on Biometrics (IJCB), 2025

AI Summary

This paper introduces a novel 3D deepfake generation framework that leverages 3D Gaussian Splatting for realistic, identity-preserving face swapping and reenactment in a controllable 3D space. The method combines a parametric head model (FLAME) with dynamic Gaussian representations to achieve multi-view consistent rendering and precise expression control, overcoming the geometric limitations of 2D deepfake approaches. It matches state-of-the-art 2D methods in identity preservation while substantially outperforming them in multi-view rendering quality and 3D consistency, revealing new threats from 3D manipulation.

Abstract

We propose a novel 3D deepfake generation framework based on 3D Gaussian Splatting that enables realistic, identity-preserving face swapping and reenactment in a fully controllable 3D space. Compared to conventional 2D deepfake approaches, which suffer from geometric inconsistencies and limited generalization to novel views, our method combines a parametric head model with dynamic Gaussian representations to support multi-view consistent rendering, precise expression control, and seamless background integration. To address editing challenges in point-based representations, we explicitly separate the head and background Gaussians and use pre-trained 2D guidance to optimize the facial region across views. We further introduce a repair module to enhance visual consistency under extreme poses and expressions. Experiments on NeRSemble and additional evaluation videos demonstrate that our method achieves performance comparable to state-of-the-art 2D approaches in identity preservation, as well as pose and expression consistency, while significantly outperforming them in multi-view rendering quality and 3D consistency. Our approach bridges the gap between 3D modeling and deepfake synthesis, enabling new directions for scene-aware, controllable, and immersive visual forgeries, and revealing the threat that the emerging 3D Gaussian Splatting technique could be exploited for manipulation attacks.
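The multi-view guidance scheme described in the abstract can be pictured with a minimal sketch (our illustration, not the authors' code): background Gaussians are frozen, head-Gaussian attributes are trainable, and a frozen 2D face-swap network supplies per-view targets that drive the optimization. All names below (render_scene, swap_2d_guidance, the toy tensors) are hypothetical placeholders; a real pipeline would use a differentiable 3D Gaussian Splatting rasterizer and a pre-trained network such as SimSwap.

```python
import torch
import torch.nn.functional as F

# Hypothetical placeholders: a real pipeline would call a differentiable
# 3D Gaussian Splatting rasterizer and a frozen 2D face-swap network.
def render_scene(head, background, camera):
    # Toy "render": blend head Gaussian colors by opacity; not real splatting.
    return (head["color"] * head["opacity"]).mean(dim=0) + background

def swap_2d_guidance(image):
    # Stand-in for a frozen, pre-trained 2D face-swap model (e.g., SimSwap).
    return image * 0.9

# Head Gaussians are trainable; background Gaussians are explicitly frozen.
head = {
    "color":   torch.rand(1000, 3, requires_grad=True),
    "opacity": torch.rand(1000, 1, requires_grad=True),
}
background = torch.rand(3)            # frozen: excluded from the optimizer
cameras = range(4)                    # multi-view supervision

opt = torch.optim.Adam(head.values(), lr=1e-2)
for step in range(100):
    opt.zero_grad()
    loss = torch.zeros(())
    for cam in cameras:
        rendered = render_scene(head, background, cam)
        with torch.no_grad():         # guidance provides targets, not gradients
            target = swap_2d_guidance(rendered)
        loss = loss + F.l1_loss(rendered, target)
    loss.backward()
    opt.step()
```

The key structural point is that only the head-Gaussian attributes receive gradients, which mirrors the paper's explicit head/background separation and keeps edits confined to the facial region.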


Key findings
The method achieves lower pose and expression errors and stronger 3D identity consistency than 2D-based baselines. It produces photorealistic results with better visual consistency under challenging conditions such as exaggerated expressions and extreme poses, while maintaining competitive 2D identity preservation. The framework also supports real-time facial reenactment (14.45 FPS) and shows robustness against depth-based deepfake detection.
Approach
The framework integrates the parametric FLAME head model with 3D Gaussian Splatting to build an animatable 3D head representation, explicitly separating head and background Gaussians. Pre-trained 2D deepfake models (e.g., SimSwap) serve as multi-view supervision for optimizing the attributes of facial-region Gaussians, and a CodeFormer-based repair module further enhances visual consistency. A 3D Gaussian Splatting background is reconstructed separately and aligned with the head model using real-scale camera poses for seamless joint rendering.
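One plausible way to make such a head representation animatable, sketched below under our own assumptions rather than from the paper, is to rig each Gaussian to a FLAME mesh triangle: the Gaussian stores an offset in the triangle's local frame, and whenever FLAME outputs deformed vertices for a new expression or pose, the frames are re-evaluated and the head Gaussians follow. triangle_frames, pose_gaussians, and the toy mesh are illustrative names only.

```python
import torch
import torch.nn.functional as F

def triangle_frames(verts, faces):
    """Orthonormal frame and centroid per triangle of a mesh (V,3)/(F,3)."""
    v0, v1, v2 = (verts[faces[:, i]] for i in range(3))
    e1 = F.normalize(v1 - v0, dim=-1)
    n  = F.normalize(torch.cross(v1 - v0, v2 - v0, dim=-1), dim=-1)
    e2 = torch.cross(n, e1, dim=-1)
    R = torch.stack([e1, e2, n], dim=-1)   # (F,3,3): columns are frame axes
    c = (v0 + v1 + v2) / 3.0               # (F,3): triangle centroids
    return R, c

def pose_gaussians(local_offsets, parent_face, verts, faces):
    """World-space Gaussian centers for the current (deformed) vertices."""
    R, c = triangle_frames(verts, faces)
    Rf, cf = R[parent_face], c[parent_face]
    return torch.einsum("nij,nj->ni", Rf, local_offsets) + cf

# Toy usage: 2 triangles, 5 Gaussians; `deformed` stands in for FLAME output.
verts = torch.rand(4, 3)
faces = torch.tensor([[0, 1, 2], [1, 3, 2]])
parent = torch.randint(0, 2, (5,))
offsets = 0.01 * torch.randn(5, 3)          # per-Gaussian local offsets
deformed = verts + 0.05 * torch.randn_like(verts)
print(pose_gaussians(offsets, parent, deformed, faces).shape)  # -> (5, 3)
```

Because the offsets live in local triangle frames, expression and pose control reduces to feeding new FLAME parameters through the mesh; the Gaussian attributes need no per-frame re-optimization, which is consistent with the real-time reenactment rate reported in the key findings.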
Datasets
NeRSemble dataset, plus author-captured monocular videos of three subjects with background scenes.
Model(s)
3D Gaussian Splatting, FLAME head model, SimSwap (as 2D supervision baseline), CodeFormer (for repair module).
Author countries
Japan, Germany