Delving into the Frequency: Temporally Consistent Human Motion Transfer in the Fourier Space

Authors: Guang Yang, Wu Liu, Xinchen Liu, Xiaoyan Gu, Juan Cao, Jintao Li

Published: 2022-09-01 05:30:23+00:00

AI Summary

This paper proposes FreMOTR, a novel framework for temporally consistent human motion transfer that operates in the Fourier space. It addresses the temporal inconsistency in existing synthetic videos by introducing frequency-based regularization modules (FAR and TFR) to improve both frame-level visual quality and temporal coherence.

Abstract

Human motion transfer refers to synthesizing photo-realistic and temporally coherent videos that enable one person to imitate the motion of others. However, current synthetic videos suffer from temporal inconsistency across sequential frames, which significantly degrades video quality and remains far from solved by existing pixel-domain methods. Recently, some works on DeepFake detection have tried to distinguish natural from synthetic images in the frequency domain, exploiting the frequency insufficiency of image synthesis methods. Nonetheless, no work has studied the temporal inconsistency of synthetic videos from the perspective of the frequency-domain gap between natural and synthetic videos. In this paper, we propose to delve into the frequency space for temporally consistent human motion transfer. First, we make the first comprehensive analysis of natural and synthetic videos in the frequency domain to reveal the frequency gap in both the spatial dimension of individual frames and the temporal dimension of the video. To close this gap, we propose a novel Frequency-based human MOtion TRansfer framework, named FreMOTR, which can effectively mitigate the spatial artifacts and the temporal inconsistency of synthesized videos. FreMOTR introduces two novel frequency-based regularization modules: 1) Frequency-domain Appearance Regularization (FAR) to improve the appearance of the person in individual frames, and 2) Temporal Frequency Regularization (TFR) to guarantee temporal consistency between adjacent frames. Finally, comprehensive experiments demonstrate that FreMOTR not only yields superior performance on temporal consistency metrics but also improves the frame-level visual quality of synthetic videos. In particular, the temporal consistency metrics are improved by nearly 30% over the state-of-the-art model.


Key findings
FreMOTR significantly improves temporal consistency metrics by nearly 30% compared to state-of-the-art models. It also enhances frame-level visual quality, reducing artifacts and noise. The qualitative analysis further supports the significant improvement in visual perception.
Approach
FreMOTR uses Fast Fourier Convolution (FFC) to analyze and regularize the frequency domain of video frames. It introduces two modules: Frequency-domain Appearance Regularization (FAR), which improves the quality of individual frames, and Temporal Frequency Regularization (TFR), which enforces temporal consistency by minimizing the difference between the temporal frequency changes of natural and synthetic videos.
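The core idea of the two regularizers can be sketched numerically: FAR compares per-frame magnitude spectra, while TFR compares the spectra of frame-to-frame differences. The sketch below is a minimal NumPy illustration under assumed simplifications (grayscale clips, plain L1 on FFT magnitudes); the actual FreMOTR losses operate on learned FFC features and the paper's exact formulation, which this does not reproduce.

```python
import numpy as np

def frame_spectra(frames):
    """2-D FFT magnitude of each frame. frames: array of shape (T, H, W)."""
    return np.abs(np.fft.fft2(frames, axes=(-2, -1)))

def far_loss(natural, synthetic):
    """FAR sketch: L1 gap between per-frame magnitude spectra,
    penalizing spatial-frequency artifacts in individual frames."""
    return np.mean(np.abs(frame_spectra(natural) - frame_spectra(synthetic)))

def temporal_frequency_change(frames):
    """Magnitude spectrum of the difference between adjacent frames,
    i.e. how the video's content changes over time, viewed in frequency space."""
    diffs = frames[1:] - frames[:-1]          # (T-1, H, W) adjacent-frame changes
    return np.abs(np.fft.fft2(diffs, axes=(-2, -1)))

def tfr_loss(natural, synthetic):
    """TFR sketch: L1 gap between the temporal frequency changes of the
    natural clip and the synthetic clip, encouraging temporal consistency."""
    return np.mean(np.abs(temporal_frequency_change(natural)
                          - temporal_frequency_change(synthetic)))
```

Both losses are zero when the synthetic clip matches the natural one exactly, and grow with spatial-frequency artifacts (FAR) or with flicker between adjacent frames (TFR).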
Datasets
UNKNOWN
Model(s)
FreMOTR framework incorporating Fast Fourier Convolution (FFC), Frequency-domain Appearance Regularization (FAR), and Temporal Frequency Regularization (TFR) modules. The paper also mentions using a human motion transfer backbone (C2F-FWN) for initial frame synthesis.
Author countries
China