PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing

Authors: You Zhang, Baotong Tian, Lin Zhang, Zhiyao Duan

Published: 2025-06-03 14:52:16+00:00

AI Summary

The paper introduces PartialEdit, a new dataset of partially edited deepfake speech created using advanced neural speech editing techniques. Experiments show that models trained on existing datasets fail to generalize to PartialEdit, highlighting the challenges posed by these new deepfakes.

Abstract

Neural speech editing enables seamless partial edits to speech utterances, allowing modifications to selected content while preserving the rest of the audio unchanged. This useful technique, however, also poses new risks of deepfakes. To encourage research on detecting such partially edited deepfake speech, we introduce PartialEdit, a deepfake speech dataset curated using advanced neural editing techniques. We explore both detection and localization tasks on PartialEdit. Our experiments reveal that models trained on the existing PartialSpoof dataset fail to detect partially edited speech generated by neural speech editing models. As recent speech editing models almost all involve neural audio codecs, we also provide insights into the artifacts the model learned on detecting these deepfakes. Further information about the PartialEdit dataset and audio samples can be found on the project page: https://yzyouzhang.com/PartialEdit/index.html.


Key findings
Models trained on existing datasets like PartialSpoof performed poorly on PartialEdit, demonstrating the need for new detection methods. Partial deepfakes with unedited segments identical to the original are harder to detect. Including codec-processed but content-unedited utterances as bona fide examples during training improves localization performance.
Approach
The authors created the PartialEdit dataset using neural speech editing models (VoiceCraft, SSR-Speech, Audiobox-Speech, Audiobox) to generate partially edited audio from the VCTK dataset. They then evaluated existing deepfake detection models on this new dataset, focusing on both detection and localization tasks.
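The two tasks differ only in label granularity: detection assigns one label per utterance, while localization assigns one label per short frame. The sketch below shows one way such labels could be derived from the time spans of an edit. It is an illustration, not the authors' pipeline: the function names, the 20 ms frame size, and the convention that any overlap marks a frame as edited are all assumptions made here.

```python
from typing import List, Tuple


def frame_labels(
    duration_ms: int,
    edited_spans_ms: List[Tuple[int, int]],
    frame_ms: int = 20,  # assumed frame resolution, not taken from the paper
) -> List[int]:
    """Return one label per frame: 1 if the frame overlaps an edited span, else 0."""
    # Ceiling division so a trailing partial frame is still counted.
    n_frames = -(-duration_ms // frame_ms)
    labels = [0] * n_frames
    for start_ms, end_ms in edited_spans_ms:
        first = max(0, start_ms // frame_ms)          # first frame touched by the edit
        last = min(n_frames, -(-end_ms // frame_ms))  # one past the last touched frame
        for i in range(first, last):
            labels[i] = 1
    return labels


def utterance_label(labels: List[int]) -> int:
    """Detection label: 1 if any frame is edited, else 0 (bona fide)."""
    return int(any(labels))


# Hypothetical 100 ms utterance with a single edit from 30 ms to 70 ms.
example = frame_labels(100, [(30, 70)])
```

Under this convention, an utterance that was passed through a codec but whose content was left unedited would have an empty span list, so every frame and the utterance itself would be labeled bona fide.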
Datasets
PartialEdit (created by the authors), PartialEdit-Codec (created by the authors), PartialSpoof, VCTK, CodecFake
Model(s)
XLSR-SLS, BAM, WavLM-Large, DistillMOS, GPT-4, WhisperX
Author countries
USA, Czechia