VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
Authors: Qingyuan Fei, Wenjie Hou, Xuan Hai, Xin Liu
Published: 2025-02-14 17:43:01+00:00
AI Summary
VocalCrypt is a novel active defense method against AI voice cloning that embeds imperceptible pseudo-timbre into audio, preventing voice cloning without compromising audio quality. It significantly improves robustness and real-time performance compared to existing methods.
Abstract
The rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted the text-to-speech (TTS) and voice conversion (VC) fields. While these developments have led to notable progress, they have also raised concerns about the misuse of AI VC technology, which causes economic losses and negative public perceptions. To address this challenge, this study focuses on creating active defense mechanisms against AI VC systems. We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear, thereby forming systematic fragments to prevent voice cloning. This approach protects the voice without compromising its quality. Compared with existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance, achieving a 500% increase in generation speed while maintaining interference effectiveness. Unlike audio watermarking techniques, which focus on detection after the fact, our method offers preemptive defense, reducing implementation costs and enhancing feasibility. Extensive experiments on the Zhvoice and VCTK Corpus datasets show that our defense system against AI-cloned speech achieves excellent results in automatic speaker verification (ASV) tests while preserving the integrity of the protected audio.
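The abstract does not specify how the pseudo-timbre is constructed, so the following is only a minimal sketch of the general idea behind the masking effect named in the title: a quiet component placed close in frequency to a much louder one tends to be inaudible. It is not the authors' algorithm. The function names (`dominant_bin`, `embed_masked_tone`), the two-bin offset, and the -30 dB level are all hypothetical choices made for illustration.

```python
import math

def dominant_bin(frame):
    """Return (bin index, magnitude) of the strongest non-DC DFT bin."""
    n = len(frame)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k, best_mag

def embed_masked_tone(frame, offset_bins=2, level_db=-30.0):
    """Add a quiet sinusoid a few bins above the frame's dominant component.

    Assumption: the added tone stands in for "jamming information"; the
    offset and level are illustrative, not values taken from the paper.
    """
    n = len(frame)
    k, mag = dominant_bin(frame)
    peak_amp = 2.0 * mag / n                      # amplitude of the dominant sinusoid
    tone_amp = peak_amp * 10 ** (level_db / 20.0)  # scale down by level_db decibels
    k_tone = min(k + offset_bins, n // 2 - 1)      # stay below the Nyquist bin
    return [x + tone_amp * math.sin(2 * math.pi * k_tone * i / n)
            for i, x in enumerate(frame)]

# Usage: a synthetic frame with one strong component at bin 10.
n = 256
frame = [math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
protected = embed_masked_tone(frame)
```

A real system would work on short overlapping frames of speech and would choose the level from a psychoacoustic masking threshold rather than a fixed decibel offset; the sketch only shows that the added component stays small relative to the signal that masks it.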