An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio


Authors: Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

Published: 2022-08-20 09:23:21+00:00

AI Summary

This paper introduces the problem of detecting vocoder fingerprints in fake audio, that is, identifying which vocoder generated a given fake clip. Experiments with eight state-of-the-art vocoders show that each one leaves a distinct, detectable fingerprint.

Abstract

Many effective attempts have been made at fake audio detection. However, they provide only detection results and no countermeasures to curb the harm. Many related practical applications also need to know which model or algorithm generated the fake audio. Therefore, we propose a new problem: detecting vocoder fingerprints of fake audio. Experiments are conducted on datasets synthesized by eight state-of-the-art vocoders. We have preliminarily explored features and model architectures. The t-SNE visualization shows that different vocoders generate distinct vocoder fingerprints.

Key findings

LFCC features showed superior performance compared to CQCC and MFCC features. The ResNet model achieved the highest detection rate for vocoder fingerprints. Different vocoders generate distinguishable fingerprints, allowing for identification of the vocoder used to create fake audio.

Approach

The authors extract Linear Frequency Cepstral Coefficients (LFCCs) as features. These features are then fed into a ResNet model to extract vocoder fingerprints, which are finally classified using a fully connected layer to identify the source vocoder.
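To make the feature step concrete, here is a minimal sketch of LFCC extraction for a single audio frame, using only the Python standard library. It follows the usual LFCC recipe (power spectrum, linearly spaced triangular filterbank, log, DCT-II). The frame length, filter count, and coefficient count are illustrative assumptions rather than the paper's settings, all function names are hypothetical, and the ResNet and fully connected classifier are not reproduced.

```python
import math

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a real frame via a naive DFT."""
    n = len(frame)
    # Hamming window to reduce spectral leakage.
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append(re * re + im * im)
    return spec

def linear_filterbank(num_filters, num_bins):
    """Triangular filters whose centres are spaced linearly in frequency."""
    # num_filters + 2 equally spaced edge points over bins 0..num_bins-1.
    edges = [i * (num_bins - 1) / (num_filters + 1) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if left < b <= centre:
                row.append((b - left) / (centre - left))    # rising slope
            elif centre < b < right:
                row.append((right - b) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(num_coeffs)]

def lfcc(frame, num_filters=20, num_coeffs=13):
    """LFCCs of one frame; the default sizes are assumptions, not the paper's."""
    spec = power_spectrum(frame)
    bank = linear_filterbank(num_filters, len(spec))
    # A small epsilon keeps the log finite for silent frames.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return dct2(log_energies, num_coeffs)

# Usage: a 64-sample synthetic sine frame stands in for real audio.
frame = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]
coeffs = lfcc(frame)
```

In a full pipeline, the per-frame coefficient vectors would be stacked over time into a 2-D feature map, which is the kind of input a ResNet-style network consumes. The only difference from MFCC in this sketch is the filterbank spacing, which is linear rather than mel-scaled.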

Datasets

The authors built a custom dataset of fake audio synthesized from the AISHELL3 Chinese corpus by eight state-of-the-art vocoders: STRAIGHT, LPCNet, WaveNet, Parallel WaveGAN, HifiGAN, Multiband-MelGAN, Style-MelGAN, and Griffin-Lim. The dataset is split into training, development, and test sets.

Model(s)

ResNet, X-vector, LCNN, SE-ResNet. ResNet achieved the best performance.

Author countries

China
