Audio Deepfake Detection: A Survey

Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, Yan Zhao

Published: 2023-08-29 01:50:01+00:00

AI Summary

This survey paper provides a comprehensive overview of audio deepfake detection, analyzing state-of-the-art approaches, datasets, features, and classifiers. It also performs a unified comparison of these methods on various datasets and highlights challenges for future research, such as the need for larger, more diverse datasets.

Abstract

Audio deepfake detection is an emerging and active research topic. A growing number of studies have investigated deepfake detection algorithms and achieved effective performance, yet the problem is far from solved. Although some reviews exist, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers for audio deepfake detection on the ASVspoof 2021, ADD 2023 and In-the-Wild datasets. The survey shows that future research should address the lack of large-scale datasets in the wild, the poor generalization of existing detection methods to unknown fake attacks, and the interpretability of detection results.


Key findings
The survey finds that existing methods generalize poorly to unseen attacks and that larger, more diverse datasets are needed. Self-supervised features and feature concatenation perform more robustly across datasets. Deep learning classifiers degrade significantly under out-of-distribution evaluation.
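As a rough illustration of what feature concatenation can look like in practice, the sketch below joins two frame-level feature matrices along the feature axis. It is a minimal sketch under stated assumptions, not the survey's implementation: the function name `concat_features`, the feature dimensions (60 and 768), and the random placeholder arrays are all hypothetical, and real features would come from a spectral front end and a self-supervised model.

```python
import numpy as np


def concat_features(feat_a, feat_b):
    """Concatenate two frame-level feature matrices along the feature axis.

    Both inputs are arrays of shape (frames, dims). Frame counts from
    different front ends rarely match exactly, so both are truncated to
    the shorter length before joining (an assumption of this sketch).
    """
    n_frames = min(feat_a.shape[0], feat_b.shape[0])
    return np.concatenate([feat_a[:n_frames], feat_b[:n_frames]], axis=1)


# Hypothetical placeholder features: random arrays standing in for a
# spectral feature (100 frames x 60 dims) and a self-supervised
# embedding (98 frames x 768 dims).
rng = np.random.default_rng(0)
spectral = rng.standard_normal((100, 60))
ssl_embedding = rng.standard_normal((98, 768))

fused = concat_features(spectral, ssl_embedding)
print(fused.shape)  # (98, 828)
```

The fused matrix can then be passed to any of the classifiers listed below in place of a single feature type.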
Approach
The authors conduct a systematic survey of existing literature on audio deepfake detection. They categorize and analyze different types of deepfakes, datasets, features (spectral, prosodic, deep), and classifiers, comparing representative methods on ASVspoof 2021, ADD 2023, and In-the-Wild datasets.
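A unified comparison of this kind needs a single metric applied to every system. This summary does not name the metric; the sketch below assumes equal error rate (EER), which is commonly reported on ASVspoof-style benchmarks. The function name `compute_eer` and the toy scores are hypothetical, and the convention that higher scores mean "more likely bona fide" is an assumption.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate from two sets of detection scores.

    Assumes higher scores indicate bona fide speech. The EER is taken at
    the candidate threshold where the false acceptance rate and the false
    rejection rate are closest, averaging the two rates at that point.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance: spoofed utterances scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2


# Toy scores (hypothetical): one spoofed utterance outscores two bona fide ones.
eer = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])
print(f"EER: {eer:.2%}")
```

Scoring every feature and classifier pair with the same function on each dataset keeps in-distribution and out-of-distribution results directly comparable.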
Datasets
ASVspoof 2021, ADD 2023, In-the-Wild, FoR, WaveFake, EmoFake, SceneFake, FMFCC-A
Model(s)
GMM, SVM, LCNN, ResNet, Res2Net, SENet, ASSERT, GAT, DARTS (PC-DARTS), RawNet2, TO-RawNet, RawGAT-ST, AASIST, Orth-AASIST, Raw PC-DARTS, Rawformer, SE-Rawformer, CRNNSpoof
Author countries
China