ADD 2023: the Second Audio Deepfake Detection Challenge

Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

Published: 2023-05-23 07:42:52+00:00

AI Summary

The ADD 2023 challenge focuses on advancing audio deepfake detection beyond binary real/fake classification. It introduces three sub-challenges: audio fake game, manipulation region location, and deepfake algorithm recognition, pushing research toward more realistic and more nuanced detection tasks.

Abstract

Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build innovative new technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Unlike previous challenges (e.g., ADD 2022), ADD 2023 focuses on surpassing the constraints of binary real/fake classification: actually localizing the manipulated intervals in partially fake speech and pinpointing the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the audio fake game sub-challenge. The ADD 2023 challenge comprises three sub-challenges: audio fake game (FG), manipulation region location (RL), and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols. Some findings on audio deepfake detection tasks are also reported.


Key findings
The challenge results show that manipulation region location and deepfake algorithm recognition remain challenging tasks. While some participants surpassed the baseline models, the average performance across all submissions highlights the ongoing need for improved audio deepfake detection techniques.
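
Detection performance in this line of work is commonly summarized by the equal error rate (EER), the operating point where the false acceptance and false rejection rates coincide (the ADD series reports weighted variants of it for detection). A minimal sketch of computing EER with a simple threshold sweep, for illustration only and not the official ADD 2023 scoring script:

import numpy as np

def compute_eer(scores, labels):
    # scores: higher means "more likely genuine"; labels: 1 = genuine, 0 = fake.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    genuine, fake = scores[labels == 1], scores[labels == 0]
    eer, best_gap = 1.0, np.inf
    for t in np.unique(scores):        # sweep every distinct score as a threshold
        far = np.mean(fake >= t)       # false acceptance rate: fakes passed as genuine
        frr = np.mean(genuine < t)     # false rejection rate: genuine flagged as fake
        if abs(far - frr) < best_gap:  # EER lies where the two rates cross
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

print(compute_eer([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 0.0 for perfectly separable scores
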
Approach
The challenge uses datasets containing real and fake audio samples, some with partially manipulated regions. Participants develop models for the three sub-challenges: the audio fake game (FG), in which fake-audio generation and detection systems compete against each other; manipulation region location (RL), which requires localizing the manipulated intervals within partially fake speech; and deepfake algorithm recognition (AR), which requires identifying the algorithm that generated a fake utterance.
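
For the RL sub-challenge, a system typically produces frame-level manipulated/genuine decisions that must be collapsed into time intervals for scoring. A minimal sketch of that post-processing step, assuming a hypothetical 20 ms frame hop (the actual frame rate and output format are set by the challenge protocol):

def frames_to_intervals(frame_labels, hop_s=0.02):
    # Collapse per-frame decisions (1 = manipulated) into (start_s, end_s)
    # intervals. hop_s is an assumed frame hop, not the challenge setting.
    intervals, start = [], None
    for i, lab in enumerate(frame_labels):
        if lab == 1 and start is None:
            start = i * hop_s                     # a manipulated interval opens
        elif lab == 0 and start is not None:
            intervals.append((start, i * hop_s))  # the interval closes
            start = None
    if start is not None:                         # handle a trailing interval
        intervals.append((start, len(frame_labels) * hop_s))
    return intervals

print(frames_to_intervals([0, 0, 1, 1, 1, 0]))  # [(0.04, 0.1)]
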
Datasets
AISHELL-3, AISHELL-1, THCHS-30, datasets from ADD 2022, and custom datasets including partially fake audio and audio generated by various deepfake algorithms.
Model(s)
GMM, LCNN, ResNet, and wav2vec2 (as a feature extractor). The paper also notes that participants used a variety of other models, but it does not enumerate them.
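
A recurring pattern here is using wav2vec2 as a frozen feature extractor feeding a downstream classifier such as LCNN or ResNet. A minimal sketch of that pattern with the HuggingFace transformers API, where the checkpoint name, freezing, mean-pooling, and linear head are illustrative assumptions rather than the paper's exact baseline:

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class Wav2Vec2Detector(nn.Module):
    # wav2vec2 as a frozen feature extractor; a linear head stands in
    # for the heavier LCNN/ResNet back-ends used in practice.
    def __init__(self, name="facebook/wav2vec2-base"):  # illustrative checkpoint
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(name)
        for p in self.encoder.parameters():  # freeze the extractor
            p.requires_grad = False
        self.head = nn.Linear(self.encoder.config.hidden_size, 2)  # real vs fake logits

    def forward(self, waveform):  # (batch, samples) at 16 kHz
        feats = self.encoder(waveform).last_hidden_state  # (batch, frames, dim)
        pooled = feats.mean(dim=1)                        # mean-pool over time
        return self.head(pooled)                          # (batch, 2)

model = Wav2Vec2Detector()
logits = model(torch.randn(1, 16000))  # one second of dummy audio
print(logits.shape)                    # torch.Size([1, 2])
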
Author countries
China, Singapore, Hong Kong