From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview

Authors: Yupei Li, Manuel Milling, Lucia Specia, Björn W. Schuller

Published: 2024-11-30 19:53:23+00:00

AI Summary

This paper provides the first comprehensive review of AI-generated music (AIGM) detection methods. It proposes a pathway for adapting foundation models from audio deepfake detection to AIGM detection, focusing on intrinsic musical features rather than superficial, artefact-level cues.

Abstract

As Artificial Intelligence (AI) technologies continue to evolve, their use in generating realistic, contextually appropriate content has expanded into various domains. Music, an art form and entertainment medium deeply rooted in human culture, is seeing increasing involvement of AI in its production. However, despite the effective application of AI-generated music (AIGM) tools, their unregulated use raises concerns about potential negative impacts on the music industry, copyright, and artistic integrity, underscoring the importance of effective AIGM detection. This paper provides an overview of existing AIGM detection methods. To lay a foundation for the general workings and challenges of AIGM detection, we first review general principles of AIGM, including recent advancements in audio deepfakes, as well as multimodal detection techniques. We further propose a potential pathway for leveraging foundation models from audio deepfake detection for AIGM detection. Additionally, we discuss the implications of these tools and propose directions for future research to address ongoing challenges in the field.


Key findings
The review reveals a scarcity of dedicated AIGM detection datasets and models. The authors highlight the need to focus on intrinsic musical features and to adapt successful audio deepfake detection techniques to the unique challenges of AIGM detection. They also emphasize that current detectors lack robust and explainable outcomes.
Approach
The paper reviews existing AIGM and audio deepfake detection methods, highlighting why AIGM detection is difficult given the subjective nature of music. It suggests adapting successful audio deepfake detection models and focusing on intrinsic musical features for improved AIGM detection; a minimal code sketch of this pathway follows.
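As a concrete illustration of this pathway, the sketch below reuses a pretrained Wav2Vec2 encoder (one of the foundation models named in the review) as a frozen feature extractor topped with a small binary head. This is a minimal sketch under our own assumptions: the checkpoint name, mean-pooling, and classifier head are illustrative choices, not the paper's actual pipeline.

```python
# Minimal sketch: an audio foundation model (Wav2Vec2) repurposed as a frozen
# feature extractor with a lightweight binary head for AIGM detection.
# Checkpoint, pooling, and head sizes are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

class AIGMDetector(nn.Module):
    def __init__(self, checkpoint: str = "facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(checkpoint)
        for p in self.encoder.parameters():   # freeze the foundation model
            p.requires_grad = False
        hidden = self.encoder.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        # Mean-pool frame-level representations into one clip-level vector.
        frames = self.encoder(input_values).last_hidden_state
        return self.head(frames.mean(dim=1)).squeeze(-1)  # one logit per clip

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = AIGMDetector()
waveform = torch.randn(16000 * 4)  # stand-in for a 4-second clip at 16 kHz
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
prob_ai_generated = torch.sigmoid(model(inputs.input_values))
```

Freezing the encoder and training only the head keeps the labelled-data requirement small, which matters given the dataset scarcity noted in the key findings.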
Datasets
FakeMusicCaps, SONICS (not yet publicly released), FMA, DALI, MAESTRO, MSD, MusicNet, MTG-Jamendo, SunoCaps, Afchar’s dataset, ASVspoof series (2015, 2019, 2021), ADD series (2022, 2023), WaveFake, FakeOrReal, FakeAVCelebV2, CVoice, MLAAD, UrbanSound8K, In-the-Wild, AI Song Contest dataset
Model(s)
Q-SVM, GMM, Logistic Regression, Decision Trees, SVM, Random Forest, RawNet2, ResNet, LCNN, LSTM, SENet, AASIST, Vision Transformer, Wav2Vec2, SpecTTTra, and various other CNN- and Transformer-based architectures. These are surveyed in the review rather than implemented by the authors; a hedged sketch pairing intrinsic musical features with one of the classical classifiers follows.
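To make the notion of intrinsic musical features concrete, the sketch below pairs hand-crafted rhythm, harmony, and timbre descriptors with a Random Forest from the classical models listed above. The feature set, file names, and hyperparameters are our assumptions for illustration, not choices made in the paper.

```python
# Hedged sketch: intrinsic musical features (tempo, chroma, MFCCs, spectral
# centroid) fed to a classical classifier. All choices are illustrative.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def musical_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=22050, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)        # rhythm
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)      # harmony
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # timbre
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    return np.hstack([
        [float(tempo)],
        chroma.mean(axis=1), chroma.std(axis=1),
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [centroid.mean()],
    ])

# Hypothetical file paths; labels: 1 = AI-generated, 0 = human-made.
paths, labels = ["human_song.wav", "ai_song.wav"], np.array([0, 1])
X = np.stack([musical_features(p) for p in paths])
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
print(clf.predict_proba(X)[:, 1])  # probability each clip is AI-generated
```

Clip-level summary statistics like these are deliberately simple: they describe musically meaningful structure (key distribution, tempo, timbral texture) rather than codec- or vocoder-level artefacts.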
Author countries
UK, Germany