Evaluating Fake Music Detection Performance Under Audio Augmentations

Authors: Tomasz Sroka, Tomasz Wężowicz, Dominik Sidorczuk, Mateusz Modrzejewski

Published: 2025-07-07 16:15:02+00:00

AI Summary

This paper investigates the robustness of a state-of-the-art fake music detection model (SONICS) against various audio augmentations. A dataset of real and synthetic music from multiple generators was created and subjected to augmentations; the results show a significant decrease in the model's accuracy even with minor transformations.

Abstract

With the rapid advancement of generative audio models, distinguishing between human-composed and generated music is becoming increasingly challenging. As a response, models for detecting fake music have been proposed. In this work, we explore the robustness of such systems under audio augmentations. To evaluate model generalization, we constructed a dataset consisting of both real and synthetic music generated using several systems. We then apply a range of audio transformations and analyze how they affect classification accuracy. We test the performance of a recent state-of-the-art musical deepfake detection model in the presence of audio augmentations. The performance of the model decreases significantly even with the introduction of light augmentations.

Key findings

The SONICS model showed poor generalization to unseen music generators. Even light audio augmentations significantly reduced the model's accuracy, highlighting its vulnerability to distribution shifts and suggesting the need for more robust training methods incorporating diverse augmentations.

Approach

The authors evaluated the robustness of the SONICS model by applying a range of audio augmentations (e.g., pitch shifting, noise addition, filtering) to a dataset of real and synthetic music. They then analyzed the impact of these augmentations on the model's classification accuracy.
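
The evaluation loop described above can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use SONICS: the function names, the augmentation parameters, and the toy classifier are all hypothetical stand-ins, chosen only to show how accuracy on clean clips can be compared with accuracy on augmented copies of the same clips.

```python
import numpy as np


def add_noise(audio, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio (in dB)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


def moving_average_lowpass(audio, kernel_size=5):
    """Crude low-pass filter: moving average over kernel_size samples."""
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(audio, kernel, mode="same")


def naive_pitch_shift(audio, semitones):
    """Pitch shift by plain resampling.

    Assumption: no time-stretch correction, so the clip duration changes too.
    """
    factor = 2 ** (semitones / 12)
    old_idx = np.arange(len(audio))
    new_idx = np.arange(0, len(audio), factor)
    return np.interp(new_idx, old_idx, audio)


def accuracy(classifier, clips, labels, augment=None):
    """Fraction of clips classified correctly, optionally after augmentation."""
    correct = 0
    for clip, label in zip(clips, labels):
        x = augment(clip) if augment is not None else clip
        correct += int(classifier(x) == label)
    return correct / len(clips)


def toy_classifier(audio):
    """Hypothetical stand-in for a detector: 1 = 'fake', 0 = 'real'.

    It thresholds the mean absolute sample-to-sample difference, which is
    deliberately fragile so that augmentations can change its decisions.
    """
    return int(np.mean(np.abs(np.diff(audio))) > 0.1)


# Synthetic placeholder data: two one-second sine tones at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clips = [np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 2000 * t)]
labels = [0, 1]

rng = np.random.default_rng(0)
augmentations = {
    "clean": None,
    "noise_10dB": lambda a: add_noise(a, snr_db=10, rng=rng),
    "lowpass": lambda a: moving_average_lowpass(a, kernel_size=5),
    "pitch_+2": lambda a: naive_pitch_shift(a, semitones=2),
}

# Accuracy per augmentation; compare each entry against "clean".
results = {
    name: accuracy(toy_classifier, clips, labels, augment=fn)
    for name, fn in augmentations.items()
}
for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

In a real evaluation, the toy classifier would be replaced by the detector under test and the sine tones by actual real and generated recordings; a production-quality pitch shift would also preserve duration, which the resampling shortcut here does not.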

Datasets

The dataset contains real music and synthetic music generated with Suno, Udio, YuE, and MusicGen, with 20 songs from each generator.

Model(s)

SONICS (SpecTTTra-α configuration)

Author countries

Poland
