MFAAN: Unveiling Audio Deepfakes with a Multi-Feature Authenticity Network

Authors: Karthik Sivarama Krishnan, Koushik Sivarama Krishnan

Published: 2023-11-06 20:32:39+00:00

AI Summary

The paper introduces MFAAN, a multi-feature audio authenticity network for detecting audio deepfakes. MFAAN uses multiple parallel paths processing MFCC, LFCC, and Chroma-STFT features, achieving high accuracy on benchmark datasets.

Abstract

In the contemporary digital age, the proliferation of deepfakes presents a formidable challenge to the sanctity of information dissemination. Audio deepfakes, in particular, can be deceptively realistic, posing significant risks in misinformation campaigns. To address this threat, we introduce the Multi-Feature Audio Authenticity Network (MFAAN), an advanced architecture tailored for the detection of fabricated audio content. MFAAN incorporates multiple parallel paths designed to harness the strengths of different audio representations, including Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short Time Fourier Transform (Chroma-STFT). By synergistically fusing these features, MFAAN achieves a nuanced understanding of audio content, facilitating robust differentiation between genuine and manipulated recordings. Preliminary evaluations of MFAAN on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, demonstrate its superior performance, achieving accuracies of 98.93% and 94.47% respectively. Such results not only underscore the efficacy of MFAAN but also highlight its potential as a pivotal tool in the ongoing battle against deepfake audio content.


Key findings
MFAAN achieves state-of-the-art accuracy (98.93% on 'In-the-Wild' and 94.47% on Fake-or-Real datasets). The multi-feature approach significantly improves performance compared to a single-feature baseline CNN. The results highlight the effectiveness of a multi-faceted approach for robust audio deepfake detection.
Approach
MFAAN uses a multi-path architecture in which each path processes a different audio representation (MFCC, LFCC, or Chroma-STFT) with a 1D convolutional neural network. The path outputs are concatenated and fed into dense layers for classification.
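The multi-path design can be sketched in PyTorch as three parallel 1D-CNN branches whose pooled outputs are concatenated before the dense classifier. Layer widths, kernel sizes, and feature dimensions below are illustrative assumptions; the paper does not publish this exact configuration here.

```python
import torch
import torch.nn as nn

class MFAANSketch(nn.Module):
    """Illustrative multi-path fusion network (not the authors' exact model)."""

    def __init__(self, n_mfcc=20, n_lfcc=20, n_chroma=12, hidden=64):
        super().__init__()

        def branch(in_ch):
            # One 1D-CNN path per feature type; channels/kernels are assumptions.
            return nn.Sequential(
                nn.Conv1d(in_ch, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(32, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # collapse the time axis
                nn.Flatten(),             # -> (batch, hidden)
            )

        self.mfcc_path = branch(n_mfcc)
        self.lfcc_path = branch(n_lfcc)
        self.chroma_path = branch(n_chroma)
        # Concatenated branch embeddings feed dense layers for real/fake output.
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden, 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # logits: genuine vs. fabricated
        )

    def forward(self, mfcc, lfcc, chroma):
        fused = torch.cat(
            [self.mfcc_path(mfcc), self.lfcc_path(lfcc), self.chroma_path(chroma)],
            dim=1,
        )
        return self.classifier(fused)

model = MFAANSketch()
out = model(torch.randn(4, 20, 32), torch.randn(4, 20, 32), torch.randn(4, 12, 32))
print(out.shape)  # torch.Size([4, 2])
```

Adaptive pooling makes each branch length-agnostic, so recordings of different durations map to fixed-size embeddings before fusion.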
Datasets
'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset
Model(s)
Multi-Feature Audio Authenticity Network (MFAAN), with a 1D CNN per feature path and dense layers for fusion and classification. A baseline CNN using only MFCCs serves as the comparison.
Author countries
UNKNOWN