AntiDeepFake: AI for Deep Fake Speech Recognition

Authors: Enkhtogtokh Togootogtokh, Christian Klasen

Published: 2024-01-04 08:11:47+00:00

AI Summary

This research presents AntiDeepFake, an AI system for deepfake speech recognition. The system uses a pipeline encompassing data collection, feature extraction, feature engineering, AI modeling (with CatBoost, XGBoost, and TabNet), and evaluation, achieving high accuracy in differentiating real and synthetic speech.

Abstract

In this research study, we propose a modern artificial intelligence (AI) approach to recognizing deepfake voice, also known as generative-AI-cloned synthetic voice. Our proposed AI technology, called AntiDeepFake, covers the full pipeline from data collection to evaluation. We provide experimental results and scores for all of our proposed methods. The main source code for our approach is available at the https://github.com/enkhtogtokh/antideepfake repository.
Key findings
The CatBoost model achieved the highest accuracy (93.7%) on the test data. The system demonstrated high precision and recall, minimizing both false positives and false negatives. The results suggest the effectiveness of the proposed approach for deepfake audio detection.
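The reported metrics can be reproduced from a model's predictions with standard scikit-learn calls. The sketch below is not the authors' evaluation code; the labels and predictions are illustrative (1 = deepfake, 0 = real):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative ground-truth labels and predictions, not the paper's test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

# Accuracy: fraction of all predictions that are correct.
acc = accuracy_score(y_true, y_pred)
# Precision: of clips flagged as deepfake, how many truly are (few false positives).
prec = precision_score(y_true, y_pred)
# Recall: of true deepfakes, how many were flagged (few false negatives).
rec = recall_score(y_true, y_pred)
print(acc, prec, rec)  # → 0.875 1.0 0.75
```

High precision and high recall together are what "minimizing both false positives and false negatives" means in the findings above.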
Approach
AntiDeepFake employs a pipeline that extracts mel-spectrogram features from audio data. Feature engineering uses a custom gradient boosted recursive feature elimination approach. The system then trains and compares several state-of-the-art gradient boosted and tabular AI models (CatBoost, XGBoost, and TabNet) for classification.
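The pipeline above can be sketched end to end: pooled per-clip features, recursive feature elimination driven by a gradient-boosted model, then a boosted classifier. This is a minimal sketch, not the authors' implementation: it substitutes scikit-learn's GradientBoostingClassifier and RFE for the paper's custom gradient-boosted RFE and its CatBoost/XGBoost/TabNet models, and uses synthetic feature vectors in place of real mel-spectrogram features:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for mean-pooled mel-spectrogram features:
# 400 clips x 40 mel bands, with a class-dependent shift in the first 10 bands
# (illustrative only; real features come from the audio front end).
X = rng.normal(size=(400, 40))
y = rng.integers(0, 2, size=400)
X[y == 1, :10] += 1.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Gradient-boosted recursive feature elimination: repeatedly drop the
# least important features according to a boosted model's importances.
selector = RFE(GradientBoostingClassifier(random_state=0), n_features_to_select=10)
selector.fit(X_tr, y_tr)

# Train the final boosted classifier on the selected features only.
clf = GradientBoostingClassifier(random_state=0)
clf.fit(selector.transform(X_tr), y_tr)
acc = accuracy_score(y_te, clf.predict(selector.transform(X_te)))
print(f"test accuracy: {acc:.3f}")
```

Swapping in CatBoost or XGBoost is a one-line change of the estimator; TabNet, being a neural tabular model, would replace the final classifier rather than the RFE estimator.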
Datasets
A dataset of real and deepfake audio samples, combining real speech from sources such as LJSpeech with AI-generated synthetic voice clones.
Model(s)
CatBoost, XGBoost, TabNet
Author countries
Germany, Mongolia