AntiDeepFake: AI for Deep Fake Speech Recognition

Authors: Enkhtogtokh Togootogtokh, Christian Klasen

Published: 2024-01-04 08:11:47+00:00

Comment: arXiv admin note: text overlap with arXiv:2308.12734 by other authors

AI Summary

This research introduces "AntiDeepFake," an AI system designed to recognize deepfake or generative AI cloned synthetic voices. The proposed technology encompasses the entire pipeline from data collection and feature extraction to model training and evaluation. It leverages feature engineering and tabular AI models to effectively classify audio as real or deepfake.

Abstract

In this study, we propose a modern artificial intelligence (AI) approach to recognizing deepfake voice, also known as generative-AI cloned synthetic voice. Our proposed technology, called AntiDeepFake, covers the full pipeline from data collection to evaluation. We provide experimental results and scores for all proposed methods. The main source code for our approach is available in the repository at https://github.com/enkhtogtokh/antideepfake.


Key findings
The AntiDeepFake system achieved high performance in deepfake voice recognition, with CatBoost showing the best results among the evaluated models. CatBoost attained a testing accuracy of 93.7%, along with strong precision, recall, and F1-scores. The authors also state that achieving 99.9% accuracy is possible with "well prepared training data."
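For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1-score are computed for a binary real-versus-deepfake classifier. The function name `binary_metrics` and the toy labels are illustrative assumptions, not the authors' code or data, and the numbers bear no relation to the paper's results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels only (assumption): 1 = deepfake, 0 = real.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here three of four deepfakes are caught and one real clip is misflagged, so all four metrics come out to 0.75. On imbalanced test sets the four values can diverge substantially, which is why accuracy alone is usually reported alongside the others.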
Approach
The AntiDeepFake system extracts significant audio features, including the mel spectrogram, pitch, shimmer, and MFCCs, to transform audio data into a tabular format. It employs a custom gradient-boosted Recursive Feature Elimination (RFE) for feature engineering, followed by training state-of-the-art tabular AI models (CatBoost, XGBoost, TabNet) for binary classification of real versus deepfake speech.
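The feature-selection and classification stages might look roughly like the sketch below. It is not the authors' implementation: it assumes scikit-learn is available, uses scikit-learn's generic `RFE` with a `GradientBoostingClassifier` as a stand-in for the custom gradient-boosted RFE and for CatBoost, and replaces the extracted audio features with a synthetic table from `make_classification`. The feature counts and hyperparameters are arbitrary illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a table of per-clip audio features
# (assumption: in practice columns would be MFCCs, pitch, shimmer, etc.).
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Recursive feature elimination: repeatedly fit the booster and drop the
# two least important features until eight remain.
selector = RFE(
    estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    n_features_to_select=8,
    step=2,
)
selector.fit(X_train, y_train)

X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Final binary classifier (real = 0, deepfake = 1) on the selected columns.
clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X_train_sel, y_train)
test_accuracy = accuracy_score(y_test, clf.predict(X_test_sel))

print("selected feature indices:", selector.get_support(indices=True))
print("test accuracy:", round(test_accuracy, 3))
```

Fitting the selector on the training split only keeps the held-out test data from influencing which features are retained, so the reported accuracy is not optimistically biased by the selection step.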
Datasets
A custom-collected dataset: real audio potentially drawn from LJSpeech and public speeches, and fake audio samples generated with modern synthetic voice cloning AI models.
Model(s)
CatBoost, XGBoost, TabNet
Author countries
Germany, Mongolia