Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey

Authors: Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen, Nhien-An Le-Khac

Published: 2024-11-26 22:04:49+00:00

AI Summary

This survey provides a comprehensive review of passive deepfake detection across image, video, and audio modalities, analyzing inter-modality relationships and extending the evaluation beyond accuracy to include generalization, robustness, attribution, and real-world resilience.

Abstract

In recent years, deepfakes (DFs) have been used for malicious purposes, such as impersonating individuals, spreading misinformation, and imitating artists' styles, raising ethical and security concerns. In this survey, we provide a comprehensive review and comparison of passive DF detection across multiple modalities, including image, video, audio, and multi-modal content, and explore the relationships between these modalities. Beyond detection accuracy, we extend our analysis to performance dimensions essential for real-world deployment: generalization across novel generation techniques, robustness against adversarial manipulations and postprocessing techniques, attribution precision in identifying generation sources, and resilience under real-world operational conditions. Additionally, we analyze the advantages and limitations of existing datasets, benchmarks, and evaluation metrics for passive DF detection. Finally, we propose future research directions that address unexplored and emerging issues in passive DF detection. This survey offers researchers and practitioners a comprehensive resource for understanding the current landscape, methodological approaches, and promising future directions in this rapidly evolving field.


Key findings
The survey reveals significant limitations in current datasets and benchmarks, highlighting the need to better represent real-world scenarios and to address class imbalance. It identifies key research gaps in generalization, robustness, and attribution, emphasizing the need for more sophisticated models that can handle diverse real-world conditions and unknown manipulation techniques.
Approach
The paper conducts a comprehensive survey, categorizing existing methods based on modality (image, video, audio, multimodal) and evaluating approaches based on their underlying concepts, methodologies, and contributions rather than solely on reported numerical results.
Datasets
FF++, DFDC, Celeb-DF, ForgeryNet, Wild-DF, ADD, FakeAVCeleb, ASVspoof2019, WaveFake, DF40, VoiceWukong, MLAAD, OpenForensics, DF-Platter, DeeShy, AV-Deepfake1M, LAV-DF
Model(s)
UNKNOWN (the paper surveys existing models but does not propose a new one)
Author countries
Ireland, Ireland, Vietnam, Ireland