AI-Powered Spearphishing Cyber Attacks: Fact or Fiction?

Authors: Matthew Kemp, Harsha Kalutarage, M. Omar Al-Kadri

Published: 2025-02-03 00:02:01+00:00

AI Summary

This paper investigates the threat of deepfake audio and video in spearphishing attacks. Experiments showed that a significant percentage of participants failed to identify AI-generated audio (66%) and video (43%) as fake, highlighting the potential for successful deepfake-based cyberattacks.

Abstract

Due to society's continuing technological advance, the capabilities of machine learning-based artificial intelligence systems continue to expand into a widening range of domains. Alongside this expansion, an increasing number of individuals are willing to misuse these systems to defraud and mislead others. Deepfake technology, a family of deep learning techniques capable of replacing one person's likeness or voice with another's with alarming accuracy, is one such technology. This paper investigates the threat posed by malicious use of this technology, particularly in the form of spearphishing attacks. It uses deepfake technology to create spearphishing-like attack scenarios and evaluates them with average individuals. Experimental results show that 66% of participants failed to identify AI-created audio as fake and 43% failed to identify such videos as fake, confirming the growing fear of threats posed by the use of these technologies by cybercriminals.


Key findings

Participants struggled to identify deepfake audio (66% failure rate) and video (43% failure rate). Combining audio and video deepfakes did not significantly improve detection rates for audio, but slightly improved video detection. Prior knowledge of deepfakes correlated with improved detection accuracy.

Approach

The authors created deepfake audio and video spearphishing scenarios using tools such as DeepFaceLab and Resemble AI. They then presented these deepfakes, along with genuine examples, to participants in a survey to assess their ability to detect the fakes.

Datasets

VidTIMIT audio-video dataset (43 individuals, 10 sentences each).

Model(s)

DeepFaceLab (video), Resemble AI (audio), Speech-Driven Animation (lip-synchronization). The paper focuses on the application of these tools rather than the underlying models themselves.

Author countries

United Kingdom