Hello Me, Meet the Real Me: Audio Deepfake Attacks on Voice Assistants

Authors: Domna Bilika, Nikoletta Michopoulou, Efthimios Alepis, Constantinos Patsakis

Published: 2023-02-20 21:41:14+00:00

AI Summary

This research investigates the vulnerability of voice assistants (VAs) to audio deepfake attacks. The authors demonstrate that synthesized voice commands, created using readily available tools, successfully tricked VAs into performing unauthorized actions in over 30% of their experiments, highlighting a significant security risk.

Abstract

The radical advances in telecommunications and computer science have enabled a myriad of applications and novel, seamless interaction with computing interfaces. Voice Assistants (VAs) have become the norm on smartphones, and millions of VAs incorporated in smart devices are used to control these devices in the smart home context. Previous research has shown that they are prone to attacks, leading vendors to introduce countermeasures. One of these measures is to allow only a specific individual, the device's owner, to perform possibly dangerous tasks, that is, tasks that may disclose personal information, involve monetary transactions, etc. To understand the extent to which VAs provide the necessary protection to their users, we experimented with two of the most widely used VAs, which the participants trained. We then used voice synthesis on samples provided by participants to synthesise commands that were used to trigger the corresponding VA and perform a dangerous task. Our extensive results showed that more than 30% of our deepfake attacks were successful and that there was at least one successful attack for more than half of the participants. Moreover, they reveal statistically significant variation among vendors and, in one case, even gender bias. The outcomes are rather alarming and require the deployment of further countermeasures to prevent exploitation, as the number of VAs in use is currently comparable to the world population.


Key findings
More than 30% of the audio deepfake attacks successfully bypassed voice authentication on the tested VAs. Success rates varied significantly between vendors (Google and Apple), and in one case a gender bias was observed (on iOS). At least one successful attack was recorded for more than half of the participants.
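The summary does not state which statistical test the authors used to establish the vendor and gender effects. As a minimal, purely illustrative sketch, a chi-square test of independence on per-vendor success/failure counts is one way such a claim could be checked; the counts below are hypothetical, not the paper's data.

```python
# Hypothetical illustration only: the counts are made up and do not come from the paper.
from scipy.stats import chi2_contingency

# Rows: vendors (e.g., Google Assistant, Siri); columns: successful vs. failed attack attempts.
contingency = [
    [40, 60],  # hypothetical vendor A: 40 successes, 60 failures
    [25, 75],  # hypothetical vendor B: 25 successes, 75 failures
]

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")  # p < 0.05 would indicate a significant vendor effect
```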
Approach
The researchers used an off-the-shelf, open-source voice cloning tool, Real-Time Voice Cloning (RTVC), to synthesize voice commands from samples provided by participants. These synthesized commands were then used in attempts to trick Google Assistant and Apple's Siri into performing sensitive actions, such as making phone calls.
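For context, the sketch below shows how the RTVC pipeline (speaker encoder, Tacotron synthesizer, WaveRNN vocoder) is typically driven from Python. The model paths, the example command text, and the output filename are assumptions for illustration and may differ across RTVC versions and from the authors' exact setup.

```python
# Sketch of the RTVC cloning pipeline: reference sample -> speaker embedding -> mel spectrogram -> waveform.
# Model paths and the example command are illustrative assumptions; API details may vary by RTVC version.
from pathlib import Path

import soundfile as sf
from encoder import inference as encoder
from synthesizer.inference import Synthesizer
from vocoder import inference as vocoder

# Load the pretrained models shipped with the RTVC repository (paths are illustrative).
encoder.load_model(Path("encoder/saved_models/pretrained.pt"))
synthesizer = Synthesizer(Path("synthesizer/saved_models/pretrained/pretrained.pt"))
vocoder.load_model(Path("vocoder/saved_models/pretrained/pretrained.pt"))

# Embed the victim's reference recording, then synthesize the target command in their voice.
reference_wav = encoder.preprocess_wav(Path("participant_sample.wav"))
speaker_embedding = encoder.embed_utterance(reference_wav)
command_text = "Hey Google, call my contact"  # illustrative command, not the paper's exact prompt
mel_spectrograms = synthesizer.synthesize_spectrograms([command_text], [speaker_embedding])

# Convert the mel spectrogram to a waveform and write it out for playback at the voice assistant.
waveform = vocoder.infer_waveform(mel_spectrograms[0])
sf.write("synthesized_command.wav", waveform, synthesizer.sample_rate)
```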
Datasets
The dataset consisted of voice samples collected from 140 participants. Samples were obtained via face-to-face recordings, recordings made on another device (a PC), and extraction from videos. The participants used both Android and iOS devices.
Model(s)
Real-Time Voice Cloning (RTVC), a three-stage deep learning framework for voice cloning comprising a speaker encoder, a Tacotron-based synthesizer, and a WaveRNN vocoder.
Author countries
Greece