Prompt Tuning for Audio Deepfake Detection: Computationally Efficient Test-time Domain Adaptation with Limited Target Dataset
Authors: Hideyuki Oiso, Yuto Matsunaga, Kazuya Kakizaki, Taiki Miyagawa
Published: 2024-10-13 15:07:35+00:00
Comment: Accepted at Interspeech 2024. Hideyuki Oiso and Yuto Matsunaga contributed equally
AI Summary
This paper proposes a prompt tuning method for Audio Deepfake Detection (ADD) to address critical challenges in test-time domain adaptation, including source-target domain gaps, limited target dataset sizes, and high computational costs. The method operates in a plug-in style, seamlessly integrating with state-of-the-art transformer models to enhance accuracy on target data. By introducing a small number of trainable parameters, it prevents overfitting on small datasets and maintains computational efficiency.
Abstract
We study test-time domain adaptation for audio deepfake detection (ADD), addressing three challenges: (i) source-target domain gaps, (ii) limited target dataset size, and (iii) high computational costs. We propose an ADD method using prompt tuning in a plug-in style. It bridges domain gaps by integrating seamlessly with state-of-the-art transformer models and/or with other fine-tuning methods, boosting their performance on target data (challenge (i)). In addition, our method fits small target datasets well because it requires only a small number of extra parameters (challenge (ii)). This feature also contributes to computational efficiency, countering the high computational costs typically associated with large-scale pre-trained models in ADD (challenge (iii)). We conclude that prompt tuning for ADD under domain gaps presents a promising avenue for enhancing accuracy with minimal target data and negligible extra computational burden.
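The core idea of prompt tuning as described above — freezing a large pre-trained transformer and adapting it by learning only a few prepended prompt embeddings — can be sketched as follows. This is a hypothetical minimal illustration, not the authors' implementation: the class name `PromptTunedDetector`, the tiny stand-in backbone, and all hyperparameters are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class PromptTunedDetector(nn.Module):
    """Hypothetical sketch: learnable prompt tokens prepended to a frozen
    transformer backbone, plus a small binary (real/fake) head."""

    def __init__(self, d_model=64, n_prompts=8):
        super().__init__()
        # Small stand-in backbone; in the paper's setting this would be a
        # large pre-trained transformer feature extractor. It is frozen,
        # so adaptation touches none of its weights.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():
            p.requires_grad = False
        # The only trainable parameters: a handful of prompt embeddings
        # and a lightweight classification head.
        self.prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
        self.head = nn.Linear(d_model, 2)

    def forward(self, feats):  # feats: (batch, time, d_model)
        b = feats.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([prompts, feats], dim=1)  # prepend prompt tokens
        h = self.backbone(x)
        return self.head(h[:, 0])  # classify from the first prompt token

model = PromptTunedDetector()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
out = model(torch.randn(4, 100, 64))
print(out.shape, trainable, total)
```

Because only `trainable` parameters (the prompts and the head, a few hundred here) receive gradients while the backbone stays frozen, the optimizer state and memory footprint stay small, which is what lets the method fit limited target datasets without overfitting and with negligible extra compute.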