Complex-valued neural networks for voice anti-spoofing
Authors: Nicolas M. Müller, Philip Sperl, Konstantin Böttinger
Published: 2023-08-22 21:49:38+00:00
AI Summary
This paper proposes using complex-valued neural networks to process complex-valued constant-Q transforms (CQT) of audio for voice anti-spoofing. This approach retains phase information, improving detection accuracy and enabling explainable AI methods. The results show superior performance compared to existing methods on the In-the-Wild dataset.
Abstract
Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or mel spectrograms) or raw audio processed through convolution or sinc-layers. Both approaches have drawbacks: magnitude spectrograms discard phase information, even though phase affects how natural audio sounds, and raw-feature-based models cannot use traditional explainable AI methods. This paper proposes a new approach that combines the benefits of both by using complex-valued neural networks to process the complex-valued CQT frequency-domain representation of the input audio. This method retains phase information and allows for explainable AI methods. Results show that this approach outperforms previous methods on the In-the-Wild anti-spoofing dataset and enables interpretation of the results through explainable AI. Ablation studies confirm that the model has learned to use phase information to detect voice spoofing.
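The point about discarded phase can be illustrated with a small sketch. This is not the authors' model or feature pipeline. It assumes a naive DFT as a stand-in for the CQT, two made-up test signals, and a single hand-weighted complex neuron (the names dft and complex_neuron are hypothetical). It only shows that two signals can share a magnitude spectrum while differing in phase, and that a complex-valued weighted sum responds to that difference.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (a stand-in for the CQT, by assumption)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def complex_neuron(weights, inputs, bias=0j):
    """Weighted sum over complex inputs; weights may be real or complex."""
    return sum(w * z for w, z in zip(weights, inputs)) + bias


n = 64
# Two hypothetical signals with the same two tones (bins 3 and 7).
# In sig_b the second tone is shifted in phase by 90 degrees.
sig_a = [math.sin(2 * math.pi * 3 * t / n) + math.sin(2 * math.pi * 7 * t / n)
         for t in range(n)]
sig_b = [math.sin(2 * math.pi * 3 * t / n) + math.sin(2 * math.pi * 7 * t / n + math.pi / 2)
         for t in range(n)]

spec_a, spec_b = dft(sig_a), dft(sig_b)
mag_a = [abs(c) for c in spec_a]
mag_b = [abs(c) for c in spec_b]

# Magnitude spectra agree up to rounding error; complex spectra do not.
max_mag_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))
max_complex_diff = max(abs(a - b) for a, b in zip(spec_a, spec_b))

# Hand-picked weights (not learned) that combine bins 3 and 7.
weights = [1.0, 1.0]
real_out_a = complex_neuron(weights, [mag_a[3], mag_a[7]])    # magnitude input
real_out_b = complex_neuron(weights, [mag_b[3], mag_b[7]])
cplx_out_a = complex_neuron(weights, [spec_a[3], spec_a[7]])  # complex input
cplx_out_b = complex_neuron(weights, [spec_b[3], spec_b[7]])

print("max magnitude difference:", max_mag_diff)
print("max complex difference:  ", max_complex_diff)
print("neuron on magnitudes:    ", abs(real_out_a), abs(real_out_b))
print("neuron on complex input: ", abs(cplx_out_a), abs(cplx_out_b))
```

With magnitude input the neuron produces the same output for both signals. With complex input, the two bins add constructively for one signal and only partially for the other, so the output magnitude changes with the relative phase. A trained complex-valued network would learn such weights rather than have them fixed by hand.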