RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing
Authors: Yang Xiao, Ting Dang, Rohan Kumar Das
Published: 2025-07-11 00:24:47+00:00
AI Summary
RawTFNet is a lightweight CNN architecture for speech anti-spoofing that achieves performance comparable to state-of-the-art models while using fewer computing resources. It separates feature processing along the time and frequency dimensions to capture fine-grained details of synthetic speech, and is evaluated on the ASVspoof 2021 LA and DF datasets.
Abstract
Automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. Recent transformer-based models have improved anti-spoofing performance by learning strong feature representations, but they typically demand substantial computing power. To address this, we introduce RawTFNet, a lightweight CNN model designed for audio signals. RawTFNet separates feature processing along the time and frequency dimensions, which helps capture the fine-grained details of synthetic speech. We evaluated RawTFNet on the ASVspoof 2021 LA and DF evaluation datasets. The results show that RawTFNet achieves performance comparable to that of state-of-the-art models while using fewer computing resources. The code and models will be made publicly available.
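The idea of processing time and frequency separately can be illustrated with a minimal pure-Python sketch. This is not the RawTFNet implementation: it assumes, purely for illustration, that the separation is realised as one 1D filter along the time axis and one along the frequency axis, and it compares the weight count of that pair against a single joint 2D kernel. The function names, the channel count of 32, and the kernel size of 3 are all hypothetical.

```python
# Hypothetical sketch; names and sizes are assumptions, not taken from RawTFNet.

def filter_time(spec, kernel):
    """Apply an odd-length 1D kernel along the time axis (each row), zero-padded."""
    pad = len(kernel) // 2
    out = []
    for row in spec:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([
            sum(k * padded[t + i] for i, k in enumerate(kernel))
            for t in range(len(row))
        ])
    return out


def transpose(spec):
    """Swap the frequency and time axes of a 2D list."""
    return [list(col) for col in zip(*spec)]


def filter_freq(spec, kernel):
    """Apply an odd-length 1D kernel along the frequency axis (each column)."""
    return transpose(filter_time(transpose(spec), kernel))


def conv_params(c_in, c_out, k_freq, k_time):
    """Weight count of a 2D convolution layer, ignoring biases."""
    return c_in * c_out * k_freq * k_time


# Toy feature map laid out as [frequency bin][time frame].
spec = [[1, 2, 3],
        [4, 5, 6]]
time_out = filter_time(spec, [1, 1, 1])   # sums neighbours within each row
freq_out = filter_freq(spec, [1, 1, 1])   # sums neighbours within each column

# Assumed layer sizes for the parameter comparison.
channels, k = 32, 3
joint = conv_params(channels, channels, k, k)
separated = (conv_params(channels, channels, k, 1)
             + conv_params(channels, channels, 1, k))
print(joint, separated)
```

Under these assumed sizes, the separated pair needs two thirds of the weights of the joint kernel, which gives a rough intuition for why axis-wise processing can reduce model size. The actual savings in RawTFNet depend on its real layer configuration, which the abstract does not specify.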