RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing
Authors: Yang Xiao, Ting Dang, Rohan Kumar Das
Published: 2025-07-11 00:24:47+00:00
Comment: Submitted to APSIPA ASC 2025
AI Summary
This paper introduces RawTFNet, a lightweight CNN model for speech anti-spoofing that addresses the high computational cost of recent transformer-based models. RawTFNet separates feature processing along the time and frequency dimensions to capture fine-grained details of synthetic speech. Evaluated on the ASVspoof 2021 LA and DF datasets, RawTFNet performs comparably to state-of-the-art models while using fewer computing resources.
Abstract
Automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. Recent transformer-based models have improved anti-spoofing performance by learning strong feature representations. However, these models usually require substantial computing power. To address this, we introduce RawTFNet, a lightweight CNN model designed for audio signals. RawTFNet separates feature processing along the time and frequency dimensions, which helps capture the fine-grained details of synthetic speech. We evaluated RawTFNet on the ASVspoof 2021 LA and DF evaluation datasets. The results show that RawTFNet performs comparably to state-of-the-art models while using fewer computing resources. The code and models will be made publicly available.
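To give a rough sense of why processing time and frequency separately can lower cost, the sketch below compares the parameter count of one full 2D convolution with that of a frequency-only convolution followed by a time-only convolution. This is an illustrative assumption, not the RawTFNet architecture: the abstract does not describe the layers, and the function names, channel counts, and kernel sizes here are hypothetical.

```python
def conv2d_params(c_in, c_out, k_freq, k_time, bias=True):
    """Parameter count of a standard 2D convolution with a k_freq x k_time kernel."""
    return c_out * (c_in * k_freq * k_time + (1 if bias else 0))


def separated_tf_params(c_in, c_out, k_freq, k_time, bias=True):
    """Parameter count of a k_freq x 1 (frequency) convolution
    followed by a 1 x k_time (time) convolution.

    Hypothetical factorization, used only to illustrate the idea of
    handling the two axes separately.
    """
    freq = conv2d_params(c_in, c_out, k_freq, 1, bias)
    time = conv2d_params(c_out, c_out, 1, k_time, bias)
    return freq + time


# Hypothetical layer sizes: 64 input channels, 64 output channels, 3x3 kernel.
full = conv2d_params(64, 64, 3, 3)
split = separated_tf_params(64, 64, 3, 3)

print(f"full 2D conv:      {full} parameters")   # 36928
print(f"separated T/F conv: {split} parameters")  # 24704
```

Under these assumed sizes the separated form needs about two thirds of the parameters of the full kernel, and the gap widens as the kernel grows, since the cost scales with k_freq + k_time rather than k_freq * k_time.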