Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

Authors: Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, Jianhua Tao, Le Xu, Ruibo Fu

Published: 2023-06-09 01:43:41+00:00

AI Summary

This paper introduces a low-rank adaptation (LoRA) method for efficient fine-tuning of the wav2vec2 model for fake audio detection. By freezing the pre-trained weights and adding trainable low-rank matrices, it significantly reduces the number of trainable parameters while maintaining performance comparable to full fine-tuning.

Abstract

Self-supervised speech models are a rapidly developing research topic in fake audio detection. Many pre-trained models can serve as feature extractors, learning richer and higher-level speech features. However, when fine-tuning pre-trained models, training times are often excessively long and memory consumption is high, so complete fine-tuning is very expensive. To alleviate this problem, we apply low-rank adaptation (LoRA) to the wav2vec2 model, freezing the pre-trained model weights and injecting a trainable rank-decomposition matrix into each layer of the transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared with fine-tuning with Adam on the wav2vec2 model containing 317M training parameters, LoRA achieved similar performance while reducing the number of trainable parameters by a factor of 198.
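For concreteness, the rank-decomposition update the abstract describes can be written out as below; the notation is the standard one from the original LoRA paper, which this work adopts, and the parameter count is a back-of-the-envelope figure from the abstract's own numbers.

```latex
% LoRA: freeze the pre-trained weight W_0 and train only a rank-r update BA
h = W_0 x + \Delta W\, x = W_0 x + B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
% From the abstract's numbers: 317\,\mathrm{M} / 198 \approx 1.6\,\mathrm{M} trainable parameters
```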


Key findings
LoRA achieved performance comparable to full fine-tuning with a 198-fold reduction in trainable parameters. A rank of 4 was optimal for the low-rank matrices, and applying LoRA to the query and value projection matrices yielded the best results. LoRA also substantially improved training efficiency.
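A minimal sketch of how that best configuration could be set up with the Hugging Face `peft` library (not the authors' code): the `q_proj`/`v_proj` names match the `transformers` implementation of wav2vec2, while the checkpoint name and all hyperparameters other than the rank are assumptions.

```python
# Sketch: rank-4 LoRA on the query/value projections of wav2vec2, mirroring
# the paper's best-reported setting. The checkpoint, lora_alpha, and dropout
# are assumptions; the paper reports only a 317M-parameter XLSR model and r=4.
from transformers import Wav2Vec2Model
from peft import LoraConfig, get_peft_model

base = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")

lora_cfg = LoraConfig(
    r=4,                                  # optimal rank reported in the paper
    lora_alpha=8,                         # scaling factor (assumed)
    lora_dropout=0.0,                     # (assumed)
    target_modules=["q_proj", "v_proj"],  # query/value projections in each attention layer
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # only the injected low-rank matrices train
```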
Approach
The authors employ low-rank adaptation (LoRA) to fine-tune the wav2vec2 model for fake audio detection: the pre-trained weights are frozen, and trainable low-rank decomposition matrices are injected into the transformer architecture, drastically reducing the number of trainable parameters. LoRA is applied to the query, key, and value projection matrices in the multi-head attention layers, with experiments comparing which of these to adapt.
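To make the injection mechanics concrete, here is a self-contained sketch of a frozen linear layer augmented with a trainable rank-decomposition update; it follows the standard LoRA formulation, not any code released by the authors.

```python
# Sketch of the LoRA mechanics: the pre-trained weight W0 is frozen and the
# update is reparameterized as B @ A with a small rank r. Illustrative only.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)     # freeze the pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + (alpha / r) * B A x; only A and B receive gradients
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Wrapping the query and value projections of each transformer layer with such a module reproduces the configuration the key findings identify as best.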
Datasets
ASVspoof2019 LA dataset
Model(s)
wav2vec2 XLSR (pre-trained front end) with a light convolutional neural network (LCNN) as the backend classifier.
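As a rough illustration of how the two pieces fit together, the sketch below feeds wav2vec2 hidden states into a toy LCNN built from a Max-Feature-Map block (the characteristic LCNN component); all depths, widths, and the two-class head are assumptions, not the paper's architecture.

```python
# Rough sketch of the pipeline: wav2vec2 XLSR features -> LCNN backend ->
# bonafide/spoof logits. Layer sizes are assumptions, not the paper's.
import torch
import torch.nn as nn


class MFMConv2d(nn.Module):
    """Conv layer with Max-Feature-Map activation, the building block of LCNN."""
    def __init__(self, in_ch: int, out_ch: int, kernel: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 2 * out_ch, kernel, padding=kernel // 2)

    def forward(self, x):
        a, b = self.conv(x).chunk(2, dim=1)   # split channels, keep elementwise max
        return torch.max(a, b)


class LCNNBackend(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            MFMConv2d(1, 16), nn.MaxPool2d(2),
            MFMConv2d(16, 32), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, feats):                  # feats: (batch, time, dim) from wav2vec2
        x = feats.unsqueeze(1)                 # add channel dim -> (batch, 1, time, dim)
        return self.head(self.body(x).flatten(1))
```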
Author countries
China