Deepfake Text Detection: Limitations and Opportunities

Authors: Jiameng Pu, Zain Sarwar, Sifat Muhammad Abdullah, Abdullah Rehman, Yoonjin Kim, Parantapa Bhattacharya, Mobin Javed, Bimal Viswanath

Published: 2022-10-17 20:40:14+00:00

AI Summary

This paper evaluates the real-world applicability of existing deepfake text detection defenses, testing both generalization to in-the-wild content and robustness to low-cost adversarial attacks. It finds that many defenses degrade significantly compared to their originally reported performance and proposes tapping into semantic information in the text to improve robustness and generalization.

Abstract

Recent advances in generative models for language have enabled the creation of convincing synthetic text or deepfake text. Prior work has demonstrated the potential for misuse of deepfake text to mislead content consumers. Therefore, deepfake text detection, the task of discriminating between human and machine-generated text, is becoming increasingly critical. Several defenses have been proposed for deepfake text detection. However, we lack a thorough understanding of their real-world applicability. In this paper, we collect deepfake text from 4 online services powered by Transformer-based tools to evaluate the generalization ability of the defenses on content in the wild. We develop several low-cost adversarial attacks, and investigate the robustness of existing defenses against an adaptive attacker. We find that many defenses show significant degradation in performance under our evaluation scenarios compared to their original claimed performance. Our evaluation shows that tapping into the semantic information in the text content is a promising approach for improving the robustness and generalization performance of deepfake text detection schemes.


Key findings
Open-domain defenses perform poorly on real-world data, while domain-specific defenses generalize better. Changing the text generation process (decoding strategy, priming) is an effective low-cost attack. Using semantic features (e.g., entity-based features in FAST) improves robustness and generalization.
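As a rough illustration of the entity-based "semantic" features mentioned above, the sketch below computes simple named-entity statistics with spaCy. FAST's actual model couples entity-level structure with a neural document encoder, so the feature names and statistics here are illustrative assumptions only, not the paper's implementation.

```python
# Minimal sketch of entity-based "semantic" features (illustrative only; FAST's
# real architecture is a neural model over entity structure, not hand-crafted stats).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_features(text):
    """Illustrative features: entity density, repetition, and type diversity."""
    doc = nlp(text)
    ents = [ent.text.lower() for ent in doc.ents]
    counts = Counter(ents)
    n_tokens = max(len(doc), 1)
    return {
        # How often named entities appear relative to document length.
        "entity_density": len(ents) / n_tokens,
        # How often the same entity is reused (a proxy for factual coherence).
        "entity_repetition": (len(ents) - len(counts)) / max(len(ents), 1),
        # Number of distinct entity types (PERSON, ORG, GPE, ...).
        "entity_type_diversity": len({ent.label_ for ent in doc.ents}),
    }

print(entity_features("Apple opened a new office in Berlin, and Apple plans more."))
```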
Approach
The researchers evaluated six state-of-the-art deepfake text detection methods on four newly collected real-world datasets and developed several low-cost adversarial attacks to assess their robustness against an adaptive attacker. They then analyzed which design choices, such as the use of semantic features, contribute to better robustness and generalization. A sketch of one such low-cost attack follows.
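The sketch below illustrates the kind of low-cost attack the paper describes: changing the generator's decoding strategy and priming (prompt) text. The model name and decoding parameters are illustrative stand-ins, not the configurations used by the evaluated services.

```python
# Sketch: evading a detector by changing the text generation process.
# Model and decoding parameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in generator
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Breaking news:"  # "priming" text that steers generation
inputs = tokenizer(prompt, return_tensors="pt")

# Baseline-style decoding (top-k sampling) that a defense may have seen during training.
baseline = model.generate(**inputs, do_sample=True, top_k=40, max_new_tokens=100)

# Attack variant: switch to nucleus (top-p) sampling with a different temperature,
# shifting the token-likelihood statistics many detectors rely on.
evasive = model.generate(**inputs, do_sample=True, top_p=0.9, temperature=1.2,
                         max_new_tokens=100)

print(tokenizer.decode(evasive[0], skip_special_tokens=True))
```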
Datasets
Four new real-world datasets (AI-Writer, ArticleForge, Kafkai, RedditBot) and existing datasets (RealNews, WebText).
Model(s)
GROVER, GLTR-BERT, GLTR-GPT2, BERT-Defense, RoBERTa-Defense, and FAST. In the GLTR variants, logistic regression serves as the classifier over the extracted features.
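The sketch below shows a GLTR-style pipeline in which per-token rank statistics from a language model are binned into a histogram and fed to a logistic-regression classifier. The bin boundaries and toy training data are assumptions for illustration, not the exact setup used in the paper.

```python
# Sketch of a GLTR-style detector: per-token LM rank histograms + logistic regression.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def rank_histogram(text, bins=(10, 100, 1000)):
    """Fraction of tokens whose LM rank falls in top-10 / top-100 / top-1000 / rest buckets."""
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, :-1]          # predictions for tokens 1..n
    targets = ids[0, 1:]
    # Rank of each observed token under the LM's predicted distribution.
    ranks = (logits.argsort(dim=-1, descending=True) == targets.unsqueeze(-1)).nonzero()[:, 1]
    counts = np.histogram(ranks.numpy(), bins=[0, *bins, lm.config.vocab_size])[0]
    return counts / counts.sum()

# Toy placeholder corpus; real training would use a labeled human vs. machine dataset.
texts = ["The quick brown fox jumps over the lazy dog.",
         "In a stunning development, researchers announced a new discovery today."]
labels = [0, 1]  # 0 = human-written, 1 = machine-generated (toy labels)

features = np.stack([rank_histogram(t) for t in texts])
clf = LogisticRegression().fit(features, labels)
```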
Author countries
USA, Pakistan