DUPE: Detection Undermining via Prompt Engineering for Deepfake Text

Authors: James Weichert, Chinecherem Dimobi

Published: 2024-04-17 14:10:27+00:00

AI Summary

This paper evaluates three AI text detectors against human- and AI-generated essays, finding a high false positive rate for watermarking and both high false positive and false negative rates for ZeroGPT. It demonstrates that using ChatGPT 3.5 to paraphrase AI-generated text significantly increases the false negative rate of all three detectors, effectively bypassing detection.

Abstract

As large language models (LLMs) become increasingly commonplace, concern about distinguishing between human and AI text increases as well. The growing power of these models is of particular concern to teachers, who may worry that students will use LLMs to write school assignments. Facing a technology with which they are unfamiliar, teachers may turn to publicly-available AI text detectors. Yet the accuracy of many of these detectors has not been thoroughly verified, posing potential harm to students who are falsely accused of academic dishonesty. In this paper, we evaluate three different AI text detectors (Kirchenbauer et al. watermarks, ZeroGPT, and GPTZero) against human- and AI-generated essays. We find that watermarking results in a high false positive rate, and that ZeroGPT has both high false positive and false negative rates. Further, we are able to significantly increase the false negative rate of all detectors by using ChatGPT 3.5 to paraphrase the original AI-generated texts, thereby effectively bypassing the detectors.


Key findings
All three detectors showed significant vulnerabilities; ZeroGPT in particular performed poorly, with both high false positive and false negative rates. Paraphrasing with ChatGPT 3.5 bypassed detection in a substantial portion of cases (a success rate of at least 50% against every detector), highlighting the limitations of current AI text detection methods.
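
For reference, the false positive and false negative rates discussed here follow the standard definitions, where a "positive" verdict means the detector flags a text as AI-generated. The sketch below is illustrative only; the labels and predictions are hypothetical, not data from the paper.

```python
# Minimal sketch: false positive / false negative rates for a binary AI-text
# detector. Labels and predictions are illustrative, not data from the paper.

def detector_error_rates(labels, predictions):
    """labels/predictions: 1 = AI-generated, 0 = human-written."""
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)

    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # human essays flagged as AI
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # AI essays that slip through
    return fpr, fnr

# Example: 4 human essays (label 0) and 4 AI essays (label 1), hypothetical verdicts.
labels      = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [1, 0, 0, 1, 1, 0, 0, 1]
print(detector_error_rates(labels, predictions))  # (0.5, 0.5)
```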
Approach
The researchers evaluated three AI text detectors (Kirchenbauer et al. watermarks, ZeroGPT, and GPTZero) using human-written and AI-generated essays. They then used ChatGPT 3.5 to paraphrase the AI-generated texts to test the detectors' resilience to evasion attacks.
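The paper does not reproduce its exact paraphrasing prompt or detector integration code, so the following is a minimal sketch of the paraphrase-then-detect evasion loop. It assumes the OpenAI Python client (openai>=1.0) and uses a placeholder detect_ai_text() function standing in for whichever detector (ZeroGPT, GPTZero, or a watermark check) is under test; the prompt wording is illustrative, not the authors'.

```python
# Minimal sketch of the paraphrase-based evasion pipeline described above.
# Assumptions (not from the paper): the OpenAI Python client (openai>=1.0),
# an illustrative paraphrasing prompt, and a placeholder detect_ai_text()
# standing in for ZeroGPT, GPTZero, or a watermark detector.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def paraphrase(text: str) -> str:
    """Ask ChatGPT 3.5 to rewrite an AI-generated essay in different words."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You paraphrase essays while preserving their meaning."},
            {"role": "user", "content": f"Paraphrase the following essay:\n\n{text}"},
        ],
    )
    return response.choices[0].message.content


def detect_ai_text(text: str) -> bool:
    """Placeholder: return True if the detector under test flags `text` as AI-generated."""
    raise NotImplementedError("Wire up ZeroGPT, GPTZero, or a watermark check here.")


def evasion_succeeds(ai_essay: str) -> bool:
    """An evasion attempt succeeds if the paraphrased essay is labeled human-written."""
    return not detect_ai_text(paraphrase(ai_essay))
```

Counting the fraction of AI-generated essays for which evasion_succeeds() returns True, per detector, corresponds to the increase in false negative rate reported in the findings.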
Datasets
Michigan Corpus of Upper-Level Student Papers (MICUSP) and ChatGPT-generated essays based on MICUSP prompts.
Model(s)
GPT Neo (for watermarking), ChatGPT 3.5 (for paraphrasing), ZeroGPT, and GPTZero.
Author countries
USA