Which LLMs are Difficult to Detect? A Detailed Analysis of Potential Factors Contributing to Difficulties in LLM Text Detection

Authors: Shantanu Thorat, Tianbao Yang

Published: 2024-10-18 21:42:37+00:00

AI Summary

This research investigates the varying difficulty of detecting AI-generated text from different Large Language Models (LLMs). Using two datasets and a deep learning approach, the study reveals that detection performance differs significantly across writing domains and LLM families, with OpenAI LLMs being particularly challenging to identify.

Abstract

As LLMs increase in accessibility, LLM-generated texts have proliferated across several fields, such as scientific, academic, and creative writing. However, LLMs are not created equal; they may have different architectures and training datasets. Thus, some LLMs may be more challenging to detect than others. Using two datasets spanning four total writing domains, we train AI-generated (AIG) text classifiers with the LibAUC library, a deep learning library for training classifiers on imbalanced datasets. Our results on the Deepfake Text dataset show that AIG-text detection varies across domains, with scientific writing being relatively challenging. On the Rewritten Ivy Panda (RIP) dataset, which focuses on student essays, we find that the OpenAI family of LLMs was particularly difficult for our classifiers to distinguish from human texts. Additionally, we explore possible factors that could explain the difficulty of detecting OpenAI-generated texts.


Key findings
Detection difficulty varies across domains, with scientific writing being more challenging than opinion statements or story generation. OpenAI LLMs were consistently difficult to detect in both datasets, potentially due to higher text complexity and lower out-of-vocabulary ratios compared to other LLMs.
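One of the factors cited above is the out-of-vocabulary (OOV) ratio of generated texts. As an illustration only, the sketch below shows one plausible way such a ratio could be computed; the `oov_ratio` helper, the regex tokenization rule, and the reference vocabulary are hypothetical assumptions, not details taken from the paper.

```python
# Hedged sketch: one plausible way to compute an out-of-vocabulary (OOV) ratio.
# The tokenizer and reference vocabulary here are illustrative assumptions;
# the paper's exact procedure may differ.
import re

def oov_ratio(text: str, vocabulary: set[str]) -> float:
    """Fraction of lowercased word tokens that do not appear in `vocabulary`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    oov = sum(1 for token in tokens if token not in vocabulary)
    return oov / len(tokens)

# Example: a tiny reference vocabulary. A real analysis would build the
# vocabulary from a large reference corpus, e.g. the human-written texts.
vocab = {"the", "model", "writes", "an", "essay"}
print(oov_ratio("The model writes an eloquent essay", vocab))  # 1/6 tokens OOV
```

Under this reading, a lower OOV ratio means the LLM's word choices stay closer to the human reference vocabulary, which would plausibly make its output harder for a classifier to separate from human writing.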
Approach
The authors trained AI-generated text classifiers using the LibAUC library, optimizing for the Area Under the Curve (AUC) metric to address data imbalance. They employed DistilRoBERTa, a transformer-based model, and evaluated performance across various LLMs and writing domains.
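As a rough illustration of this setup, the sketch below pairs the Hugging Face `distilroberta-base` checkpoint with LibAUC's `AUCMLoss` and its companion `PESG` optimizer. The hyperparameters, batch handling, and example texts are assumptions for demonstration, not the authors' reported configuration; the optimizer call follows recent LibAUC releases, which accept `model.parameters()` (older versions passed the model object instead).

```python
# Minimal sketch of AUC-maximizing training with LibAUC + DistilRoBERTa.
# Hyperparameters and data below are illustrative, not the paper's settings.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from libauc.losses import AUCMLoss      # AUC-margin surrogate loss
from libauc.optimizers import PESG      # stochastic optimizer paired with AUCMLoss

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=1  # single score head; AUCMLoss expects scores in [0, 1]
)

loss_fn = AUCMLoss()
optimizer = PESG(model.parameters(), loss_fn=loss_fn,
                 lr=0.1, margin=1.0, epoch_decay=0.003, weight_decay=1e-4)

# Toy batch: 1 = AI-generated, 0 = human-written.
texts = ["A human-written paragraph ...", "An LLM-generated paragraph ..."]
labels = torch.tensor([[0.0], [1.0]])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
scores = torch.sigmoid(model(**batch).logits)  # map logits into [0, 1]
loss = loss_fn(scores, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Optimizing an AUC surrogate directly, rather than cross-entropy, is LibAUC's motivation for imbalanced data: AUC is insensitive to the class ratio, so a classifier is not rewarded for simply favoring the majority (human) class.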
Datasets
Deepfake Text dataset (covering opinion statements, scientific writing, and story generation) and a new Rewritten Ivy Panda (RIP) dataset of student essays.
Model(s)
DistilRoBERTaForSequenceClassification (a transformer-based model).
Author countries
United Kingdom, United States