LookupForensics: A Large-Scale Multi-Task Dataset for Multi-Phase Image-Based Fact Verification

Authors: Shuhan Cui, Huy H. Nguyen, Trung-Nghia Le, Chun-Shien Lu, Isao Echizen

Published: 2024-07-26 09:15:29+00:00

AI Summary

This paper introduces a new task, image-based automated fact verification, and presents a two-phase open framework combining forgery identification and fact retrieval. A large-scale, multi-task dataset, LookupForensics, is also introduced, featuring various image manipulations and extensive annotations for diverse sub-tasks.

Abstract

Amid the proliferation of forged images, notably the tsunami of deepfake content, extensive research has been conducted on using artificial intelligence (AI) to identify forged content in the face of continuing advancements in counterfeiting technologies. We have investigated the use of AI to provide the original authentic image after deepfake detection, which we believe is a reliable and persuasive solution. We call this image-based automated fact verification, a name that originated from a text-based fact-checking system used by journalists. We have developed a two-phase open framework that integrates detection and retrieval components. Additionally, inspired by a dataset proposed by Meta Fundamental AI Research, we further constructed a large-scale dataset that is specifically designed for this task. This dataset simulates real-world conditions and includes both content-preserving and content-aware manipulations that present a range of difficulty levels and have potential for ongoing research. This multi-task dataset is fully annotated, enabling it to be utilized for sub-tasks within the forgery identification and fact retrieval domains. This paper makes two main contributions: (1) We introduce a new task, image-based automated fact verification, and present a novel two-phase open framework combining forgery identification and fact retrieval. (2) We present a large-scale dataset tailored for this new task that features various hand-crafted image edits and machine learning-driven manipulations, with extensive annotations suitable for various sub-tasks. Extensive experimental results validate its practicality for fact verification research and clarify its difficulty levels for various sub-tasks.


Key findings
The LookupForensics dataset proved challenging for existing state-of-the-art (SOTA) forgery detection and retrieval models, indicating that it can serve as a demanding benchmark for these sub-tasks. The two-phase framework outperformed a baseline retrieval framework on fact verification, showing the benefit of integrating detection with retrieval.
Approach
The authors propose a two-phase framework. The first phase, forgery identification, detects whether an image is forged and classifies the type of forgery. The second phase, fact retrieval, retrieves the original image(s) using either the entire image or the identified forged segments. The authors regard returning the original image alongside the detection result as a more reliable and persuasive solution.
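
The two phases described above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the names `Detection`, `Verdict`, `retrieve`, and `verify` are hypothetical, images are assumed to be already reduced to feature vectors, cosine similarity stands in for whatever retrieval model is used, and skipping retrieval for images judged authentic is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Detection:
    """Phase-one output (hypothetical structure, not from the paper)."""
    is_forged: bool
    forgery_type: Optional[str] = None
    # Descriptors of suspected forged regions; empty means "use the whole image".
    segments: List[Vector] = field(default_factory=list)


@dataclass
class Verdict:
    """Combined result of both phases."""
    is_forged: bool
    forgery_type: Optional[str]
    matches: List[Tuple[str, float]]  # (reference id, similarity), best first


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(queries: List[Vector], index: Dict[str, Vector],
             top_k: int = 1) -> List[Tuple[str, float]]:
    """Phase two: rank reference images by their best score over all queries."""
    best: Dict[str, float] = {}
    for query in queries:
        for ref_id, ref_vec in index.items():
            score = cosine(query, ref_vec)
            if score > best.get(ref_id, float("-inf")):
                best[ref_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def verify(image_vec: Vector, detector: Callable[[Vector], Detection],
           index: Dict[str, Vector], top_k: int = 1) -> Verdict:
    """Run phase one, then phase two only if the image is flagged as forged."""
    detection = detector(image_vec)
    if not detection.is_forged:
        # Assumption of this sketch: authentic images need no retrieval.
        return Verdict(False, None, [])
    # Query with forged segments when available, else with the entire image.
    queries = list(detection.segments) or [image_vec]
    return Verdict(True, detection.forgery_type, retrieve(queries, index, top_k))
```

Here `detector` is any callable that maps an image descriptor to a `Detection`, and `index` maps reference image identifiers to their descriptors. Keeping the detector as a plug-in argument mirrors the "open" nature of the framework, in which detection and retrieval components can be swapped independently.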
Datasets
LookupForensics dataset (built on Google's Open Images Dataset), ISC2021 benchmark dataset (for comparison), CASIA, CoMoFoD, Carvalho, Columbia, GC, SH, LB datasets (for comparison)
Model(s)
VGG19, EfficientNet-B4, BusterNet, DOA-GAN, Serial Network, MFCN, ManTra-Net, DenseFCN, MT-Net, HP-FCN, IID-Net, GIST, MultiGrain
Author countries
Japan, Vietnam, Taiwan