MMSys'21 Grand Challenge on Detecting Cheapfakes

Authors: Shivangi Aneja, Cise Midoglu, Duc-Tien Dang-Nguyen, Michael Alexander Riegler, Paal Halvorsen, Matthias Niessner, Balu Adsumilli, Chris Bregler

Published: 2021-07-12 10:14:45+00:00

AI Summary

This research paper describes the MMSys'21 Grand Challenge on Detecting Cheapfakes, focusing on the out-of-context (OOC) misuse of images in news items. The challenge aims to develop and benchmark models capable of identifying whether image-caption pairings are OOC based on the COSMOS dataset.

Abstract

Cheapfake is a recently coined term that encompasses non-AI (cheap) manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulations, or even without using any software, by simply altering the context of an image/video by sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos are not tampered with. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can be used to detect whether given samples (news image and associated captions) are OOC, based on the recently compiled COSMOS dataset.


Key findings
The challenge successfully benchmarks various models for OOC image-caption detection. A baseline model achieved 82% accuracy on the public test split. The challenge highlights the importance of addressing cheapfakes and encourages further research in this area.
Approach
The challenge focuses on detecting out-of-context (OOC) images by comparing multiple captions associated with a single image. Participants are tasked with developing models that classify each triplet (an image and its associated captions) as either OOC or not OOC. A self-supervised baseline model is provided, which learns co-occurrence patterns of images and captions without explicit OOC annotations.
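The classification task described above can be sketched as a small evaluation harness. This is only an illustrative sketch, assuming each sample is one image plus two captions and a boolean OOC label; the `Sample` class, its field names, and the `accuracy` helper are hypothetical and are not taken from the challenge's code or the COSMOS file format.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Sample:
    """One hypothetical test sample: an image and two captions found with it."""
    image_path: str
    caption1: str
    caption2: str
    is_ooc: bool  # ground-truth label: True if the image is used out of context


def accuracy(samples: Iterable[Sample], predict: Callable[[Sample], bool]) -> float:
    """Fraction of samples whose predicted OOC label matches the ground truth."""
    samples = list(samples)
    if not samples:
        raise ValueError("cannot compute accuracy on an empty sample list")
    correct = sum(1 for s in samples if predict(s) == s.is_ooc)
    return correct / len(samples)


def always_not_ooc(sample: Sample) -> bool:
    """Trivial placeholder predictor; a real model would inspect image and captions."""
    return False
```

Any participant model can be plugged in as the `predict` callable, so the same harness would score the baseline and a custom model alike.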
Datasets
COSMOS dataset (approx. 200K images with 450K captions from news websites, blogs, and social media)
Model(s)
A self-supervised baseline model is provided, but participants are encouraged to develop their own models. The baseline is trained by contrasting each image's own caption with randomly chosen captions, so that it learns which images and captions tend to co-occur.
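One way to picture training on matched versus random captions is a margin-based ranking objective. The sketch below is an assumption-laden illustration, not the baseline's actual implementation: the dot-product score, the hinge form, the `margin` value, and the helper names are all hypothetical, and real embeddings would come from learned image and text encoders rather than hand-written vectors.

```python
import random
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product used here as a stand-in image-caption similarity score."""
    return sum(x * y for x, y in zip(a, b))


def hinge_loss(
    image_vec: Sequence[float],
    matched_caption_vec: Sequence[float],
    random_caption_vec: Sequence[float],
    margin: float = 1.0,  # hypothetical value, chosen only for illustration
) -> float:
    """Penalize cases where a random caption scores within `margin` of the true one."""
    positive = dot(image_vec, matched_caption_vec)
    negative = dot(image_vec, random_caption_vec)
    return max(0.0, margin + negative - positive)


def sample_negative_index(num_captions: int, positive_index: int, rng: random.Random) -> int:
    """Pick the index of a random caption that is not the image's own caption."""
    if num_captions < 2:
        raise ValueError("need at least two captions to draw a negative")
    candidates = [i for i in range(num_captions) if i != positive_index]
    return rng.choice(candidates)
```

The loss is zero once the matched caption outscores the random one by at least the margin, which is how an objective of this kind can learn co-occurrence patterns without any explicit OOC labels.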
Author countries
Germany, Norway, USA