SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs

Authors: Shail Desai, Aditya Pawar, Li Lin, Xin Wang, Shu Hu

Published: 2025-11-16 00:50:24+00:00

AI Summary

SynthGuard is an open, user-friendly platform designed for detecting and analyzing AI-generated multimedia. It integrates both traditional deepfake detectors and multimodal large language models (MLLMs) to provide explainable inference for identifying synthetic content. The platform unifies image and audio support within an interactive interface, aiming to make forensic analysis accessible to researchers, educators, and the public.

Abstract

Artificial Intelligence (AI) has made it possible for anyone to create images, audio, and video with unprecedented ease, enriching education, communication, and creative expression. At the same time, the rapid rise of AI-generated media has introduced serious risks, including misinformation, identity misuse, and the erosion of public trust as synthetic content becomes increasingly indistinguishable from real media. Although deepfake detection has advanced, many existing tools remain closed-source, limited in modality, or lacking transparency and educational value, making it difficult for users to understand how detection decisions are made. To address these gaps, we introduce SynthGuard, an open, user-friendly platform for detecting and analyzing AI-generated multimedia using both traditional detectors and multimodal large language models (MLLMs). SynthGuard provides explainable inference, unified image and audio support, and an interactive interface designed to make forensic analysis accessible to researchers, educators, and the public. The SynthGuard platform is available at: https://in-engr-nova.it.purdue.edu/


Key findings
This paper introduces the SynthGuard platform and its architecture, focusing on its design, features, and extensibility. It does not present experimental results or performance metrics regarding the effectiveness of its integrated deepfake detection models or MLLM-based analysis in identifying AI-generated content.
Approach
SynthGuard is a modular, open platform that couples a React frontend with a Python FastAPI backend. It combines two families of detectors. MLLM-agnostic detectors (CNN- and Transformer-based models for images, and a lightweight CNN for audio) perform direct deepfake classification. MLLM-aware detectors (such as Qwen-VL-Chat, Whisper, and Qwen2-VL-2B) provide explainable, reasoning-based semantic verification of both image and audio content.
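
The modular design described above can be illustrated with a minimal sketch of a detector registry, in which each modality holds a set of pluggable detectors from either family. This is not SynthGuard's actual code: the names `DetectionResult`, `REGISTRY`, `register`, and `analyze`, and both placeholder detectors, are hypothetical, and the scoring logic is a stand-in for real model inference.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DetectionResult:
    detector: str                 # name of the detector that produced this result
    family: str                   # "mllm_agnostic" or "mllm_aware"
    fake_score: Optional[float]   # score in [0, 1]; None if the detector only explains
    explanation: Optional[str]    # free-text rationale; None for pure classifiers


Detector = Callable[[bytes], DetectionResult]

# Hypothetical registry: modality -> {detector name -> callable}.
REGISTRY: Dict[str, Dict[str, Detector]] = {"image": {}, "audio": {}}


def register(modality: str, name: str) -> Callable[[Detector], Detector]:
    """Decorator that plugs a detector into the registry for one modality."""
    def wrap(fn: Detector) -> Detector:
        REGISTRY[modality][name] = fn
        return fn
    return wrap


@register("image", "dummy_cnn")
def dummy_cnn(data: bytes) -> DetectionResult:
    # Placeholder score derived from the payload size; a real MLLM-agnostic
    # detector would run a trained CNN or Transformer here.
    score = (len(data) % 100) / 100.0
    return DetectionResult("dummy_cnn", "mllm_agnostic", score, None)


@register("image", "dummy_mllm")
def dummy_mllm(data: bytes) -> DetectionResult:
    # Placeholder text; a real MLLM-aware detector would prompt a
    # multimodal LLM and return its reasoning.
    return DetectionResult("dummy_mllm", "mllm_aware", None, "placeholder explanation")


def analyze(modality: str, data: bytes) -> List[DetectionResult]:
    """Run every registered detector for the given modality."""
    if modality not in REGISTRY:
        raise ValueError(f"unsupported modality: {modality}")
    return [fn(data) for fn in REGISTRY[modality].values()]


if __name__ == "__main__":
    for result in analyze("image", b"x" * 250):
        print(result)
```

Under these assumptions, adding a new detector only requires decorating one more function, and a backend endpoint could return the list produced by `analyze` to the frontend for display.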
Datasets
AI-Face benchmark, ForensicsBench evaluation protocol
Model(s)
A lightweight CNN-based architecture (for MLLM-agnostic audio detection); Whisper, Qwen2-VL-2B (for MLLM-aware audio analysis)
Author countries
USA