Generated Graph Detection

Authors: Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang

Published: 2023-06-13 13:18:04+00:00

AI Summary

This paper introduces the novel problem of generated graph detection, aiming to distinguish synthetic graphs from real ones. A framework is proposed that evaluates three existing machine learning models across four scenarios with varying levels of dataset and generator overlap, demonstrating the feasibility of generated graph detection.

Abstract

Graph generative models are becoming increasingly effective for data distribution approximation and data augmentation. However, they have also aroused public concerns about malicious misuse and the spread of misinformation, much as Deepfake visual and auditory media have done. Hence it is essential to regulate the prevalence of generated graphs. To tackle this problem, we pioneer the formulation of the generated graph detection problem: distinguishing generated graphs from real ones. We propose the first framework to systematically investigate a set of sophisticated models and their performance in four classification scenarios. Each scenario switches between seen and unseen datasets/generators during testing to get closer to real-world settings and progressively challenge the classifiers. Extensive experiments show that all the models are capable of generated graph detection, with specific models having advantages in specific scenarios. Given the validated generality of the classifiers and their insensitivity to unseen datasets/generators, we conclude that our solution can remain effective for a considerable time in curbing the misuse of generated graphs.

Key findings

All three models demonstrated effectiveness in generated graph detection across all four scenarios. The metric learning model performed best in the 'closed world' scenario, while the contrastive learning model showed superior generalization to unseen datasets and generators in 'open world' scenarios. The results suggest that generated graphs, even from unseen generators and datasets, can be reliably detected.

Approach

The authors propose a framework using three existing machine learning models (end-to-end classifier, contrastive learning-based model, and metric learning-based model) to classify graphs as real or generated. The framework considers four scenarios with varying levels of overlap between training and testing datasets and generators.
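
The four scenarios can be read as the four combinations of seen or unseen dataset and seen or unseen generator at test time. The sketch below is one possible way to build such test splits, not the authors' code. The `GraphSample` record, the `build_test_split` helper, the descriptive scenario labels, and the generator names `gen_a` and `gen_b` are all hypothetical. It also assumes that real graphs (which have no generator) are assigned to a scenario by their dataset alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GraphSample:
    """Hypothetical record describing where one graph came from."""
    dataset: str              # source dataset, e.g. "AIDS"
    generator: Optional[str]  # None marks a real graph

    @property
    def label(self) -> int:
        # 0 = real graph, 1 = generated graph
        return 0 if self.generator is None else 1


# (dataset seen during training?, generator seen during training?)
# The labels are descriptive placeholders, not the paper's terminology.
SCENARIOS = {
    "seen dataset, seen generator": (True, True),
    "seen dataset, unseen generator": (True, False),
    "unseen dataset, seen generator": (False, True),
    "unseen dataset, unseen generator": (False, False),
}


def build_test_split(samples, seen_datasets, seen_generators, scenario):
    """Select the test samples that belong to one of the four scenarios."""
    want_ds_seen, want_gen_seen = SCENARIOS[scenario]
    split = []
    for s in samples:
        # The dataset condition applies to real and generated graphs alike.
        if (s.dataset in seen_datasets) != want_ds_seen:
            continue
        # The generator condition only applies to generated graphs
        # (assumption: real graphs are placed by dataset alone).
        if s.generator is not None and (s.generator in seen_generators) != want_gen_seen:
            continue
        split.append(s)
    return split


# Toy pool: two datasets, two hypothetical generators.
samples = [
    GraphSample("AIDS", None),
    GraphSample("AIDS", "gen_a"),
    GraphSample("AIDS", "gen_b"),
    GraphSample("DBLP", None),
    GraphSample("DBLP", "gen_a"),
    GraphSample("DBLP", "gen_b"),
]
seen_datasets = {"AIDS"}     # datasets used for training
seen_generators = {"gen_a"}  # generators used for training

closed = build_test_split(
    samples, seen_datasets, seen_generators, "seen dataset, seen generator"
)
hardest = build_test_split(
    samples, seen_datasets, seen_generators, "unseen dataset, unseen generator"
)
```

Here `closed` holds the real AIDS graph and the `gen_a` graph generated from AIDS, while `hardest` holds the real DBLP graph and the `gen_b` graph generated from DBLP. A detector trained only on AIDS and `gen_a` would then be scored separately on each split.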

Datasets

AIDS, Alchemy, Deezer, DBLP, GitHub, COLLAB, Twitch

Model(s)

GCN, GraphCL (Contrastive Learning), Siamese Network (Metric Learning), MLP, XGBoost

Author countries

Germany, USA
