Pros and Cons of GAN Evaluation Measures: New Developments

Authors: Ali Borji

Published: 2021-03-17 01:48:34+00:00

AI Summary

This paper updates an earlier survey of GAN evaluation measures. It reviews newly emerged quantitative and qualitative techniques for evaluating GANs, including refinements of metrics such as FID and IS, and discusses the connection between GAN evaluation and deepfakes.

Abstract

This work is an update of a previous paper on the same topic published a few years ago. With the dramatic progress in generative modeling, a suite of new quantitative and qualitative techniques to evaluate models has emerged. Although some measures such as Inception Score, Fréchet Inception Distance, Precision-Recall, and Perceptual Path Length are relatively more popular, GAN evaluation is not a settled issue and there is still room for improvement. Here, I describe new dimensions that are becoming important in assessing models (e.g. bias and fairness) and discuss the connection between GAN evaluation and deepfakes. These are important areas of concern in the machine learning community today and progress in GAN evaluation can help mitigate them.
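
For concreteness, FID fits a Gaussian to Inception-v3 features of real and of generated images and compares the two Gaussians with the Fréchet distance. Below is a minimal sketch of that computation, assuming the (N, D) feature arrays have already been extracted; it uses only NumPy/SciPy and is not code from the paper.

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: (N, D) arrays of pooled Inception-v3 features.
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)

    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error

    return float(diff @ diff + np.trace(sigma1 + sigma2) - 2.0 * np.trace(covmean))

Lower values indicate that the real and generated feature distributions are closer.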


Key findings
The paper highlights the ongoing challenges in GAN evaluation, particularly biases in existing metrics like FID and IS. It summarizes various new quantitative and qualitative measures designed to address these challenges and improve the assessment of GAN performance, including those focusing on spatial information, class awareness, and manifold analysis.
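
As a reference point for the IS-related discussion, the Inception Score is the exponentiated expected KL divergence between the conditional label distribution p(y|x) produced by Inception-v3 and the marginal p(y). A minimal sketch, assuming the softmax probabilities for the generated samples have already been computed elsewhere:

import numpy as np

def inception_score(probs, splits=10, eps=1e-16):
    """Inception Score from class probabilities p(y|x) of generated samples.

    probs: (N, num_classes) array of Inception-v3 softmax outputs.
    Returns the mean and standard deviation over `splits` chunks.
    """
    scores = []
    for chunk in np.array_split(probs, splits):
        p_y = chunk.mean(axis=0, keepdims=True)       # marginal p(y)
        kl = chunk * (np.log(chunk + eps) - np.log(p_y + eps))
        scores.append(np.exp(kl.sum(axis=1).mean()))  # exp(E_x[KL(p(y|x) || p(y))])
    return float(np.mean(scores)), float(np.std(scores))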
Approach
The paper surveys and categorizes recent advancements in quantitative and qualitative GAN evaluation measures, focusing on improvements to existing metrics (like FID and IS) and the introduction of novel methods to address biases and limitations. It does not propose a new method for deepfake detection.
Datasets
ImageNet, Kinetics-400, Kinetics-600 (mentioned in the context of the pre-trained models used by FVD, the Fréchet Video Distance)
Model(s)
InceptionNet (Inception-v3) and I3D (Inflated 3D ConvNet) - used within the evaluation metrics as feature extractors, not for deepfake detection itself.
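
For illustration, FID typically uses a pre-trained Inception-v3 with its classification head removed as the feature extractor (IS keeps the head to obtain p(y|x), and I3D plays the analogous role for video in FVD). A minimal sketch using torchvision; extract_features is a hypothetical helper, not code from the paper.

import torch
from torchvision import models

# Pre-trained Inception-v3 re-purposed as a 2048-d feature extractor,
# the role it plays inside FID.
weights = models.Inception_V3_Weights.IMAGENET1K_V1
net = models.inception_v3(weights=weights)
net.fc = torch.nn.Identity()       # expose pooled features instead of class logits
net.eval()

preprocess = weights.transforms()  # resizing/normalization matching the weights

@torch.no_grad()
def extract_features(pil_images):
    """Hypothetical helper: list of PIL images -> (N, 2048) feature tensor."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    return net(batch)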
Author countries
UNKNOWN