Deep Learning and Synthetic Media

Authors: Raphaël Millière

Published: 2022-05-11 20:28:09+00:00

AI Summary

This paper analyzes the impact of deep learning on synthetic audiovisual media (DLSAM), arguing that deep learning techniques do not merely improve existing methods of media synthesis but fundamentally challenge traditional taxonomic distinctions and enable genuinely novel forms of audiovisual media.

Abstract

Deep learning algorithms are rapidly changing the way in which audiovisual media can be produced. Synthetic audiovisual media generated with deep learning - often subsumed colloquially under the label deepfakes - have a number of impressive characteristics; they are increasingly trivial to produce, and can be indistinguishable from real sounds and images recorded with a sensor. Much attention has been dedicated to ethical concerns raised by this technological development. Here, I focus instead on a set of issues related to the notion of synthetic audiovisual media, its place within a broader taxonomy of audiovisual media, and how deep learning techniques differ from more traditional approaches to media synthesis. After reviewing important etiological features of deep learning pipelines for media manipulation and generation, I argue that deepfakes and related synthetic media produced with such pipelines do not merely offer incremental improvements over previous methods, but challenge traditional taxonomical distinctions, and pave the way for genuinely novel kinds of audiovisual media.


Key findings
Deep learning significantly reduces the resources (skill, time, hardware) required to create synthetic media. It also blurs the lines between partially and totally synthetic media, and between archival and synthetic media, giving rise to novel forms of media such as controllable videos.
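The "controllable" character of such media typically comes from the latent space of a deep generative model: nearby latent codes decode to similar outputs, so smoothly varying a code produces smooth semantic edits. The sketch below is illustrative only and not from the paper; it shows spherical interpolation (slerp) between two latent codes, a common way to produce morphing sequences, with the 512-dimensional latent size borrowed from StyleGAN as an assumption.

    import numpy as np

    def slerp(z0, z1, t):
        # Spherical interpolation between latent codes z0 and z1 at fraction t.
        u0, u1 = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
        omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between codes
        if np.isclose(omega, 0.0):
            return (1.0 - t) * z0 + t * z1  # nearly parallel: plain lerp suffices
        return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

    # Two random latent codes; 512 dimensions, as in StyleGAN's latent space.
    z_a, z_b = np.random.randn(512), np.random.randn(512)
    frames = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 30)]
    # Decoding each interpolated code with a pretrained generator would yield
    # a smooth morph between two synthetic images: a simple controllable video.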
Approach
The paper provides a taxonomy of audiovisual media (hand-made, machine-made archival, and machine-made synthetic), then analyzes how deep-learning-based methods fit within this taxonomy and challenge the traditional distinctions between its categories. It focuses on the etiological aspects of DLSAM creation and their resource requirements.
Datasets
UNKNOWN
Model(s)
Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), StyleGAN, StyleCLIP, multimodal Transformer models (such as DALL-E and CLIP), and other deep generative models.
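As background for the adversarial models listed above, here is a minimal GAN training loop: a generator learns to map random noise to samples that a discriminator cannot tell apart from real data. This is a toy sketch in PyTorch on 1-D Gaussian data, not code from the paper; all sizes and hyperparameters are illustrative assumptions.

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 8, 1  # toy sizes (assumed for illustration)
    G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(1000):
        real = 2.0 + 0.5 * torch.randn(64, data_dim)  # "real" data: N(2, 0.5^2)
        fake = G(torch.randn(64, latent_dim))         # generated samples

        # Discriminator step: push real samples toward label 1, fakes toward 0.
        # fake is detached so this step does not update the generator.
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

        # Generator step: update G so that D labels its samples as real.
        loss_g = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()

StyleGAN refines this basic scheme with a style-based generator, while models such as DALL-E and CLIP replace the adversarial objective with large-scale multimodal training; the loop above is only the simplest instance of deep generative modeling.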
Author countries
USA