Session

AI'll Be Back: Generative AI in Image, Audio and Video Production

This talk introduces you to the world of generative AI with a focus on Text-to-Image, Text-to-Audio and Text-to-Video for creating images, music and short videos. We explain how neural networks can generate these different output formats from short text inputs using diffusion models and Transformer architectures.

We focus on advanced technologies such as Sora and Midjourney. The underlying techniques, in particular Latent Diffusion Models, combine text understanding via attention mechanisms and Transformers with an iterative denoising process to generate and edit images and videos.
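
To make the denoising idea concrete, here is a minimal NumPy sketch (not taken from the talk). It shows the core diffusion math: a signal is mixed with noise according to a schedule, and the reverse process walks back from noise to a clean sample step by step. The "noise predictor" is an oracle standing in for the trained network, and real Latent Diffusion Models run this loop in a compressed latent space produced by an autoencoder.

```python
# Toy illustration of forward noising and reverse denoising.
# Assumptions: linear schedule, oracle noise predictor, tiny 4x4 "latent".
import numpy as np

rng = np.random.default_rng(0)

# The clean "latent" we want to recover (in practice: a compressed image).
x0 = rng.normal(size=(4, 4))

# Noise schedule: alpha_bar[t] = fraction of signal kept at step t.
T = 50
alpha_bar = np.linspace(1.0, 0.01, T)

# Forward process: x_t = sqrt(a)*x0 + sqrt(1-a)*eps
eps = rng.normal(size=x0.shape)
x_T = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1 - alpha_bar[-1]) * eps

# Reverse process: with a perfect noise prediction we can step back to x0.
x_t = x_T
for t in reversed(range(T)):
    predicted_eps = eps  # stands in for the trained U-Net / Transformer
    x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * predicted_eps) / np.sqrt(alpha_bar[t])
    if t > 0:
        # Re-noise to the previous step's noise level and continue.
        x_t = np.sqrt(alpha_bar[t - 1]) * x0_hat + np.sqrt(1 - alpha_bar[t - 1]) * predicted_eps
    else:
        x_t = x0_hat

print("reconstruction error:", np.abs(x_t - x0).max())
```

In a real text-to-image system, the text prompt conditions the noise predictor through cross-attention at every denoising step, which is how the prompt steers what emerges from the noise.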

A detailed examination of the video generation process with Sora shows how it compresses visual data, breaks it into patches, and then reconstructs the result into the final video. In addition to Sora, we also discuss alternative methods and tools such as RunwayML and SunoAI to present a broad spectrum of tools for image, audio and video generation.
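
The following toy NumPy sketch (again not from the talk) illustrates what "breaking a video into patches" can look like: a video tensor is cut into small space-time blocks that a Transformer can treat as tokens. The patch sizes and dimensions are illustrative assumptions, and real systems first compress the video with a learned encoder before patchifying.

```python
# Reshape a (frames, height, width, channels) video into flattened
# spacetime patches. All sizes here are made-up examples.
import numpy as np

frames, height, width, channels = 8, 32, 32, 3
video = np.random.rand(frames, height, width, channels)

pt, ph, pw = 2, 8, 8  # patch size in time, height, width
patches = (
    video.reshape(frames // pt, pt, height // ph, ph, width // pw, pw, channels)
         .transpose(0, 2, 4, 1, 3, 5, 6)  # bring the patch-grid axes to the front
         .reshape(-1, pt * ph * pw * channels)
)

print(patches.shape)  # (64, 384): 64 spacetime patches, each flattened to a token
```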

By the end of this talk, you will have a basic understanding of diffusion models, an overview of tools for image, audio and video generation, and a deeper understanding of how these tools work under the hood. Practical examples and demos round off the presentation.

* many live demos
* 45min to 60min
* published July 2024

Martin Förtsch

TNG Technology Consulting GmbH, Principal Consultant

Munich, Germany
