Creating a photorealistic avatar speaking any sentence starting from a written input text.
Focusing on autoencoders, we will do a journey from the beginning (Of the speaker experience), mistakes and tips learned along the path.
Will be showcased:
- Intro, the timeline from beginning to nowadays
- Is NOT a deepfake
- Audio processing techniques: STFT (Short Term Fourier Transform), MELs and custom solutions
- Deeplearning models and architecture
- The technique, inspired to inpaiting, used to animate the mouth
- Masks and convolution
- Landmarks extraction
Microsoft Azure will support us.