Invention Title:

AUTOREGRESSIVE CONTENT RENDERING FOR TEMPORALLY COHERENT VIDEO GENERATION

Publication number:

US20240354996

Publication date:

2024-10-24

Section:

Physics

Class:

G06T9/00

Inventors:

Sajid Sadi (San Jose, CA, United States)

Ankur Gupta (San Jose, CA, United States)

Hyun Jae Kang (Mountain View, CA, United States)

Siddarth Ravichandran (Santa Clara, CA, United States)

Varun Menon (Mountain View, CA, United States)

Applicant:

Samsung Electronics Co., Ltd. (Gyeonggi-do, South Korea)

Overview of the Invention

Autoregressive content rendering focuses on generating temporally coherent videos, particularly videos of digital humans. The growing popularity of digital humans in contexts such as gaming and the metaverse highlights the need for realistic, seamless animation. The main challenge is keeping generated videos free of visual artifacts such as jitter and glitches, which disrupt the user experience.

Challenges in Digital Human Creation

Creating digital humans requires overcoming the limitations of existing technologies that rely on sparse and noisy input features such as keypoints and contours. These inputs often lead to jittery or glitchy movement in less common features such as hair or clothing. In multi-modal settings, additional noise from audio data can further degrade the coherence of mouth and lip movements.

Solution Overview

The proposed method uses an autoencoder network to generate a series of predicted images, which are fed back into the network as input for subsequent iterations. By encoding both the predicted images and the keypoint images, the system aims to produce temporally coherent video content. This iterative decoding process helps maintain smooth transitions between video frames, addressing temporal incoherence.
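
The feedback loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method as claimed in the publication: the names `rollout` and `toy_step`, the use of flat lists of floats as stand-in images, and the simple blending rule are all hypothetical and chosen only to make the data flow visible.

```python
from typing import Callable, List, Sequence

# Assumption: an "image" is modelled as a flat list of pixel values.
Frame = List[float]


def rollout(
    step: Callable[[Frame, Frame], Frame],
    keypoint_frames: Sequence[Frame],
    initial_frame: Frame,
) -> List[Frame]:
    """Produce one predicted frame per keypoint frame.

    Each prediction is fed back in as the "previous image" input
    of the next iteration, which is the autoregressive part.
    """
    previous = initial_frame
    predicted: List[Frame] = []
    for keypoints in keypoint_frames:
        current = step(previous, keypoints)
        predicted.append(current)
        previous = current  # feed the prediction back into the next step
    return predicted


def toy_step(previous: Frame, keypoints: Frame) -> Frame:
    """Hypothetical stand-in for the network: blend the previous
    prediction with the new keypoint signal, pixel by pixel."""
    return [0.5 * p + 0.5 * k for p, k in zip(previous, keypoints)]


# Three keypoint frames, starting from an all-zero initial frame.
frames = rollout(toy_step, [[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]], [0.0, 0.0])
```

Because each output depends on the one before it, an abrupt change in the keypoint input (the third frame here) is softened rather than reproduced directly, which is the intuition behind using feedback to reduce jitter.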

System and Methodologies

The autoencoder network comprises a first encoder for predicted images and a second encoder for keypoint images, along with a decoder to iteratively generate predicted images. This setup is designed to enhance temporal coherence by leveraging encoded information from previous iterations. The technology can be implemented through various systems, devices, or computer program products.
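
One way to picture the two-encoder, one-decoder arrangement is the sketch below. It is a toy model under explicit assumptions: the class name `TwoEncoderAutoencoder`, the helper functions, and in particular the choice to combine the two encodings by concatenation are hypothetical, since the overview does not say how the encoded representations are merged or what the layers look like.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: images and latent codes are flat lists of floats.
Vector = List[float]


@dataclass
class TwoEncoderAutoencoder:
    encode_predicted: Callable[[Vector], Vector]  # first encoder: previous predicted image
    encode_keypoints: Callable[[Vector], Vector]  # second encoder: keypoint image
    decode: Callable[[Vector], Vector]            # decoder: combined code -> next predicted image

    def step(self, previous_image: Vector, keypoint_image: Vector) -> Vector:
        """Run one iteration: encode both inputs, combine, decode."""
        # Assumption: the two codes are combined by concatenation.
        code = self.encode_predicted(previous_image) + self.encode_keypoints(keypoint_image)
        return self.decode(code)


def downsample(image: Vector) -> Vector:
    """Toy encoder: average neighbouring pixel pairs (halves the length)."""
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image), 2)]


def toy_decode(code: Vector) -> Vector:
    """Toy decoder: average the two half-codes, then repeat each value twice."""
    half = len(code) // 2
    mixed = [(a + b) / 2 for a, b in zip(code[:half], code[half:])]
    return [value for value in mixed for _ in range(2)]


model = TwoEncoderAutoencoder(downsample, downsample, toy_decode)
next_image = model.step([0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])
```

In a real system each callable would be a learned neural network; the point of the sketch is only that the decoder sees information from both the previous output and the current keypoints at every iteration.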

Applications and Benefits

This approach enables the creation of high-resolution video content suitable for display on larger screens without visible artifacts. By ensuring smooth transitions between frames, the method enhances user engagement with digital humans in virtual environments. The technology supports applications ranging from realistic avatars in gaming to interactive virtual experiences.