Invention Title:

Actor-Replacement System for Videos

Publication number:

US20240304219

Publication date:

2024-09-12

Section:

Physics

Class:

G11B27/036

Inventors:

Sunil Ramesh (Cupertino, CA, United States)

Karina Levitian (Austin, TX, United States)

Michael Cutter (Golden, CO, United States)

Applicant:

Roku, Inc. (San Jose, CA, United States)

Drawings (4 of 9): Drawings 01 to 04 for Actor-Replacement System for Videos

Smart overview of the Invention

The actor-replacement system substitutes an original actor in a video with a replacement actor, for reasons such as actor availability or audience preference. It can also change the dialogue from one language to another without the labor-intensive process of re-recording scenes. To do this, it relies on pose-estimation and video-synthesis models that keep the replacement actor's movements and facial expressions synchronized with the new speech.

Methodology

An example method estimates the pose of the original actor in each frame using a skeletal detection model. The system then obtains images of the replacement actor that correspond to these estimated poses, along with speech from the replacement actor that corresponds to the original actor's dialogue. From these inputs, it generates synthetic frames that depict the replacement actor with movements and facial expressions synchronized to the new speech.
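The step of obtaining replacement-actor images that correspond to the estimated poses can be illustrated with a minimal sketch. It assumes that each pose is a tuple of normalized (x, y) joint positions and that a library of replacement-actor images has already been labeled with poses. The nearest-pose lookup and the names `ActorImage`, `pose_distance`, and `match_images` are illustrative assumptions, not details taken from the publication.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

# Assumed representation: one normalized (x, y) point per skeleton joint.
Pose = Tuple[Tuple[float, float], ...]


@dataclass(frozen=True)
class ActorImage:
    """Hypothetical record: a replacement-actor image and its labeled pose."""
    image_id: str
    pose: Pose


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding joints."""
    if not a or len(a) != len(b):
        raise ValueError("poses must be non-empty and have the same joint count")
    return sum(dist(p, q) for p, q in zip(a, b)) / len(a)


def match_images(frame_poses: Sequence[Pose],
                 library: Sequence[ActorImage]) -> List[ActorImage]:
    """For each estimated frame pose, pick the library image with the closest pose."""
    if not library:
        raise ValueError("image library is empty")
    return [min(library, key=lambda img: pose_distance(img.pose, pose))
            for pose in frame_poses]
```

In practice the poses would come from the skeletal detection model described above. Here they are plain tuples so that the matching logic can be read on its own.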

Generating Synthetic Frames

Synthetic frames are generated in two main steps. First, images of the replacement actor are inserted into the video frames according to the original actor's estimated poses. Second, a video-synthesis model generates facial expressions for the replacement actor that match the timing of the replacement speech. Together, these steps place the replacement actor in the original scene with expressions that fit the new dialogue.
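The two steps can be sketched as a small loop. This is only an illustration under stated assumptions: frames are grayscale grids of integers, `None` marks a transparent pixel in the replacement-actor patch, and the video-synthesis model is passed in as a callable named `expression_model`. The names `paste` and `generate_synthetic_frames` are hypothetical, and a real system would operate on full-resolution images.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = List[List[int]]            # grayscale pixel rows
Patch = List[List[Optional[int]]]  # None marks a transparent pixel


def paste(frame: Frame, patch: Patch, top: int, left: int) -> Frame:
    """Return a copy of frame with the opaque patch pixels drawn at (top, left)."""
    out = [row[:] for row in frame]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            y, x = top + r, left + c
            inside = 0 <= y < len(out) and 0 <= x < len(out[y])
            if value is not None and inside:
                out[y][x] = value
    return out


def generate_synthetic_frames(
    frames: Sequence[Frame],
    patches: Sequence[Patch],
    anchors: Sequence[Tuple[int, int]],
    expression_model: Callable[[Frame, int], Frame],
) -> List[Frame]:
    """Step 1: insert the replacement actor. Step 2: adjust facial expression."""
    if not (len(frames) == len(patches) == len(anchors)):
        raise ValueError("frames, patches, and anchors must have equal length")
    synthetic = []
    for index, (frame, patch, (top, left)) in enumerate(zip(frames, patches, anchors)):
        composited = paste(frame, patch, top, left)          # step 1
        synthetic.append(expression_model(composited, index))  # step 2
    return synthetic
```

Passing the frame index to `expression_model` is one way to let the second step look up the portion of the replacement speech that plays during that frame.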

Combining Frames and Speech

Once the synthetic frames are created, they are combined with the replacement speech to produce the synthetic video. This may involve replacing the original audio track with the replacement speech. Because the facial expressions in the synthetic frames were generated to match the timing of that speech, lip synchronization is maintained in the combined video.
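As a rough sketch of the combining step, the snippet below checks that the rendered frames and the replacement speech cover about the same duration, then builds a command that swaps the audio track. The source does not name a tool, so using the `ffmpeg` command-line program here is an assumption, as are the function names and the one-frame tolerance.

```python
from typing import List


def durations_match(frame_count: int, fps: float,
                    sample_count: int, sample_rate: int) -> bool:
    """True when video and audio durations differ by at most one frame period."""
    video_s = frame_count / fps
    audio_s = sample_count / sample_rate
    return abs(video_s - audio_s) <= 1.0 / fps


def build_mux_command(video_path: str, speech_path: str,
                      output_path: str) -> List[str]:
    """Build an ffmpeg argument list (assumed tool) that replaces the audio track.

    The video stream is taken from the first input and copied without
    re-encoding. The audio stream is taken from the second input.
    """
    return [
        "ffmpeg",
        "-i", video_path,    # input 0: synthetic frames rendered as a video
        "-i", speech_path,   # input 1: replacement speech
        "-map", "0:v:0",     # keep video from input 0
        "-map", "1:a:0",     # take audio from input 1
        "-c:v", "copy",      # do not re-encode the video stream
        "-shortest",         # stop at the end of the shorter stream
        output_path,
    ]
```

The returned list can be handed to `subprocess.run` once the duration check passes. The file names are supplied by the caller.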

Additional Applications

The system also supports language translation: it generates synthesized speech in a target language and adjusts the actor's facial expressions to match. A speech engine capable of voice modification produces the new dialogue, and the facial expressions are regenerated so that the visuals stay consistent with it. This allows video content to be modified in post-production for different audiences and preferences without re-recording scenes.
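The translation path can be sketched as a small pipeline with pluggable stages. The translator and the speech engine are injected as callables, since the source does not specify them. The record types and the function `dub_dialogue` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class DialogueLine:
    """Hypothetical record: one line of original dialogue and when it is spoken."""
    start_s: float
    end_s: float
    text: str


@dataclass(frozen=True)
class DubbedLine:
    """Translated text plus synthesized audio, keeping the original timing."""
    start_s: float
    end_s: float
    text: str
    audio: bytes


def dub_dialogue(
    lines: Sequence[DialogueLine],
    translate: Callable[[str], str],
    synthesize_speech: Callable[[str], bytes],
) -> List[DubbedLine]:
    """Translate each line, then synthesize target-language speech for it."""
    dubbed = []
    for line in lines:
        translated = translate(line.text)
        audio = synthesize_speech(translated)
        dubbed.append(DubbedLine(line.start_s, line.end_s, translated, audio))
    return dubbed
```

Keeping the original start and end times on each dubbed line is a design choice in this sketch: it identifies which frames need their facial expressions regenerated for the new speech.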