US20240304219
2024-09-12
Physics
G11B27/036
The actor-replacement system addresses the need to substitute an original actor in a video with a replacement actor, for reasons such as actor availability or audience preference. It also allows dialogue to be changed from one language to another without the labor-intensive process of re-recording scenes. The system automates the replacement using pose estimation, video synthesis, and speech generation, with the aim of preserving video quality and audio-visual synchronization.
An example method estimates the pose of the original actor in each frame of the video using a skeletal detection model. The system then obtains images of the replacement actor that correspond to the estimated poses, along with speech from the replacement actor that corresponds to the original actor's dialogue. From these inputs it generates synthetic frames that depict the replacement actor with movements and facial expressions synchronized to the new speech.
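The pose-matching step can be illustrated with a minimal sketch. It assumes that each pose is a fixed-order list of normalized (x, y) joint coordinates and that the closest replacement-actor image is chosen by mean joint distance. The names `pose_distance`, `match_replacement_images`, and the image ids are hypothetical, and the source does not specify the matching criterion.

```python
import math
from typing import Dict, List, Sequence, Tuple

Keypoint = Tuple[float, float]   # (x, y) in normalized image coordinates (assumed)
Pose = Sequence[Keypoint]        # joints in a fixed skeleton order (assumed)


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding joints of two poses."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("poses must be non-empty and have the same joint count")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def match_replacement_images(
    frame_poses: List[Pose], library: Dict[str, Pose]
) -> List[str]:
    """For each estimated frame pose, return the id of the closest library image."""
    if not library:
        raise ValueError("the replacement-actor image library is empty")
    return [
        min(library, key=lambda image_id: pose_distance(pose, library[image_id]))
        for pose in frame_poses
    ]
```

In practice the frame poses would come from the skeletal detection model and the library from a capture session with the replacement actor; here both are plain dictionaries and lists so the selection logic can be read in isolation.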
Synthetic frames are generated in two main steps. First, images of the replacement actor are inserted into the video frames according to the original actor's estimated poses. Second, a video-synthesis model generates facial expressions whose timing matches the replacement speech. Together, these steps are intended to make the replacement actor appear natural and consistent with the rest of the original video.
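The timing requirement in the second step can be sketched as a mapping from speech segments to frame indices. The sketch assumes the replacement speech has already been segmented into labeled time intervals (for example, phonemes or mouth shapes) and that the video has a constant frame rate. `SpeechSegment` and `labels_per_frame` are hypothetical names, and the actual video-synthesis model is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpeechSegment:
    label: str    # hypothetical phoneme or mouth-shape label
    start: float  # segment start, in seconds
    end: float    # segment end (exclusive), in seconds


def labels_per_frame(
    segments: List[SpeechSegment], frame_count: int, fps: float
) -> List[Optional[str]]:
    """Return the label active at each frame's timestamp (None means silence)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    result: List[Optional[str]] = []
    for index in range(frame_count):
        timestamp = index / fps
        active = next(
            (s.label for s in segments if s.start <= timestamp < s.end), None
        )
        result.append(active)
    return result
```

A synthesis model could then be conditioned on the per-frame label, so that the mouth shape rendered in frame i corresponds to the sound heard at time i / fps.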
Once the synthetic frames are created, they are combined with the replacement speech to produce a complete synthetic video. This may involve replacing the original audio track with the newly generated dialogue while keeping lip movements synchronized with it throughout. The result integrates the visual and audio elements into a single, consistent video.
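One way to replace the audio track is to mux the synthetic video with the replacement speech using the ffmpeg command-line tool. The source does not name a tool, so ffmpeg is an assumption here, as is the premise that the synthetic frames have already been encoded into a video file. The helper `build_mux_command` is hypothetical and only builds the argument list; it does not run anything.

```python
from typing import List


def build_mux_command(video_path: str, speech_path: str, output_path: str) -> List[str]:
    """Build an ffmpeg argument list that swaps in a replacement audio track."""
    return [
        "ffmpeg",
        "-i", video_path,    # input 0: synthetic video (its own audio is ignored)
        "-i", speech_path,   # input 1: replacement speech
        "-map", "0:v:0",     # take the first video stream from input 0
        "-map", "1:a:0",     # take the first audio stream from input 1
        "-c:v", "copy",      # copy the video stream without re-encoding
        "-shortest",         # stop at the end of the shorter stream
        output_path,
    ]
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed. Copying the video stream avoids a second lossy encode of the synthetic frames.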
The system also supports language translation: it generates synthesized speech in a target language and adjusts the facial expressions to match. By using a speech engine capable of voice modification, it changes the dialogue without compromising visual fidelity. Overall, the technology offers an efficient way to modify video content in post-production for different audiences and preferences.
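The translation flow can be sketched as a small pipeline in which the translator and the speech engine are injected as callables. Both are placeholders: the source does not describe their interfaces, so `translate`, `synthesize`, `DubbedLine`, and `dub_dialogue` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DubbedLine:
    original_text: str
    translated_text: str
    audio: bytes  # synthesized speech for the translated line


def dub_dialogue(
    lines: Iterable[str],
    target_language: str,
    translate: Callable[[str, str], str],   # (text, target_language) -> text
    synthesize: Callable[[str], bytes],     # text -> audio; stands in for the speech engine
) -> List[DubbedLine]:
    """Translate each dialogue line and synthesize speech for the translation."""
    dubbed: List[DubbedLine] = []
    for text in lines:
        translated = translate(text, target_language)
        dubbed.append(DubbedLine(text, translated, synthesize(translated)))
    return dubbed
```

The synthesized audio for each line would then drive the facial-expression step, so that the adjusted expressions follow the target-language speech rather than the original dialogue.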