US20240430497
2024-12-26
Electricity
H04N21/2343
The patent application introduces a method for enhancing video streams by localizing lip-syncing to match translated audio with visual speech movements. The approach involves detecting the cultural contexts and accents of both the speakers in a video and its intended audience in order to select appropriate accent tags. These tags are applied to a textual representation of the spoken words, which is then translated from a source language to a target language. The method ensures that the speakers' modified lip movements align with both the target language and the selected accent.
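The tag-then-translate ordering described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names (`select_accent_tag`, `ACCENT_MAP`, `tag_transcript`) and the lookup-table approach are assumptions introduced here.

```python
# Hypothetical sketch of accent-tag selection and tagging.
# ACCENT_MAP, select_accent_tag, and tag_transcript are illustrative
# names, not taken from the patent.

ACCENT_MAP = {
    # (speaker cultural context, audience cultural context) -> accent tag
    ("en-US", "es-MX"): "es-MX-neutral",
    ("en-GB", "fr-FR"): "fr-FR-parisian",
}

def select_accent_tag(speaker_ctx: str, audience_ctx: str) -> str:
    """Pick an accent tag from the detected speaker and audience contexts,
    falling back to a default accent for the audience's locale."""
    return ACCENT_MAP.get((speaker_ctx, audience_ctx), f"{audience_ctx}-default")

def tag_transcript(words: list[str], tag: str) -> list[tuple[str, str]]:
    """Attach the selected accent tag to each word of the transcript,
    producing the tagged textual representation that is then translated."""
    return [(word, tag) for word in words]

tagged = tag_transcript(["hello", "world"], select_accent_tag("en-US", "es-MX"))
```

The tagged transcript, rather than the raw one, is what feeds the translation step, so accent information survives into the target language.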
Traditional dubbing and lip-syncing often produce mismatched audio and visual elements, particularly when a video is translated into a different language. This mismatch makes the content harder to follow, even when the audio is in the viewer's native language. Lip reading, which involves interpreting spoken language through visual cues such as lip movements and facial expressions, is crucial for effective communication, especially for viewers who are deaf or hard of hearing.
The method involves translating spoken words from a source language to a target language while applying accent tags based on cultural and linguistic contexts. The system modifies the lip movements of speakers in the video to match these translated words and accents. The result is a video stream where speakers appear to naturally speak in the target language with synchronized lip movements.
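One way to picture the lip-movement modification step is as a mapping from target-language phonemes to visemes (visual mouth shapes), scheduled to fit the original speech segment's timing. This is a sketch under assumptions: the phoneme-to-viseme table, the even-spacing rule, and all function names are illustrative, not details from the patent.

```python
# Illustrative phoneme-to-viseme mapping and timing; the table and the
# even-spacing schedule are assumptions, not the patent's method.

PHONEME_TO_VISEME = {
    "o": "rounded",
    "a": "open",
    "m": "closed",
    "l": "tongue-up",
}

def visemes_for(phonemes: list[str]) -> list[str]:
    """Map each target-language phoneme to the viseme to render on the
    speaker's face, defaulting to a neutral mouth shape."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

def schedule(visemes: list[str], segment_ms: int) -> list[tuple[str, int]]:
    """Spread visemes evenly across the original speech segment so the
    modified lip movements preserve the source video's timing."""
    step = segment_ms // max(len(visemes), 1)
    return [(v, i * step) for i, v in enumerate(visemes)]
```

Scheduling against the original segment duration is what keeps the rendered speaker looking natural: the mouth shapes change, but the overall timing of the scene does not.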
The system comprises a processor and logic integrated with the processor to execute the described methods. It can be implemented as a computer program product stored on computer-readable media, or deployed within various computing environments, including public and private clouds. The method can also be customized for individual users by selecting a specific accent when rendering lip movements onto speakers in the video stream.
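The per-user customization mentioned above amounts to a preference lookup with a fallback. The profile shape and fallback rule below are assumptions for illustration only.

```python
# Hypothetical per-user accent selection; the profile key
# "preferred_accent" and the fallback rule are assumptions.

def accent_for_user(profile: dict, detected_audience_accent: str) -> str:
    """Prefer the accent a user explicitly selected in their profile;
    otherwise fall back to the accent detected for the broader audience."""
    return profile.get("preferred_accent", detected_audience_accent)
```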
This technology offers significant improvements in video translation quality by making lip-syncing more lifelike and understandable across different languages and cultural contexts. It enhances communication for diverse audiences, including those with hearing impairments, by providing a more accurate visual representation of speech. This innovation is applicable across various media platforms, improving accessibility and viewer engagement.