Invention Title:

ARTIFICIAL INTELLIGENCE BASED AUTO DUBBED LIP SYNCHRONIZATION GENERATION

Publication number:

US20260120377

Publication date:

Section:

Physics

Class:

G06T13/205

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent application describes a system that uses artificial intelligence to generate lip-synchronized video for translated, dubbed speech in real time. It targets a common problem with dubbed content: the dubbed audio does not match the speaker's visible lip movements. The system applies several AI models to the audio and video so that the translated speech lines up with the speaker's lip movements in the video.

Key Components

The system includes a speaker diarization model, a face detection model, an active speaker detection model, a translation model, and a lip-synchronization model. Together these components process streaming video content: they separate the audio into a background feed and individual speaker feeds, detect faces, pair each speaker's audio with the matching video frames, and generate video frames synchronized with the translated speech.
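The split into a background feed and per-speaker feeds can be pictured with a small data-structure sketch. It is illustrative only: the `Segment` type, the `BACKGROUND` label, and `split_feeds` are hypothetical names, not taken from the application, and the sketch models only labelled time spans, not the waveform separation a real diarization model would perform.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical label for non-speech audio; the source does not name one.
BACKGROUND = "background"


@dataclass(frozen=True)
class Segment:
    """One labelled span of audio, in seconds from the start of the stream."""
    start: float
    end: float
    label: str  # BACKGROUND or a speaker id such as "spk0"


def split_feeds(segments):
    """Group segments into one background feed and one feed per speaker.

    Segments are ordered by start time within each feed.
    """
    background = []
    speakers = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.label == BACKGROUND:
            background.append(seg)
        else:
            speakers[seg.label].append(seg)
    return background, dict(speakers)
```

Keeping each speaker's spans in a separate list means later stages can translate and lip-sync one speaker at a time without re-scanning the whole stream.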

Technological Process

The method feeds the audio to a speaker diarization model, which separates it into a background feed and speaker-specific feeds. A face detection model processes the video content and crops the frames that contain faces. The active speaker detection model pairs these cropped frames with the corresponding speaker's audio feed. Translated speech audio is then generated, and a predictive model synchronizes it with the video frames so that the lip movements match the translated speech.
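The order of these steps can be sketched as a small pipeline of interchangeable callables. This is a minimal sketch under stated assumptions: `DubbingPipeline`, its field names, and the placeholder `Audio` and `Frame` types are hypothetical, and the lambdas at the end are trivial stand-ins for trained models, used only to show how data flows between stages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Audio = List[float]  # placeholder for an audio buffer (assumption)
Frame = str          # placeholder for an image (assumption)


@dataclass
class DubbingPipeline:
    """Wires the five stages together; every stage is a plain callable."""
    diarize: Callable[[Audio], Tuple[Audio, Dict[str, Audio]]]
    detect_faces: Callable[[List[Frame]], List[Frame]]
    pair_speaker: Callable[[List[Frame], Dict[str, Audio]], Dict[str, List[Frame]]]
    translate: Callable[[Audio], Audio]
    lip_sync: Callable[[List[Frame], Audio], List[Frame]]

    def run(self, audio, frames):
        # 1. Separate the audio into a background feed and per-speaker feeds.
        background, feeds = self.diarize(audio)
        # 2. Crop the frames that contain faces.
        crops = self.detect_faces(frames)
        # 3. Pair cropped frames with the speaker feed they belong to.
        paired = self.pair_speaker(crops, feeds)
        # 4. Translate each speaker's audio, then 5. re-render the lips.
        dubbed, synced = {}, {}
        for speaker, speaker_frames in paired.items():
            dubbed[speaker] = self.translate(feeds[speaker])
            synced[speaker] = self.lip_sync(speaker_frames, dubbed[speaker])
        return background, dubbed, synced


# Stand-in stages that only tag their inputs, to make the data flow visible.
pipeline = DubbingPipeline(
    diarize=lambda audio: (audio[:1], {"spk0": audio[1:]}),
    detect_faces=lambda frames: [f + ":crop" for f in frames],
    pair_speaker=lambda crops, feeds: {"spk0": crops},
    translate=lambda speech: [-x for x in speech],
    lip_sync=lambda crops, speech: [c + ":synced" for c in crops],
)
```

Passing the stages in as callables keeps the control flow independent of any particular model, so each stand-in could be replaced without touching `run`.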

Implementation Details

The predictive model predicts 3D mesh vertices for facial alignment and extracts texture information from the cropped frames. It also uses reverse diffusion to handle image noise, which improves the realism of the lip-synchronized video frames. The system is designed to generate frames at a rate that matches or exceeds the frame rate of the streaming content, so playback stays smooth.
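The summary does not say which reverse diffusion sampler is used. As a generic illustration of the idea, not the application's method, the sketch below runs a deterministic DDIM-style reverse process over a short list of pixel values. The linear `alpha_bar` schedule and all function names are assumptions, and `predict_noise` stands in for a trained noise-prediction network.

```python
import math


def alpha_bar(t, steps):
    """Toy linear schedule (assumption): 1.0 at t = 0, 0.02 at t = steps."""
    return 1.0 - 0.98 * t / steps


def add_noise(x0, eps, t, steps):
    """Forward process: blend the clean signal x0 with noise eps at step t."""
    a = alpha_bar(t, steps)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


def reverse_step(x_t, eps_hat, t, steps):
    """One deterministic (DDIM-style, eta = 0) update from step t to t - 1."""
    a_t = alpha_bar(t, steps)
    a_prev = alpha_bar(t - 1, steps)
    # Estimate the clean signal implied by the predicted noise.
    x0_hat = [(x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
              for x, e in zip(x_t, eps_hat)]
    # Re-noise that estimate to the lower noise level of step t - 1.
    return [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
            for x0, e in zip(x0_hat, eps_hat)]


def denoise(x_t, predict_noise, steps):
    """Run the reverse process from step `steps` down to step 0."""
    x = x_t
    for t in range(steps, 0, -1):
        x = reverse_step(x, predict_noise(x, t), t, steps)
    return x
```

With a perfect noise predictor the loop recovers the clean values up to floating-point error; a trained network only approximates the noise, so a real system's output would be an estimate.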

Real-Time Application

This AI-driven solution is designed to run in real time, which makes it suitable for live broadcasts and streaming services. It reduces dependence on network performance and removes limits on media type, broadening access for non-English-speaking audiences. Generating high-quality dubbed content in real time addresses the growing demand for multilingual media options.
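Real-time operation, together with the requirement that output match or exceed the source frame rate, implies a time budget per generated frame. A minimal sketch of that arithmetic follows; the function names are hypothetical, and comparing the average generation time is a simplifying assumption, since a real system would also need buffering to absorb slow frames.

```python
def frame_budget_ms(source_fps):
    """Maximum average time per generated frame to keep pace with the source."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return 1000.0 / source_fps


def keeps_up(frame_times_ms, source_fps):
    """True when the mean generation time per frame fits within the budget."""
    if not frame_times_ms:
        raise ValueError("need at least one timing sample")
    mean_ms = sum(frame_times_ms) / len(frame_times_ms)
    return mean_ms <= frame_budget_ms(source_fps)
```

For example, a 25 fps stream leaves 40 ms per frame on average, so measured times of 30, 35, and 45 ms would keep pace, while 50 and 45 ms would not.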