Invention Title:

AUGMENTED STREAMING MEDIA

Publication number:

US20250392766

Publication date:

Section:

Electricity

Class:

H04N21/233

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The invention describes a method for augmenting a multimedia stream with spoken descriptions of its video content. The audio component of the stream is analyzed to identify windows with no foreground voice activity. For each identified window, a text description of the video content is generated, converted to a synthesized voice, and inserted back into the audio stream. Because insertion is limited to these windows, the added commentary does not overlap existing dialogue.

Method and Process

The key steps are: examining the audio stream to detect silent windows (periods without foreground voice activity), processing the multimedia data to generate descriptive text, and converting that text into a voice segment. The synthesized voice segment is then inserted into the audio stream within the identified silent windows. The result is added audio commentary that leaves the original audio intact.
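The window-detection and placement steps can be illustrated with a minimal sketch. It assumes a voice-activity detector has already labeled each audio frame as voiced or not; the names `Window`, `find_silent_windows`, and `place_segment`, the frame length, and the minimum-duration threshold are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Window:
    start: float  # seconds from the start of the stream
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_silent_windows(voice_active: List[bool], frame_sec: float,
                        min_sec: float) -> List[Window]:
    """Return runs of frames without voice activity lasting at least min_sec.

    voice_active is an assumed per-frame output of some voice-activity detector.
    """
    windows: List[Window] = []
    run_start: Optional[int] = None
    # A trailing True acts as a sentinel that closes a run ending at the last frame.
    for i, active in enumerate(list(voice_active) + [True]):
        if not active and run_start is None:
            run_start = i
        elif active and run_start is not None:
            window = Window(run_start * frame_sec, i * frame_sec)
            if window.duration >= min_sec:
                windows.append(window)
            run_start = None
    return windows


def place_segment(windows: List[Window], segment_sec: float) -> Optional[float]:
    """Return the start time of the first window that fits the segment, else None."""
    for window in windows:
        if window.duration >= segment_sec:
            return window.start
    return None


frames = [True, False, False, False, True, False, True]
windows = find_silent_windows(frames, frame_sec=0.5, min_sec=1.0)
print(windows)                      # [Window(start=0.5, end=2.0)]
print(place_segment(windows, 1.2))  # 0.5
```

Returning `None` when no window is long enough reflects one possible design choice: skip the description rather than let it overlap dialogue.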

System and Implementation

The system comprises a memory and a processing unit that executes program instructions to perform the method described above. The instructions implement the detection of silent windows, the generation of text, and the synthesis of voice segments. The system is designed to operate on live multimedia streams, adapting in real time to varying audio conditions.

Technical Features

Additional features include the ability to predict the duration and style of silent windows, so that synthesized audio can be placed accurately. The system can also evaluate text strings for semantic redundancy to avoid repeating information. Machine learning models are used to improve these predictions over time as they adapt to new data and user interactions.
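One way to picture the redundancy check is a similarity test between a candidate description and recently spoken ones. The sketch below uses token-overlap (Jaccard) similarity purely as a simple stand-in for semantic similarity; the patent summary does not say which measure is used, and the function names and the 0.6 threshold are assumptions.

```python
import re
from typing import Iterable, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a crude proxy for semantic similarity."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_redundant(candidate: str, recent: Iterable[str],
                 threshold: float = 0.6) -> bool:
    """True if the candidate is too similar to any recently spoken description."""
    return any(jaccard(candidate, prev) >= threshold for prev in recent)


spoken = ["The striker runs toward the goal"]
print(is_redundant("the striker runs toward the goal.", spoken))  # True
print(is_redundant("A goalkeeper dives left", spoken))            # False
```

A production system would more plausibly compare sentence embeddings, which would also catch paraphrases that share few words.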

User Engagement

This approach improves user engagement by providing audio descriptions of visual content, which makes streams more accessible to users who rely on audio cues. The system also adapts to specific contexts, such as sports events, by configuring the characteristics of the synthesized voice based on predicted sentiment.
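The sentiment-based voice configuration could be as simple as a lookup from a predicted sentiment label to synthesis parameters. The following is a hypothetical sketch: the labels, the `VoiceConfig` fields, and the preset values are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceConfig:
    rate: float    # speaking-rate multiplier, 1.0 = neutral
    pitch: float   # pitch shift in semitones
    volume: float  # gain multiplier


NEUTRAL = VoiceConfig(rate=1.0, pitch=0.0, volume=1.0)

# Illustrative presets only; a real system would tune or learn these values.
PRESETS = {
    "excited": VoiceConfig(rate=1.15, pitch=2.0, volume=1.1),
    "tense": VoiceConfig(rate=1.05, pitch=1.0, volume=1.0),
    "calm": VoiceConfig(rate=0.95, pitch=-1.0, volume=0.9),
}


def voice_for(sentiment: str) -> VoiceConfig:
    """Map a predicted sentiment label to voice settings, defaulting to neutral."""
    return PRESETS.get(sentiment.lower(), NEUTRAL)


print(voice_for("Excited"))
# VoiceConfig(rate=1.15, pitch=2.0, volume=1.1)
```

Falling back to a neutral voice for unrecognized labels keeps the commentary usable when the sentiment prediction is missing or unexpected.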