Invention Title:

ADAPTIVE SPEECH REGENERATION

Publication number:

US20240371357

Publication date:

Section:

Physics

Class:

G10L13/02

Inventors:

Applicant:

Smart overview of the Invention

Adaptive speech regeneration uses an audio transformation model to process and resynthesize speech signals from a speaking entity. The method captures an audio signal, divides it into segments, and feeds the segments to the model, which generates a voice vector representation and a speech vector representation. The voice vector captures the speaker's vocal characteristics, while the speech vector encodes the spoken words and their contextual attributes. The speech is then regenerated from these vectors, preserving the speaker's unique voice.
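The capture, segment, and encode steps can be sketched as follows. This is only an illustration of the data flow, not the model described in the publication: the function names, the segment and hop lengths, and the two hand-written features (RMS energy and zero-crossing rate) are all assumptions standing in for a learned audio transformation model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def split_into_segments(samples: Sequence[float], segment_len: int, hop_len: int) -> List[List[float]]:
    """Divide a mono signal into fixed-length segments that start every hop_len samples."""
    segments = []
    for start in range(0, len(samples) - segment_len + 1, hop_len):
        segments.append(list(samples[start:start + segment_len]))
    return segments


@dataclass
class Encoded:
    voice_vector: List[float]          # one vector per speaker/utterance
    speech_vectors: List[List[float]]  # one vector per segment


def toy_encode(segments: List[List[float]]) -> Encoded:
    """Hypothetical stand-in for the audio transformation model.

    A real system would use a trained network; here each segment is reduced
    to two placeholder features so the output shapes can be inspected.
    """
    speech_vectors = []
    for seg in segments:
        rms = math.sqrt(sum(x * x for x in seg) / len(seg))
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
        speech_vectors.append([rms, crossings / len(seg)])
    # Placeholder voice vector: the average of the per-segment features.
    n = len(speech_vectors)
    voice_vector = [sum(v[i] for v in speech_vectors) / n for i in range(2)]
    return Encoded(voice_vector, speech_vectors)


# One second of a 220 Hz tone at an assumed 16 kHz sample rate.
samples = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
segments = split_into_segments(samples, segment_len=400, hop_len=160)
encoded = toy_encode(segments)
print(len(segments), len(encoded.voice_vector), len(encoded.speech_vectors))
```

The point of the sketch is the shape of the output: a single voice vector describes the speaker, while the speech vectors form a sequence with one entry per segment.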

Technical Field

The technology concerns audio processing, specifically the use of machine learning models to regenerate speech signals. It addresses telecommunication systems in which noise and compression artifacts degrade audio quality. By regenerating speech with an audio transformation model, the method aims to improve communication clarity in scenarios such as teleconferences.

Background

Traditional teleconferencing systems suffer from noise interference and lossy data compression, both of which can hinder effective communication. Common solutions rely on digital signal processing techniques such as noise reduction and echo cancellation, but these often introduce artifacts of their own. Newer approaches aim to improve audio quality by developing better codecs that compress and decompress audio with minimal data loss.

Innovative Approach

The proposed method transmits voice and speech vector representations instead of acoustic data, so noise present in the captured audio is not carried along with the transmitted signal. The receiver regenerates high-quality speech from the vectors while maintaining the speaker's voice characteristics. The approach also significantly reduces bandwidth requirements compared with traditional methods, which makes it a more efficient option for low-bandwidth communications.
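A rough way to reason about the bandwidth claim is to compare the bit rate of uncompressed audio with the bit rate of a stream of vectors. The figures below (16 kHz, 16-bit mono audio; 64-dimensional speech vectors quantized to 8 bits and sent 50 times per second) are illustrative assumptions, not values from the publication, and the one-time cost of sending the voice vector is ignored.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def vector_bitrate(vector_dim: int, bits_per_value: int, vectors_per_second: int) -> int:
    """Bit rate of a stream of fixed-size vectors in bits per second."""
    return vector_dim * bits_per_value * vectors_per_second


# All parameters here are hypothetical and chosen only to show the arithmetic.
audio_bps = pcm_bitrate(sample_rate_hz=16000, bits_per_sample=16)
vector_bps = vector_bitrate(vector_dim=64, bits_per_value=8, vectors_per_second=50)
ratio = audio_bps / vector_bps

print(audio_bps)   # 256000
print(vector_bps)  # 25600
print(ratio)       # 10.0
```

Under these assumptions the vector stream needs one tenth of the bits of raw PCM. The actual saving depends on the vector size, quantization, and update rate of a given system, and on which codec it is compared against.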

Technical Benefits

This method provides several advantages over conventional audio processing technologies. It reduces memory usage by eliminating the need for extensive denoising and filtering. It also achieves a higher signal-to-noise-and-distortion ratio (SNDR) and a broader frequency response range. In addition, machine learning models enable real-time voice cloning and modification, which allows versatile applications in telecommunication systems.
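For reference, SNDR is conventionally the ratio of signal power to the combined power of noise and distortion, expressed in decibels. The helper below computes it from three power values. It reflects the standard definition only; the publication summary does not say how SNDR is measured for regenerated speech, and the function name and example numbers are assumptions.

```python
import math


def sndr_db(signal_power: float, noise_power: float, distortion_power: float) -> float:
    """SNDR in dB: 10 * log10(P_signal / (P_noise + P_distortion))."""
    unwanted = noise_power + distortion_power
    if signal_power <= 0 or unwanted <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / unwanted)


# Hypothetical example: unwanted power is 1% of the signal power.
example = sndr_db(signal_power=1.0, noise_power=0.005, distortion_power=0.005)
print(round(example, 2))  # 20.0
```

A higher value means that less of the output power is noise or distortion. Halving the combined noise and distortion power raises the SNDR by about 3 dB.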