Invention Title:

Ambient Noise Capture for Speech Synthesis of In-Game Character Voices

Publication number:

US20240304174

Publication date:

2024-09-12

Section:

Physics

Class:

G10L13/027

Inventors:

Celeste M.B. Bean San Francisco, CA, United States

Landon G. Noss Glendale, AZ, United States

Assignee:

Sony Interactive Entertainment Inc. Tokyo, Japan

Applicant:

SONY INTERACTIVE ENTERTAINMENT INC. Tokyo, Japan

Overview of the Invention

The application describes methods and systems that capture ambient noise from a player's environment and use it to voice in-game characters. The system identifies background voices that are distinct from the player's voice, isolates the phonemes within those voices, and uses generative artificial intelligence (AI) to synthesize speech for in-game characters. The synthesized speech can be tailored by attributes such as age and gender, with the stated goal of a more immersive experience.

Technical Implementation

The system receives audio input through a microphone associated with a gaming device. It distinguishes the user's voice from background voices by analyzing sound characteristics such as frequency patterns and cadence. Once a background voice is identified, the system isolates its phonemes and checks that enough phoneme data exists to synthesize coherent speech. The synthesized speech can then be integrated into gameplay, giving character interactions dynamic dialogue options.
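The two checks described above (separating background voices from the user's voice, then confirming that enough phonemes were captured) can be sketched as follows. This is a minimal illustration, not the method claimed in the application: the `Segment` type, the 15% tolerance, the minimum count of three, and the assumption that an upstream recognizer has already reduced each audio segment to a mean pitch, a speaking rate, and a list of phoneme labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One voiced stretch of microphone audio, reduced to summary features."""
    mean_pitch_hz: float      # hypothetical stand-in for "frequency patterns"
    syllables_per_sec: float  # hypothetical stand-in for "cadence"
    phonemes: list[str]       # labels from an assumed upstream recognizer


def is_background(seg: Segment, user_pitch_hz: float, user_rate: float,
                  tolerance: float = 0.15) -> bool:
    """True if either feature deviates from the user's profile by more
    than `tolerance` (relative difference). The threshold is an assumption."""
    pitch_dev = abs(seg.mean_pitch_hz - user_pitch_hz) / user_pitch_hz
    rate_dev = abs(seg.syllables_per_sec - user_rate) / user_rate
    return max(pitch_dev, rate_dev) > tolerance


def missing_phonemes(segments: list[Segment], required: set[str],
                     min_count: int = 3) -> set[str]:
    """Return the required phonemes heard fewer than `min_count` times."""
    counts = Counter(p for seg in segments for p in seg.phonemes)
    return {p for p in required if counts[p] < min_count}


# Usage: one segment close to the user's profile, one clearly different.
segments = [
    Segment(120.0, 4.0, ["HH", "AH"]),
    Segment(210.0, 5.5, ["K", "AE", "T"] * 3),
]
background = [s for s in segments
              if is_background(s, user_pitch_hz=118.0, user_rate=4.1)]
print(missing_phonemes(background, {"K", "AE", "T", "S"}))  # prints {'S'}
```

An empty result from `missing_phonemes` would correspond to the "enough data exists" condition; a non-empty result suggests the system should keep sampling before attempting synthesis.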

Customization Features

Players can further customize the experience by selecting different background voices for characters. The system can generate speech from both static scripts and real-time gameplay changes. Players can also modify the synthesized speech through filtering options that adjust attributes such as pitch or gender, so that a character's voice fits the desired persona more closely.
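A pitch filter of the kind mentioned above could operate on synthesis parameters rather than on raw audio. The sketch below assumes a hypothetical `VoiceParams` record and expresses the adjustment in semitones, using the standard relation that a shift of n semitones multiplies frequency by 2^(n/12). The summary does not say how the filter is implemented, so this is one plausible form only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceParams:
    """Hypothetical synthesis parameters for one character voice."""
    base_pitch_hz: float
    speaking_rate: float = 1.0


def shift_pitch(params: VoiceParams, semitones: float) -> VoiceParams:
    """Return a copy with the base pitch moved by `semitones`.

    Twelve semitones is one octave, so +12 doubles the frequency
    and -12 halves it.
    """
    factor = 2.0 ** (semitones / 12.0)
    return replace(params, base_pitch_hz=params.base_pitch_hz * factor)


# Usage: raise a captured voice by one octave for a different persona.
captured = VoiceParams(base_pitch_hz=110.0)
higher = shift_pitch(captured, 12)
print(higher.base_pitch_hz)  # prints 220.0
```

Returning a new record instead of mutating the captured one keeps the original voice available, which matters if the same background voice is reused for several characters.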

Voice Fingerprinting and Analysis

The method includes deriving a unique "voice fingerprint" for each identified background voice. The fingerprint captures features such as the timbre and sharpness of speech, which are needed for accurate reproduction. The system samples audio either continuously during gameplay or at specific intervals, both to build a fuller picture of the environment's soundscape and to refine the voice fingerprint as needed.
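One simple way to refine a fingerprint as new audio samples arrive is to keep a running mean of each feature. The sketch below assumes that timbre and sharpness have already been reduced to single numeric scores by some upstream analysis and that every observation carries the same feature names. The `VoiceFingerprint` class and its feature names are hypothetical, and the running mean is a substitute chosen for illustration, not the refinement method stated in the application.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceFingerprint:
    """Running average of per-voice features (hypothetical representation)."""
    features: dict[str, float] = field(default_factory=dict)
    samples: int = 0

    def update(self, observation: dict[str, float]) -> None:
        """Fold one new observation into the fingerprint.

        Uses the incremental mean: new = old + (value - old) / n,
        so no history of past samples needs to be stored.
        """
        self.samples += 1
        for name, value in observation.items():
            previous = self.features.get(name, value)
            self.features[name] = previous + (value - previous) / self.samples


# Usage: two samples of the same background voice taken at different times.
fingerprint = VoiceFingerprint()
fingerprint.update({"timbre": 0.4, "sharpness": 0.8})
fingerprint.update({"timbre": 0.6, "sharpness": 0.4})
print(fingerprint.samples, fingerprint.features)
```

Because each update needs only the current mean and the sample count, the same routine works whether sampling is continuous or happens at fixed intervals.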

Future Applications

The technology aims to make games more engaging by bringing real-world sounds into virtual interactions. Because voice fingerprints and their parameters are saved, the system can recall a distinct background voice later, so a voice captured once can be reused in ongoing gameplay. Beyond player immersion, the approach may support further audio-visual innovations in gaming.
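Saving and recalling fingerprints, as described above, amounts to a keyed store that can be serialized between sessions. The sketch below uses JSON text as the saved form. The `FingerprintStore` class, the `voice_id` keys, and the feature names are hypothetical, since the summary does not specify a storage format.

```python
import json
from typing import Optional


class FingerprintStore:
    """Keeps fingerprint parameters by voice id (hypothetical design)."""

    def __init__(self) -> None:
        self._voices: dict[str, dict[str, float]] = {}

    def save(self, voice_id: str, features: dict[str, float]) -> None:
        """Store a copy of the fingerprint parameters under `voice_id`."""
        self._voices[voice_id] = dict(features)

    def recall(self, voice_id: str) -> Optional[dict[str, float]]:
        """Return the saved parameters, or None if the voice is unknown."""
        found = self._voices.get(voice_id)
        return dict(found) if found is not None else None

    def to_json(self) -> str:
        """Serialize all saved voices, e.g. to persist between sessions."""
        return json.dumps(self._voices, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "FingerprintStore":
        """Rebuild a store from text produced by `to_json`."""
        store = cls()
        for voice_id, features in json.loads(text).items():
            store.save(voice_id, features)
        return store


# Usage: save in one session, restore and recall in a later one.
store = FingerprintStore()
store.save("voice-1", {"timbre": 0.5, "sharpness": 0.6})
restored = FingerprintStore.from_json(store.to_json())
print(restored.recall("voice-1"))
```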