Invention Title:

Context-Aware Speech Recognition Using Prompts for Language Learners

Publication number:

US20250279090

Publication date:

2025-09-04

Section:

Physics

Class:

G10L15/063

Inventor:

Jian Cheng 🇺🇸 Mountain View, CA, United States

Assignee:

Google LLC 🇺🇸 Mountain View, CA, United States

Applicant:

Google LLC 🇺🇸 Mountain View, CA, United States

Smart overview of the Invention

The patent application aims to improve automatic speech recognition (ASR) for language learners. It introduces a method that uses context-aware prompts to raise recognition accuracy, especially for non-native speakers. The method generates an initial prompt containing a requested response and an ideal answer, which guides the user to speak a particular utterance in a target language.

Methodology

The process begins by creating an initial prompt designed to elicit a user response in a first language (the target language being learned). The prompt is sent to a user device, where the user, a native speaker of a different language, provides an audio response. The system conditions a speech model on the initial prompt, so the model can produce a more accurate speech recognition result from the user's audio input.
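The effect of conditioning on the prompt can be illustrated with a minimal sketch. It is not the method claimed in the application: it substitutes simple n-best rescoring for model conditioning, and every name in it (`Prompt`, `token_overlap`, `rescore`, the `weight` parameter, and the example scores) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    """Hypothetical container for the initial prompt sent to the learner."""
    requested_response: str  # what the learner is asked to do
    ideal_answer: str        # the utterance the learner is expected to say


def token_overlap(hypothesis: str, ideal_answer: str) -> float:
    """Fraction of ideal-answer tokens that also appear in the hypothesis."""
    ideal = ideal_answer.lower().split()
    if not ideal:
        return 0.0
    hyp = set(hypothesis.lower().split())
    return sum(tok in hyp for tok in ideal) / len(ideal)


def rescore(nbest, prompt, weight=2.0):
    """Return the hypothesis text with the best prompt-aware score.

    nbest is a list of (text, base_score) pairs, where base_score stands in
    for the recognizer's own log-score (an assumption of this sketch).
    """
    def combined(item):
        text, base_score = item
        return base_score + weight * token_overlap(text, prompt.ideal_answer)

    return max(nbest, key=combined)[0]


# Made-up hypotheses for accented speech; scores are illustrative only.
prompt = Prompt("Repeat the sentence.", "I would like a cup of tea")
nbest = [
    ("I wood like a cop of tea", -1.0),
    ("I would like a cup of tea", -1.4),
]
without_prompt = rescore(nbest, prompt, weight=0.0)
with_prompt = rescore(nbest, prompt)
```

With `weight=0.0` the prompt is ignored and the acoustically preferred but incorrect hypothesis wins; with the default weight, the hypothesis consistent with the ideal answer is selected.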

Technical Features

Key technical features include an ASR model with an audio encoder that generates higher-order feature representations from the audio data, and a prediction network that processes non-blank output symbols together with the initial prompt. The system may also incorporate multimodal large language models (LLMs) to improve recognition accuracy, particularly for accented or non-native speech.
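One way to picture the prediction network's input is the sketch below. The summary does not say how the prompt enters the network, so prepending the prompt tokens to the label history is an assumption, and the names `BLANK` and `prediction_network_input` are hypothetical.

```python
BLANK = "<blank>"  # hypothetical blank symbol, as used in transducer-style decoders


def prediction_network_input(prompt_tokens, emitted_symbols):
    """Build the label history fed to the prediction network.

    Assumption of this sketch: the prompt tokens come first, followed by the
    symbols emitted so far with blanks filtered out.
    """
    history = [sym for sym in emitted_symbols if sym != BLANK]
    return list(prompt_tokens) + history


# Illustrative decoder state: two blanks interleaved with two emitted word pieces.
context = prediction_network_input(
    ["say", "hello"],
    [BLANK, "hel", BLANK, "lo"],
)
```

Here `context` holds the prompt tokens followed by only the non-blank symbols, which is the sequence a prediction network of this kind would consume at the next decoding step.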

Applications

The invention is particularly useful for language learning applications where precise feedback on grammar, usage, and pronunciation is essential. It accommodates various kinds of user response, such as repeating phrases, reading aloud, or describing images and silent videos. Because recognition is conditioned on the prompt, feedback can be tailored to the user's specific language learning context.

Benefits and Implications

By addressing the variability and unpredictability of non-native speech, this approach enhances the effectiveness of ASR systems in educational contexts. It provides more accurate transcriptions and feedback, reducing the risk of learners practicing incorrect pronunciations or grammar. This advancement supports diverse learning backgrounds and goals, making ASR systems more adaptable and robust in handling different speech patterns.