US20240378782
2024-11-14
Physics
G06T13/00
The patent application describes a method to enhance video sessions by integrating sign language through animated avatars. It involves processing video and audio data to determine spoken words and the speaker's location within the video frame. An avatar is then generated to perform sign language corresponding to the spoken words. The video is modified to include this avatar, ensuring it appears near the speaker, and is then output for viewing.
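As an illustration of the claimed processing steps, the following Python sketch shows one way such a pipeline could be organized. Every function and type here (transcribe, locate_speaker, render_signing_avatar, composite) is a hypothetical placeholder with stubbed logic for illustration only, not an API named in the application.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Box:
    """Speaker location within the video frame, in pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

def transcribe(audio_chunk: bytes) -> List[str]:
    # Stand-in for a speech-to-text step that determines the spoken words.
    return ["hello", "everyone"]

def locate_speaker(frame: Any, audio_chunk: bytes) -> Box:
    # Stand-in for audio/video analysis that finds the active speaker's position.
    return Box(x=120, y=80, w=200, h=260)

def render_signing_avatar(words: List[str]) -> Dict[str, Any]:
    # Stand-in for an animation step that produces a signing avatar for the words.
    return {"signs": words}

def composite(frame: Any, avatar: Dict[str, Any], near: Box) -> Dict[str, Any]:
    # Stand-in for overlaying the avatar adjacent to the speaker's bounding box.
    return {"frame": frame, "avatar": avatar, "anchor": (near.x + near.w, near.y)}

def add_sign_language_avatar(frame: Any, audio_chunk: bytes) -> Dict[str, Any]:
    """Modify one frame so a signing avatar appears near the active speaker."""
    words = transcribe(audio_chunk)
    speaker_box = locate_speaker(frame, audio_chunk)
    avatar = render_signing_avatar(words)
    return composite(frame, avatar, near=speaker_box)
```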
This innovation addresses challenges in video presentations for individuals who cannot perceive audio output effectively. By incorporating sign language avatars, it aims to make video content understandable without reliance on audio, offering an inclusive solution for hearing-impaired users.
The system uses avatars to translate spoken words into sign language, offering greater accessibility than captions or subtitles alone. Each avatar is dynamically inserted into the video near the participant whose voice is detected, so viewers can readily associate the avatar with its speaker. This placement enables simultaneous viewing of both the speaker and the avatar, enhancing comprehension.
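One simple way to keep the speaker and the avatar visible at the same time is to anchor the avatar beside the speaker's bounding box and clamp it to the frame. The sketch below is a hypothetical placement heuristic under that assumption, not geometry taken from the application.

```python
def avatar_position(speaker_x: int, speaker_y: int, speaker_w: int,
                    frame_w: int, frame_h: int,
                    avatar_w: int = 160, avatar_h: int = 160,
                    margin: int = 10) -> tuple:
    """Return (x, y) for an avatar placed beside the speaker and kept in frame."""
    x = speaker_x + speaker_w + margin          # prefer the speaker's right-hand side
    if x + avatar_w > frame_w:                  # not enough room: fall back to the left
        x = max(0, speaker_x - avatar_w - margin)
    y = min(max(speaker_y, 0), frame_h - avatar_h)  # align vertically, stay in frame
    return x, y

# Example: a speaker box at (1500, 200) with width 300 in a 1920x1080 frame
# lands the avatar on the speaker's left because the right edge is too close.
print(avatar_position(1500, 200, 300, 1920, 1080))  # -> (1330, 200)
```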
Avatars can be visually customized to resemble the speaker by matching characteristics such as gender, clothing, and accessories. This personalization helps users better associate avatars with speakers. Avatars appear only when a participant is speaking, preventing unnecessary distractions and maintaining focus on active participants.
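A minimal sketch of how speaker-matched styling and speech-gated visibility might be represented is shown below; the attribute names and the is_speaking flag are assumptions for illustration, not data structures defined in the application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AvatarStyle:
    """Hypothetical appearance attributes matched to the detected speaker."""
    gender_presentation: str = "unspecified"
    clothing: str = "neutral"
    accessories: List[str] = field(default_factory=list)

@dataclass
class ParticipantAvatar:
    participant_id: str
    style: AvatarStyle
    is_speaking: bool = False

    def visible(self) -> bool:
        # The avatar is rendered only while its participant is speaking,
        # so idle participants do not add on-screen clutter.
        return self.is_speaking

# Example: the avatar appears once its participant starts speaking.
avatar = ParticipantAvatar("alice", AvatarStyle(clothing="blue blazer", accessories=["glasses"]))
print(avatar.visible())   # False
avatar.is_speaking = True
print(avatar.visible())   # True
```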
The system comprises server devices and multiple endpoint devices connected via networks. Endpoint devices can be personal devices or shared videoconferencing systems. The server processes audio and video data to generate sign language content through transcription, video-processing, and animation engines. The modified video containing the avatars is then distributed back to the endpoint devices for display, allowing users to receive the information visually without relying on audio.
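The server-side flow could be orchestrated roughly as in the sketch below. The engine objects and the endpoint send method are duck-typed stand-ins assumed for illustration of the transcription, video-processing, and animation engines and the network distribution described above; they are not interfaces specified in the application.

```python
from typing import Any, Iterable, Tuple

class SignLanguageCompositor:
    """Illustrative server-side loop; engines and endpoints are assumed objects."""

    def __init__(self, transcription_engine: Any, video_engine: Any, animation_engine: Any):
        self.transcription = transcription_engine
        self.video = video_engine
        self.animation = animation_engine

    def run(self, av_stream: Iterable[Tuple[Any, bytes]], endpoints: Iterable[Any]) -> None:
        endpoints = list(endpoints)
        for frame, audio in av_stream:
            words = self.transcription.transcribe(audio)         # determine spoken words
            speaker = self.video.locate_speaker(frame, audio)    # find speaker in frame
            avatar = self.animation.render(words, speaker)       # animate the signs
            modified = self.video.composite(frame, avatar, speaker)
            for endpoint in endpoints:                           # distribute modified video
                endpoint.send(modified)
```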