US20240378782
2024-11-14
Physics
G06T13/00
The patent application describes a method to enhance video sessions by integrating sign language through animated avatars. It involves processing video and audio data to determine spoken words and the speaker's location within the video frame. An avatar is then generated to perform sign language corresponding to the spoken words. The video is modified to include this avatar, ensuring it appears near the speaker, and is then output for viewing.
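As an illustration of the claimed processing steps, the following Python sketch shows one way such a pipeline could be organized. Every function and type here (transcribe, locate_speaker, render_signing_avatar, composite) is a hypothetical placeholder with stubbed logic for illustration only, not an API named in the application.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Box:
    """Speaker location within the video frame, in pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

def transcribe(audio_chunk: bytes) -> List[str]:
    # Stand-in for a speech-to-text step that determines the spoken words.
    return ["hello", "everyone"]

def locate_speaker(frame: Any, audio_chunk: bytes) -> Box:
    # Stand-in for audio/video analysis that finds the active speaker's position.
    return Box(x=120, y=80, w=200, h=260)

def render_signing_avatar(words: List[str]) -> Dict[str, Any]:
    # Stand-in for an animation step that produces a signing avatar for the words.
    return {"signs": words}

def composite(frame: Any, avatar: Dict[str, Any], near: Box) -> Dict[str, Any]:
    # Stand-in for overlaying the avatar adjacent to the speaker's bounding box.
    return {"frame": frame, "avatar": avatar, "anchor": (near.x + near.w, near.y)}

def add_sign_language_avatar(frame: Any, audio_chunk: bytes) -> Dict[str, Any]:
    """Modify one frame so a signing avatar appears near the active speaker."""
    words = transcribe(audio_chunk)
    speaker_box = locate_speaker(frame, audio_chunk)
    avatar = render_signing_avatar(words)
    return composite(frame, avatar, near=speaker_box)
```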
This innovation addresses challenges in video presentations for individuals who cannot perceive audio output effectively. By incorporating sign language avatars, it aims to make video content understandable without reliance on audio, offering an inclusive solution for hearing-impaired users.
The system uses avatars to translate spoken words into sign language, offering greater accessibility than captions or subtitles alone. Each avatar is dynamically inserted into the video near the participant whose voice is detected, so viewers can readily associate the avatar with its speaker. This placement enables simultaneous viewing of both the speaker and the avatar, enhancing comprehension.
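One simple way to keep the speaker and the avatar visible at the same time is to anchor the avatar beside the speaker's bounding box and clamp it to the frame. The sketch below is a hypothetical placement heuristic under that assumption, not geometry taken from the application.

```python
def avatar_position(speaker_x: int, speaker_y: int, speaker_w: int,
                    frame_w: int, frame_h: int,
                    avatar_w: int = 160, avatar_h: int = 160,
                    margin: int = 10) -> tuple:
    """Return (x, y) for an avatar placed beside the speaker and kept in frame."""
    x = speaker_x + speaker_w + margin          # prefer the speaker's right-hand side
    if x + avatar_w > frame_w:                  # not enough room: fall back to the left
        x = max(0, speaker_x - avatar_w - margin)
    y = min(max(speaker_y, 0), frame_h - avatar_h)  # align vertically, stay in frame
    return x, y

# Example: a speaker box at (1500, 200) with width 300 in a 1920x1080 frame
# lands the avatar on the speaker's left because the right edge is too close.
print(avatar_position(1500, 200, 300, 1920, 1080))  # -> (1330, 200)
```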
Avatars can be visually customized to resemble the speaker by matching characteristics such as gender, clothing, and accessories. This personalization helps users better associate avatars with speakers. Avatars appear only when a participant is speaking, preventing unnecessary distractions and maintaining focus on active participants.
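A minimal sketch of how speaker-matched styling and speech-gated visibility might be represented is shown below; the attribute names and the is_speaking flag are assumptions for illustration, not data structures defined in the application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AvatarStyle:
    """Hypothetical appearance attributes matched to the detected speaker."""
    gender_presentation: str = "unspecified"
    clothing: str = "neutral"
    accessories: List[str] = field(default_factory=list)

@dataclass
class ParticipantAvatar:
    participant_id: str
    style: AvatarStyle
    is_speaking: bool = False

    def visible(self) -> bool:
        # The avatar is rendered only while its participant is speaking,
        # so idle participants do not add on-screen clutter.
        return self.is_speaking

# Example: the avatar appears once its participant starts speaking.
avatar = ParticipantAvatar("alice", AvatarStyle(clothing="blue blazer", accessories=["glasses"]))
print(avatar.visible())   # False
avatar.is_speaking = True
print(avatar.visible())   # True
```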
The system comprises server devices and multiple endpoint devices connected via networks. Endpoint devices can be personal devices or shared videoconferencing systems. The server processes audio and video data to generate sign language content through transcription, video-processing, and animation engines. The modified video containing the avatars is then distributed back to the endpoint devices for display, allowing users to receive the information visually without relying on audio.
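The server-side flow could be orchestrated roughly as in the sketch below. The engine objects and the endpoint send method are duck-typed stand-ins assumed for illustration of the transcription, video-processing, and animation engines and the network distribution described above; they are not interfaces specified in the application.

```python
from typing import Any, Iterable, Tuple

class SignLanguageCompositor:
    """Illustrative server-side loop; engines and endpoints are assumed objects."""

    def __init__(self, transcription_engine: Any, video_engine: Any, animation_engine: Any):
        self.transcription = transcription_engine
        self.video = video_engine
        self.animation = animation_engine

    def run(self, av_stream: Iterable[Tuple[Any, bytes]], endpoints: Iterable[Any]) -> None:
        endpoints = list(endpoints)
        for frame, audio in av_stream:
            words = self.transcription.transcribe(audio)         # determine spoken words
            speaker = self.video.locate_speaker(frame, audio)    # find speaker in frame
            avatar = self.animation.render(words, speaker)       # animate the signs
            modified = self.video.composite(frame, avatar, speaker)
            for endpoint in endpoints:                           # distribute modified video
                endpoint.send(modified)
```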