Invention Title:

Real-Time Avatar Animation

Publication number:

US20250104318

Publication date:
Section:

Physics

Class:

G06T13/205

Inventors:

Applicant:

Smart overview of the Invention

The patent application introduces a method for real-time avatar animation that employs a trained audio source separation model. This model processes an audio input containing both vocal and non-vocal sounds, separating them into distinct outputs. These outputs are then used by trained avatar animation models to create animations that correspond temporally with the audio input. The animation is rendered in real time, allowing avatars to perform actions such as singing and dancing along with the audio.

Technical Field

This application pertains to the field of real-time avatar animation. Avatars, which can be either two-dimensional or three-dimensional graphical representations of individuals, are often dynamic and can be animated to exhibit various actions and expressions. The technology described focuses on synchronizing these animations with audio inputs, enhancing interactive experiences like karaoke.

Example Method

An example method involves accessing an audio input, such as a song, that combines vocal and non-vocal sounds. A trained audio source separation model splits this input into distinct vocal and non-vocal outputs, each of which is encoded and passed to trained avatar animation models so the avatar's motion stays in sync with the audio. The avatar can be displayed on various electronic devices, providing a synchronized visual experience.
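The separate-encode-animate flow above can be sketched as follows. This is a minimal illustration only: every function body, name, and feature shape here is a hypothetical stand-in for the trained models the application describes, not the patent's actual implementation.

```python
def separate_sources(samples):
    """Stand-in for the trained audio source separation model: splits a
    mixed signal into vocal and non-vocal streams (here a fixed split)."""
    vocals = [s * 0.6 for s in samples]
    non_vocals = [s * 0.4 for s in samples]
    return vocals, non_vocals

def encode(stream):
    """Stand-in encoder producing a tiny per-sample feature vector."""
    return [(s, abs(s)) for s in stream]

def animate(vocal_feats, non_vocal_feats):
    """Stand-in animation models: vocal features drive the face (e.g.
    lip sync), non-vocal features drive the body (e.g. dance), frame by
    frame, keeping the animation temporally aligned with the audio."""
    return [{"mouth_open": v[1], "body_sway": n[0]}
            for v, n in zip(vocal_feats, non_vocal_feats)]

audio = [0.1, -0.5, 0.3]                      # toy mixed audio samples
vocals, accompaniment = separate_sources(audio)
frames = animate(encode(vocals), encode(accompaniment))
```

The key structural point the sketch preserves is that the two separated streams are encoded independently and drive different aspects of the animation, one frame per audio step.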

Training Process

The source separation model is built on deep-neural-network (DNN) architectures and incorporates self-supervised learning components during training. These components refine the model's ability to separate vocal from non-vocal sounds while retaining features meaningful enough to reconstruct the audio stream. The model may also be trained jointly with other components of the DNN architecture for better task-specific performance.
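One common self-supervised pretext task consistent with "retaining meaningful features for reconstructing the audio stream" is masked reconstruction: predict a hidden sample from its neighbors, using the signal itself as supervision. The toy trainer below illustrates that idea with a two-weight linear predictor; the objective and model are assumptions for illustration, not the objective disclosed in the application.

```python
def train_inpainter(signal, steps=500, lr=0.05):
    """Self-supervised toy: learn weights (a, b) so that
    a*x[t-1] + b*x[t+1] reconstructs the masked sample x[t].
    No labels are needed; the audio stream supervises itself."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        for t in range(1, len(signal) - 1):
            pred = a * signal[t - 1] + b * signal[t + 1]
            err = pred - signal[t]
            a -= lr * err * signal[t - 1]   # SGD step on squared error
            b -= lr * err * signal[t + 1]
    return a, b

# On a linear ramp, each sample is the average of its neighbors,
# so the learned weights should approach (0.5, 0.5).
a, b = train_inpainter([0.0, 1.0, 2.0, 3.0, 4.0])
```

A real separation model would of course use a far larger architecture, but the training signal has the same shape: reconstruction error on the unlabeled audio stream.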

User Interaction

The system allows for user interaction through natural language inputs, which can influence the avatar's animation style or emotional expression. A user instruction classification model interprets these inputs to provide emotion or dance style encodings, which are then used to adjust the avatar's facial expressions or dance moves accordingly. This feature enhances user engagement by allowing personalized avatar animations.
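The instruction-to-encoding step can be sketched as below. A keyword lookup stands in for the trained user instruction classification model (which the application does not detail); the emotion vectors, style names, and avatar-state keys are all hypothetical.

```python
# Hypothetical label sets; a trained classifier would replace the lookup.
EMOTIONS = {"happy": (1, 0, 0), "sad": (0, 1, 0), "angry": (0, 0, 1)}
STYLES = ("hip-hop", "ballet", "salsa")

def classify_instruction(text):
    """Stand-in for the user instruction classification model: maps a
    natural language input to an emotion encoding and a dance style."""
    text = text.lower()
    emotion = next((v for k, v in EMOTIONS.items() if k in text), (0, 0, 0))
    style = next((s for s in STYLES if s in text), "default")
    return {"emotion_encoding": emotion, "dance_style": style}

def apply_to_avatar(state, enc):
    """Adjust facial expression and dance moves from the encodings."""
    state = dict(state)
    state["smile"] = enc["emotion_encoding"][0]   # "happy" axis drives smile
    state["dance_style"] = enc["dance_style"]
    return state

enc = classify_instruction("Make my avatar dance a happy salsa!")
avatar = apply_to_avatar({"smile": 0.0, "dance_style": "default"}, enc)
```

The design point is that the classifier's output is an encoding, not a direct command, so the same animation models can consume it alongside the audio-derived encodings.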