Invention Title:

ARTIFICIAL INTELLIGENCE DEVICE FOR A DIGITAL AVATAR WITH 3D INTERACTION CAPABILITIES AND CONTROL METHOD THEREOF

Publication number:

US20250022200

Publication date:

Section:

Physics

Class:

G06T13/205

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent application describes a method for controlling an AI device to implement a digital avatar with 3D interaction capabilities. The process begins with receiving an audio signal from a user, which is then converted into a text query using a speech-to-text neural network model. This text query is further processed to generate high-level movement instructions and a text response, which are used to control the digital avatar's movements and interactions. The text response is also converted back into audio for playback.

Technical Process

The system chains several neural network models. After a speech-to-text model converts the user's audio input to a text query, a large language gesture instruction model generates high-level movement instructions from that query. The instructions and the text query are then fed into an information retrieval model, which produces the text response and the control data for the avatar. Finally, a text-to-speech model converts the text response into an audio reply, which in turn drives the avatar's facial and gesture animation.
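The stage ordering described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class names, field names, and stub functions below are hypothetical and do not come from the application, and each stub stands in for a neural network model that the application does not specify in this summary.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AvatarControl:
    """Hypothetical container for the digital avatar control information."""
    movement_instructions: list[str] = field(default_factory=list)
    gestures: list[str] = field(default_factory=list)


@dataclass
class AvatarPipeline:
    # Each stage is injected as a callable so real models could replace the stubs.
    speech_to_text: Callable[[bytes], str]
    gesture_instructions: Callable[[str], list[str]]
    retrieve_response: Callable[[str, list[str]], tuple[str, AvatarControl]]
    text_to_speech: Callable[[str], bytes]

    def handle(self, audio: bytes) -> tuple[str, bytes, AvatarControl]:
        """Run one user turn: audio in, (text response, reply audio, control) out."""
        query = self.speech_to_text(audio)
        instructions = self.gesture_instructions(query)
        response, control = self.retrieve_response(query, instructions)
        reply_audio = self.text_to_speech(response)
        return response, reply_audio, control


# --- Stub stages (assumptions for illustration only, not real models) ---

def stub_stt(audio: bytes) -> str:
    # Pretend the audio bytes already are UTF-8 text.
    return audio.decode("utf-8")


def stub_gesture(query: str) -> list[str]:
    # Toy rule: location questions trigger a pointing movement.
    return ["point_at_object"] if "where" in query.lower() else ["idle"]


def stub_retrieve(query: str, instructions: list[str]) -> tuple[str, AvatarControl]:
    control = AvatarControl(movement_instructions=list(instructions))
    return f"Answer to: {query}", control


def stub_tts(text: str) -> bytes:
    # Pretend encoded text is synthesized speech.
    return text.encode("utf-8")


def build_stub_pipeline() -> AvatarPipeline:
    return AvatarPipeline(stub_stt, stub_gesture, stub_retrieve, stub_tts)
```

Passing the stages in as callables keeps the ordering explicit while leaving every model replaceable, which matches the summary's description of separate models connected in sequence.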

Animation and Interaction

The generated audio response is input into models that animate facial expressions and conversational gestures, producing a dynamic and lifelike digital avatar. The digital avatar control information is updated with the resulting gesture details, so the avatar can interact with users in real time through natural gestures and expressions synchronized with audio playback.
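To make the synchronization idea concrete, the sketch below derives one animation keyframe per video frame from the reply audio, so that each keyframe carries the playback timestamp it belongs to. It is not the application's method: a simple loudness (RMS) heuristic stands in for the neural facial animation model, and the function name, the `jaw_open` weight, and the default frame rate are all assumptions made for illustration.

```python
import math


def audio_to_jaw_keyframes(samples, sample_rate, fps=30):
    """Return (timestamp_seconds, jaw_open_weight) pairs, one per animation frame.

    `samples` is a sequence of floats in [-1.0, 1.0]. The weight is the RMS
    loudness of the frame, clamped to 1.0. This heuristic is a placeholder
    for an audio-driven facial animation model.
    """
    window = sample_rate // fps  # audio samples covered by one animation frame
    if window <= 0:
        raise ValueError("sample_rate must be at least fps")

    keyframes = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        # The timestamp ties this keyframe to the matching point in audio playback.
        keyframes.append((start / sample_rate, min(1.0, rms)))
    return keyframes
```

Because every keyframe is indexed by audio time rather than by wall-clock time, a renderer can look up the pose for the current playback position and stay aligned with the speech.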

Enhanced Capabilities

The invention addresses limitations of existing digital avatars by improving 3D interaction and facial animation. It allows avatars to engage in domain-specific conversations using industry knowledge, providing contextually relevant interactions. The system can generate realistic facial animations directly from audio inputs, enhancing user experience by making virtual interactions more authentic.

Applications

This technology is applicable in various fields such as virtual meetings, customer service, and entertainment. It can retrieve relevant documents based on user queries, enhancing the avatar's ability to provide accurate information and insights. Additionally, the system supports creating 3D face meshes from video inputs for realistic facial reconstruction, further improving interaction quality.
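As a rough illustration of retrieving relevant documents for a user query, the sketch below ranks documents by word overlap (Jaccard similarity) with the query. This is a deliberately simple stand-in: the application describes an information retrieval model but this summary does not say how it scores documents, and the function names and sample documents here are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Return up to `top_k` document ids ranked by Jaccard overlap with the query.

    `documents` maps a document id to its text. Documents that share no
    words with the query are never returned.
    """
    query_tokens = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        doc_tokens = tokenize(text)
        union = query_tokens | doc_tokens
        score = len(query_tokens & doc_tokens) / len(union) if union else 0.0
        scored.append((score, doc_id))

    # Highest score first; ties are broken alphabetically by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]
```

In a domain-specific deployment the retrieved text would be passed along with the query so the response can draw on it; a production system would more likely use learned embeddings than raw word overlap.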