Invention Title:

ADAPTIVE MULTIMODAL FUSING FOR NON-PLAYER CHARACTER GENERATION AND CONFIGURATION

Publication number:

US20240424407

Section:

Human necessities

Class:

A63F13/67

Overview of the Invention

The patent application describes systems and techniques for generating and animating non-player characters (NPCs) in virtual digital environments. Multimodal input data (for example, text and audio directed at an NPC characterized by body and facial features) is processed with neural networks to produce realistic, responsive animation sequences. The approach disentangles the input data into latent representations, integrates those representations with the original data, and uses a large language model (LLM) to produce speech for the NPCs. Reverse diffusion is then applied to generate detailed facial and joint movement data used to animate the NPCs within specific environments.
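The disentangle-then-integrate step described above can be sketched in miniature. This is not the patent's actual architecture: the encoders below are random linear projections standing in for trained networks, and all dimensions and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

D_LATENT = 64  # size of the shared latent space (assumed)

def encode(features: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project raw modality features into the shared latent space.

    A stand-in for the neural networks that disentangle each input
    modality into a latent representation.
    """
    return np.tanh(features @ proj)

# Toy per-modality features: pooled token embeddings and audio frame stats.
text_feats = rng.normal(size=(1, 128))
audio_feats = rng.normal(size=(1, 80))

# Random projections in place of trained encoder weights.
W_text = rng.normal(size=(128, D_LATENT)) * 0.1
W_audio = rng.normal(size=(80, D_LATENT)) * 0.1

z_text = encode(text_feats, W_text)
z_audio = encode(audio_feats, W_audio)

# Integrate the latent representations with the original input data
# (a simple skip-connection-style concatenation), as the overview describes.
fused = np.concatenate([z_text, z_audio, text_feats, audio_feats], axis=-1)
print(fused.shape)  # (1, 64 + 64 + 128 + 80) = (1, 336)
```

In a real system the fused representation would condition the downstream speech and animation models; concatenation is just the simplest possible integration scheme.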

Background

In the realm of artificial intelligence and gaming technology, NPCs play a pivotal role in enhancing user experience by contributing to narrative development and gameplay. Traditionally, NPCs operate on pre-scripted animations and dialogues, limiting their ability to interact dynamically or contextually with the game environment or players. Conventional methods often use separate models for facial and body animations, resulting in disjointed character representations that lack realism. Additionally, these traditional approaches are not optimized for different hardware systems, leading to performance constraints.

Innovative Approach

The proposed system addresses these limitations by employing a unified architecture that integrates facial and body animations through multimodal inputs. This approach allows NPCs to engage in complex interactions using text and audio inputs while displaying realistic animations based on contextual cues. A diffusion-based model is implemented to refine noisy data into high-fidelity animations, ensuring coherent character representation across various modalities. This method enhances computational efficiency and reduces memory operations, overcoming the inefficiencies of traditional NPC generation techniques.
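The diffusion-based refinement of noisy data into animation frames can be illustrated with a toy reverse-diffusion loop. This is a structural sketch only: in the real system a trained network predicts the noise in each frame, whereas here the known clean target stands in for that prediction, and the frame size and schedule are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical target: 24 joint angles for one animation frame.
clean_pose = np.linspace(-1.0, 1.0, 24)

T = 50                            # number of reverse-diffusion steps (assumed)
x = rng.normal(size=24)           # start from pure Gaussian noise

for t in range(T, 0, -1):
    # One reverse step: remove a fraction of the estimated noise.
    # (x - clean_pose) plays the role of the learned noise estimate here.
    x = x - (x - clean_pose) / t
    if t > 1:
        # Re-inject a little noise at every step except the last,
        # mirroring the stochastic sampling of reverse diffusion.
        x = x + 0.01 * rng.normal(size=24)

print(np.abs(x - clean_pose).max())  # ~0: the frame has been denoised
```

The final step (t = 1) removes the full remaining noise estimate, so the loop lands exactly on the clean pose in this toy setting; a learned denoiser would only approximate it.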

Technical Implementation

The system utilizes advanced neural networks to enable dynamic emotion and motion guidance for NPCs. By fusing audio and text inputs, it generates expressive motion sequences and talking faces that adapt to environmental subtleties. An emotion-oriented contrastive language model powers the downstream animation models, allowing for realistic expressions driven by textual descriptions of emotions and actions. An SDK is provided for seamless integration into game engines, facilitating cross-environment adaptation of NPCs.
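The emotion-oriented contrastive alignment between text and motion can be sketched as a nearest-neighbour lookup in a shared embedding space. The embeddings below are hand-crafted toys, not outputs of any trained model, and the emotion phrases are invented for illustration.

```python
import numpy as np

# Hypothetical text-side embeddings from an emotion-oriented contrastive
# language model; one axis per emotion so the retrieval step is visible.
emotion_texts = ["an angry stomp", "a joyful wave", "a sad slump"]
text_emb = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

# Motion-clip embedding from an (assumed) motion encoder, trained so that
# matching text/motion pairs land close together in the shared space.
motion_emb = np.array([0.1, 0.9, 0.05])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the usual contrastive-matching score."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = np.array([cosine(motion_emb, t) for t in text_emb])
best = emotion_texts[int(scores.argmax())]
print(best)  # "a joyful wave"
```

A downstream animation model conditioned on such embeddings could then drive expressions from textual descriptions of emotions and actions, as the section describes.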

Applications and Benefits

These techniques have broad applications across various domains such as game character control, interactive assistants, video teleconferencing, metaverse environments, and entertainment. The processing system described includes parallel processors capable of executing machine learning algorithms necessary for these advanced animations. By enabling real-time rendering of complex NPC interactions across different platforms, this invention significantly enhances user engagement and experience in virtual digital environments.