Invention Title:

MULTIMODAL CONTEXTUALIZER FOR NON-PLAYER CHARACTER GENERATION AND CONFIGURATION

Publication number:

US20240428494

Publication date:

Section:

Physics

Class:

G06T13/40

Inventors:

Applicant:

Smart overview of the Invention

The patent application introduces advanced systems and techniques for generating and animating non-player characters (NPCs) in virtual digital environments. These systems operate on multimodal input data, spanning several input types, to interact with NPCs that have distinct body and facial features. Neural networks disentangle the input data into latent representations, which are then combined with the original inputs to generate speech through a large language model (LLM). Further processing with reverse diffusion techniques yields detailed face vertex displacement and joint trajectory data, ultimately producing realistic NPC animations.
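To make the described data flow concrete, the following Python sketch mirrors the pipeline summarized above, with stand-in functions in place of the trained networks. All function names, latent factors, and array shapes are illustrative assumptions, not details taken from the patent.

# Hypothetical sketch of the described pipeline; stand-in functions replace
# the trained networks, and all names and shapes are illustrative only.
import numpy as np

def disentangle_to_latents(text_prompt, audio_features):
    """Stand-in for the neural encoder that disentangles multimodal inputs
    into separate latent representations (e.g. emotion, identity, motion)."""
    rng = np.random.default_rng(0)
    return {
        "emotion": rng.standard_normal(16),
        "identity": rng.standard_normal(16),
        "motion": rng.standard_normal(16),
    }

def generate_speech(text_prompt, latents):
    """Stand-in for the LLM stage that combines the raw inputs with the
    latent representations to produce the NPC's spoken response."""
    return f"[NPC reply conditioned on latents] {text_prompt}"

def reverse_diffusion(latents, frames=60):
    """Stand-in for the diffusion decoder: in the described system, reverse
    diffusion denoises toward face vertex displacements and joint
    trajectories; here, random arrays of plausible shape are returned."""
    rng = np.random.default_rng(1)
    face_vertex_displacements = rng.standard_normal((frames, 468, 3))
    joint_trajectories = rng.standard_normal((frames, 24, 3))
    return face_vertex_displacements, joint_trajectories

if __name__ == "__main__":
    latents = disentangle_to_latents("Where is the blacksmith?", audio_features=None)
    speech = generate_speech("Where is the blacksmith?", latents)
    face_disp, joint_traj = reverse_diffusion(latents)
    print(speech, face_disp.shape, joint_traj.shape)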

Background

NPCs play a vital role in video games and virtual environments, adding narrative depth and enhancing the user experience. Traditionally, NPCs are controlled by scripts and predefined animations, which limits their interaction capabilities. They often serve as guides or adversaries within games but cannot engage dynamically with players or other virtual characters in real time. Conventional methods also struggle to unify facial and body animations, leading to performance inefficiencies across different hardware systems.

Technical Challenges

Existing approaches to NPC generation often process facial and body animations separately, which makes cohesive character representation difficult to achieve. Traditional methods are typically unimodal or bimodal and fail to capture the nuances necessary for realistic animation. Moreover, they are computationally expensive and not optimized for cross-environment adaptation, making it difficult to integrate NPCs efficiently into different game engines.

Innovative Techniques

The described techniques employ a unified AI architecture that integrates convolutional neural networks and diffusion-based models to generate realistic NPC animations. This approach enables dynamic emotion and motion guidance through multimodal inputs such as text and audio. The system addresses the disentanglement problem by establishing an implicit relationship between facial and body animations, enhancing the realism of NPC interactions within complex environments such as metaverses and multiplayer games.
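As a rough illustration of how a diffusion-based model can be guided by multimodal inputs, the sketch below runs a simplified reverse-diffusion loop over a joint-trajectory tensor conditioned on a fused embedding. The denoiser, noise schedule, and shapes are hypothetical placeholders, not the architecture claimed in the patent.

# Minimal illustrative sketch of reverse diffusion with multimodal conditioning;
# the denoiser and schedule are simplified stand-ins, not the patented model.
import numpy as np

def denoise_step(x_t, t, condition, rng, steps):
    """Stand-in for the learned denoiser; a real model would predict the noise
    to remove at step t, conditioned on the fused multimodal embedding."""
    predicted_noise = 0.1 * x_t + 0.01 * condition.mean()
    sigma = 0.01 * (t / steps)  # inject less noise as t approaches 0
    return x_t - predicted_noise + sigma * rng.standard_normal(x_t.shape)

def sample_animation(condition, frames=60, joints=24, steps=50, seed=0):
    """Iteratively denoise a noisy joint-trajectory tensor under a
    multimodal condition (e.g. fused text and audio embeddings)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((frames, joints, 3))  # start from Gaussian noise
    for t in reversed(range(steps)):
        x = denoise_step(x, t, condition, rng, steps)
    return x

trajectory = sample_animation(condition=np.ones(32))
print(trajectory.shape)  # (60, 24, 3)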

Applications

The patent outlines an NPC Software Development Kit (SDK) that can be integrated into various game engines and virtual environments. This SDK uses a unified AI architecture to generate synchronized body and face animations based on multimodal inputs. The techniques described have broad applications, including game character control, interactive assistants, video teleconferencing, metaverse environments, and entertainment. By leveraging generative AI architectures, the system improves computational efficiency while enabling realistic and contextually aware NPC behaviors across diverse platforms.
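A minimal sketch of how a game engine might call into an SDK of this kind follows; the class and method names are invented for illustration and do not reflect the patent's actual interface.

# Hypothetical engine-facing NPC SDK wrapper; names and signatures are
# illustrative assumptions, and placeholder data stands in for the AI pipeline.
from dataclasses import dataclass

@dataclass
class NPCResponse:
    speech: str
    face_animation: list   # per-frame face vertex displacements
    body_animation: list   # per-frame joint trajectories

class NPCSdk:
    def __init__(self, character_profile):
        self.profile = character_profile  # body/face configuration for this NPC

    def respond(self, player_text, player_audio=None):
        """Generate a synchronized response from multimodal input. In the
        described system this would invoke the unified AI pipeline; here it
        returns placeholder data of the same general shape."""
        return NPCResponse(
            speech=f"{self.profile['name']}: (reply to '{player_text}')",
            face_animation=[[0.0, 0.0, 0.0]],
            body_animation=[[0.0, 0.0, 0.0]],
        )

# Example engine-side usage:
npc = NPCSdk({"name": "Innkeeper"})
result = npc.respond("Do you have a room for the night?")
print(result.speech)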