US20250218096
2025-07-03
Physics
G06T13/40
The patent application introduces a system that allows users to create three-dimensional virtual environments from natural language descriptions. The system leverages artificial intelligence and large language models to interpret user input and generate a corresponding virtual environment. The generated environment includes entities, such as avatars or objects, that interact according to scripted behaviors and events. The system ultimately renders a video of the environment, offering an immersive experience that can be shared across different client machines.
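As a rough illustration of the flow just described, the following Python sketch walks a natural language description through interpretation and rendering stubs. The names used here (Entity, EnvironmentSpec, interpret_description, render_video) are hypothetical and do not appear in the application; they simply make the description-to-video pipeline concrete.

```python
# Minimal sketch of the description-to-video flow; all class and function
# names here are hypothetical illustrations, not identifiers from the patent.
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An avatar or object placed in the generated environment."""
    name: str
    kind: str                      # e.g. "avatar" or "object"
    behaviors: list[str] = field(default_factory=list)


@dataclass
class EnvironmentSpec:
    """Structured description produced from the user's natural language input."""
    description: str
    entities: list[Entity] = field(default_factory=list)


def interpret_description(text: str) -> EnvironmentSpec:
    """Placeholder for the LLM step that maps free text to a structured spec."""
    spec = EnvironmentSpec(description=text)
    # A real system would prompt a large language model here; this stub
    # simply seeds one entity so the rest of the pipeline has input.
    spec.entities.append(Entity(name="guide", kind="avatar", behaviors=["greet"]))
    return spec


def render_video(spec: EnvironmentSpec) -> bytes:
    """Placeholder for the rendering step that produces a shareable video."""
    frames = f"rendering {len(spec.entities)} entities: {spec.description}"
    return frames.encode("utf-8")


if __name__ == "__main__":
    spec = interpret_description("a quiet beach at sunset with one friendly avatar")
    video = render_video(spec)
    print(len(video), "bytes of (stub) video output")
```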
This innovation sits within the fields of artificial intelligence and machine learning, with a focus on creating and interacting with virtual environments. Traditional AI systems often struggle with unstructured contexts such as dynamic 3D spaces, and current methods are typically rigid and task-specific, making them ill-suited to generating interactive virtual worlds. The patent seeks to address these limitations with more flexible AI techniques that can adapt to the complexities of virtual environment generation and navigation.
The system comprises several key components: a communication interface for receiving natural language inputs, a path embedding generator for creating a path language representation of the virtual environment, and a video engine for rendering the final video. The path language representation includes entities and the scripts that dictate their behavior within the environment, and it is used to animate those entities and present the environment on a client machine. Additionally, an agentic pipeline of generative language model agents iteratively refines the representation, incorporating updates and enhancements based on further user input.
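The sketch below suggests one way the path language representation and the agentic pipeline could be organized, assuming a representation that maps entities to behavior scripts and agents modeled as functions that refine it in turn. All names (Script, PathRepresentation, run_agent_pipeline, add_requested_entity) are illustrative assumptions rather than terms from the application.

```python
# Hedged sketch of a "path language representation" and an agentic
# refinement pipeline; structures and agents are assumptions for illustration.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Script:
    """Behavior attached to an entity, e.g. a movement path or a reaction to an event."""
    trigger: str                   # event that starts the script, e.g. "on_enter"
    actions: list[str]             # ordered actions, e.g. ["walk_to:door", "wave"]


@dataclass
class PathRepresentation:
    """Entities plus the scripts that dictate their behavior in the environment."""
    entities: dict[str, list[Script]] = field(default_factory=dict)


# Each "agent" is modeled as a callable that refines the representation.
Agent = Callable[[PathRepresentation, str], PathRepresentation]


def run_agent_pipeline(rep: PathRepresentation, user_input: str,
                       agents: list[Agent]) -> PathRepresentation:
    """Pass the representation through each agent in turn."""
    for agent in agents:
        rep = agent(rep, user_input)
    return rep


def add_requested_entity(rep: PathRepresentation, user_input: str) -> PathRepresentation:
    """Toy agent: ensure an entity mentioned in the input exists in the representation."""
    if "dog" in user_input.lower() and "dog" not in rep.entities:
        rep.entities["dog"] = [Script(trigger="on_start", actions=["run_to:park"])]
    return rep


rep = run_agent_pipeline(PathRepresentation(), "add a dog that runs to the park",
                         agents=[add_requested_entity])
print(rep.entities)
```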
Users interact with the system by providing natural language descriptions, which can include text or voice inputs as well as emojis. These inputs are processed to create detailed representations of the virtual environment's elements, such as entities, actions, and interactions. Users can influence the environment's dynamics by describing specific scenarios or behaviors they wish to see enacted. The system's ability to interpret emojis adds an extra layer of expressiveness, enabling more nuanced environmental characterization.
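The sketch below illustrates how mixed text, voice, and emoji inputs might be normalized into a single description string before interpretation. The emoji-to-hint mapping and the transcribe stub are assumptions made for this example, not details drawn from the application.

```python
# Illustrative input handling; the emoji mapping and transcribe() stub are
# assumptions, not details taken from the patent.
EMOJI_HINTS = {
    "🌧": "rainy weather",
    "🌳": "a forest setting",
    "🎉": "a celebratory mood",
}


def transcribe(audio: bytes) -> str:
    """Stub standing in for a speech-to-text step for voice input."""
    return "a small village square"


def normalize_input(text: str | None = None, audio: bytes | None = None) -> str:
    """Combine text, transcribed voice, and emoji hints into one description string."""
    parts: list[str] = []
    if audio is not None:
        parts.append(transcribe(audio))
    if text:
        hints = [desc for emoji, desc in EMOJI_HINTS.items() if emoji in text]
        stripped = "".join(ch for ch in text if ch not in EMOJI_HINTS)
        parts.append(stripped.strip())
        parts.extend(hints)
    return ", ".join(p for p in parts if p)


print(normalize_input(text="a quiet market at dusk 🌧🎉"))
# prints, e.g.: a quiet market at dusk, rainy weather, a celebratory mood
```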
This approach simplifies the creation of interactive 3D environments compared to traditional methods requiring specialized skills in game development or programming. By utilizing natural language processing, users can easily define complex scenarios without needing technical expertise. This democratizes access to virtual world creation tools, making it feasible for a broader audience to engage in creating detailed simulations or games with minimal effort.