Invention Title:

FINE-GRAINED CONTROLLABLE VIDEO GENERATION

Publication number:

US20250166135

Publication date:

Section:

Physics

Class:

G06T5/60

Inventors:

Applicant:

Smart overview of the Invention

The patent application describes a system that generates videos with a neural network conditioned on both a text prompt and one or more control inputs. Conditioning on both lets the generated video match the user's specification more precisely than a text prompt alone. In the described method, the text prompt and the control inputs are embedded and supplied to the neural network, which then produces a sequence of video frames.

Control Inputs

Control inputs supplement the text prompt. They can include images or location data that define how objects in the video look or move. These additional inputs give users more control over the video's content, so that the specified objects and actions are represented accurately.
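One way to picture such control inputs is as a small request structure that bundles the text prompt with per-object images and locations. The sketch below is illustrative only: the class names (`ObjectControl`, `GenerationRequest`), the field names, and the normalized bounding-box format are assumptions made for this example and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical schema for illustration; the application does not define one.
# A box is (x_min, y_min, x_max, y_max), normalized to the range [0, 1].
BoundingBox = Tuple[float, float, float, float]


@dataclass
class ObjectControl:
    """Control input for a single object named in the text prompt."""
    label: str
    # Optional reference image that defines the object's appearance.
    appearance_image: Optional[str] = None
    # Optional location data: frame index -> where the object should be.
    locations: Dict[int, BoundingBox] = field(default_factory=dict)


@dataclass
class GenerationRequest:
    """A text prompt plus zero or more control inputs."""
    text_prompt: str
    num_frames: int
    controls: List[ObjectControl] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any location refers to an invalid frame or box."""
        if self.num_frames < 1:
            raise ValueError("num_frames must be at least 1")
        for control in self.controls:
            for frame, (x0, y0, x1, y1) in control.locations.items():
                if not 0 <= frame < self.num_frames:
                    raise ValueError(f"frame {frame} is out of range for {control.label!r}")
                if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
                    raise ValueError(f"invalid box for {control.label!r} at frame {frame}")
```

A request for a 16-frame clip in which a dog starts in the upper-left corner could then be written as `GenerationRequest("a dog runs on a beach", 16, [ObjectControl("dog", locations={0: (0.1, 0.1, 0.3, 0.3)})])`.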

Advantages

The described system offers several benefits over traditional text-to-video generation methods. The generated video stays temporally coherent across scenes while aligning closely with both the text prompt and the control inputs. The approach also reduces computational cost: fewer candidate videos need to be generated, which saves processing power and memory.

Implementation

The system can be implemented as computer programs running at multiple locations, with graphical user interfaces (GUIs) presented on various electronic devices. Users provide text prompts and control inputs through these interfaces, using touch-sensitive displays or other input devices, and those inputs guide the video generation process.

User Interaction

Users interact with the system through a GUI, using actions such as drawing or dragging to specify the characteristics or movements of objects in the video. These interactions directly influence how objects are depicted in the generated video, offering a user-friendly way to create customized video content.
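As a rough illustration of how a dragging action might become a movement control input, the sketch below resamples the points recorded during a drag into one position per video frame, spaced evenly along the drawn path. The function name `resample_drag_path` and the even-spacing rule are assumptions made for this example; the application does not specify how drag input is converted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_drag_path(points: List[Point], num_frames: int) -> List[Point]:
    """Return num_frames positions spaced evenly by arc length along a drag path.

    Hypothetical helper: `points` are the raw positions captured during the
    drag, in any consistent coordinate system.
    """
    if not points:
        raise ValueError("the drag path needs at least one point")
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if len(points) == 1 or num_frames == 1:
        return [points[0]] * num_frames

    # Cumulative distance travelled at each recorded point.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    if total == 0.0:
        # The pointer never moved, so the object stays in place.
        return [points[0]] * num_frames

    resampled = []
    segment = 0
    for frame in range(num_frames):
        target = total * frame / (num_frames - 1)
        # Advance to the segment that contains the target distance.
        while segment < len(points) - 2 and cumulative[segment + 1] < target:
            segment += 1
        length = cumulative[segment + 1] - cumulative[segment]
        t = 0.0 if length == 0.0 else (target - cumulative[segment]) / length
        (x0, y0), (x1, y1) = points[segment], points[segment + 1]
        # Linear interpolation within the segment.
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled
```

Spacing the positions by arc length rather than by the order of captured pointer events keeps the object's speed constant even when the pointer was sampled unevenly; a real interface might instead preserve the user's drag timing.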