US20250191286
2025-06-12
Physics
G06T17/00
The patent application describes a novel approach to generating three-dimensional (3D) content from text input that addresses the limitations of existing models. Traditional text-to-3D models generate and optimize each view individually, which is computationally expensive and time-consuming. The disclosed innovation instead uses a feed-forward neural network to generate 3D scenes from labeled voxels, significantly reducing computational cost and enabling the creation of larger 3D scenes.
This technology falls within the domain of 3D content generation. Machine learning models were initially designed to convert text descriptions into images and have since evolved to support 3D content, but current solutions face challenges such as high computational demands and limited user control. The need for improved methods led to the development of a feed-forward neural network capable of generating complete 3D scenes from voxel data.
The disclosed method involves a feed-forward neural network that processes labeled voxels to create a 3D representation of a scene. That representation is then used to render two-dimensional (2D) images from various viewpoints. By utilizing style codes and pseudo-ground-truth images, the network can produce realistic 2D images efficiently. This approach enables rapid scene rendering, making it suitable for applications such as architectural design and game development.
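To make the data flow concrete, below is a minimal sketch of such a feed-forward network, assuming PyTorch. The class name VoxelToSceneNet, the one-hot label encoding, the layer sizes, and the broadcast style code are all illustrative assumptions, as the application does not specify an architecture:

```python
import torch
import torch.nn as nn

class VoxelToSceneNet(nn.Module):
    """Hypothetical feed-forward network: labeled voxels + style code -> 3D feature volume."""

    def __init__(self, num_labels=8, style_dim=64, feat_dim=32):
        super().__init__()
        # 3D convolutions consume the labeled grid in a single forward pass.
        self.encoder = nn.Sequential(
            nn.Conv3d(num_labels + style_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(64, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, voxel_labels, style_code):
        # voxel_labels: (B, L, D, H, W) one-hot labels; style_code: (B, S).
        b, _, d, h, w = voxel_labels.shape
        # Broadcast the style code over every voxel so it conditions appearance globally.
        style = style_code.view(b, -1, 1, 1, 1).expand(-1, -1, d, h, w)
        return self.encoder(torch.cat([voxel_labels, style], dim=1))
```

The output feature volume stands in for the patent's 3D scene representation; any renderer that samples it can then produce 2D views.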
The method begins by inputting labeled voxels into a feed-forward neural network, which generates a 3D scene representation. Users can provide these voxel descriptions manually, assembling blocks to depict the scene. The network processes the data in a single forward pass, avoiding the per-view optimization of earlier approaches. Once the 3D representation is created, it can be used to render 2D images from any specified viewpoint, offering flexibility in depicting different perspectives.
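Continuing the sketch above, the snippet below shows how a user-assembled block scene might be encoded and pushed through the network in one pass; the label vocabulary and grid size are invented for illustration:

```python
import torch
import torch.nn.functional as F

# Illustrative label ids; the application does not specify a vocabulary.
EMPTY, GROUND, BUILDING = 0, 1, 2

# Assemble a tiny 16^3 scene by placing labeled blocks, as a user might.
grid = torch.zeros(16, 16, 16, dtype=torch.long)   # (D, H, W)
grid[:, 0, :] = GROUND                             # ground plane at height 0
grid[4:8, 1:6, 4:8] = BUILDING                     # a small building volume

# One-hot encode to (1, L, D, H, W) and run a single forward pass.
one_hot = F.one_hot(grid, num_classes=8).permute(3, 0, 1, 2).unsqueeze(0).float()
net = VoxelToSceneNet(num_labels=8)
style = torch.randn(1, 64)               # a sampled style code
scene_features = net(one_hot, style)     # 3D representation, computed once
```

Because the representation is produced in a single pass, changing the viewpoint afterwards requires only re-rendering, not re-running the network.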
Potential applications include architectural design, where rapid prototyping of buildings or entire cities is valuable, and game design, where artists or players can construct scenes easily. The system comprises a neural network that processes voxel inputs and style codes to produce 3D representations, which are then rendered into 2D images using techniques such as neural radiance field (NeRF) rendering. This simplifies the complex workflow of traditional 3D design and broadens accessibility for users.
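The application names neural radiance field rendering as one rendering technique; the sketch below shows the generic NeRF compositing step for a single camera ray, with densities and colors assumed to have been sampled from the scene representation (the patent's exact renderer is not detailed):

```python
import torch

def composite_ray(densities, colors, deltas):
    """Generic NeRF-style alpha compositing along one camera ray.

    densities: (N,) nonnegative densities at N samples along the ray
    colors:    (N, 3) RGB at each sample
    deltas:    (N,) distance between consecutive samples
    """
    alphas = 1.0 - torch.exp(-densities * deltas)   # opacity of each segment
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0
    )
    weights = alphas * trans
    return (weights.unsqueeze(-1) * colors).sum(dim=0)  # rendered pixel RGB

# Example: render one pixel from 64 samples spaced 0.05 units apart.
pixel = composite_ray(torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.05))
```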