US20250191286
2025-06-12
Physics
G06T17/00
The patent application describes a novel approach to generating three-dimensional (3D) content from text input that addresses the limitations of existing models. Traditional text-to-3D models generate and optimize each view individually, which is computationally expensive and time-consuming. The disclosed innovation instead uses a feed-forward neural network to generate 3D scenes from labeled voxels, significantly reducing computational cost and enabling the creation of larger 3D scenes.
This technology falls within the domain of 3D content generation. Machine learning models were initially designed to convert text descriptions into images and have since evolved to support 3D content, but current solutions face challenges such as high computational demands and limited user control. The need for improved methods led to the development of a feed-forward neural network capable of generating complete 3D scenes from voxel data.
The disclosed method involves a feed-forward neural network that processes labeled voxels to create a 3D representation of a scene. That representation is then used to render two-dimensional (2D) images from various viewpoints. By utilizing style codes and pseudo-ground-truth images, the network can produce realistic 2D images efficiently. This approach enables rapid scene rendering, making it suitable for applications such as architectural design and game development.
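To make the data flow concrete, below is a minimal sketch of such a feed-forward network, assuming PyTorch. The class name VoxelToSceneNet, the one-hot label encoding, the layer sizes, and the broadcast style code are all illustrative assumptions, as the application does not specify an architecture:

```python
import torch
import torch.nn as nn

class VoxelToSceneNet(nn.Module):
    """Hypothetical feed-forward network: labeled voxels + style code -> 3D feature volume."""

    def __init__(self, num_labels=8, style_dim=64, feat_dim=32):
        super().__init__()
        # 3D convolutions consume the labeled grid in a single forward pass.
        self.encoder = nn.Sequential(
            nn.Conv3d(num_labels + style_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(64, feat_dim, kernel_size=3, padding=1),
        )

    def forward(self, voxel_labels, style_code):
        # voxel_labels: (B, L, D, H, W) one-hot labels; style_code: (B, S).
        b, _, d, h, w = voxel_labels.shape
        # Broadcast the style code over every voxel so it conditions appearance globally.
        style = style_code.view(b, -1, 1, 1, 1).expand(-1, -1, d, h, w)
        return self.encoder(torch.cat([voxel_labels, style], dim=1))
```

The output feature volume stands in for the patent's 3D scene representation; any renderer that samples it can then produce 2D views.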
The method begins by inputting labeled voxels into a feed-forward neural network, which generates a 3D scene representation. Users can provide these voxel descriptions manually, assembling blocks to depict the scene. The network processes the data in a single forward pass, avoiding the per-view optimization of earlier approaches. Once the 3D representation is created, it can be used to render 2D images from any specified viewpoint, offering flexibility in depicting different perspectives.
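Continuing the sketch above, the snippet below shows how a user-assembled block scene might be encoded and pushed through the network in one pass; the label vocabulary and grid size are invented for illustration:

```python
import torch
import torch.nn.functional as F

# Illustrative label ids; the application does not specify a vocabulary.
EMPTY, GROUND, BUILDING = 0, 1, 2

# Assemble a tiny 16^3 scene by placing labeled blocks, as a user might.
grid = torch.zeros(16, 16, 16, dtype=torch.long)   # (D, H, W)
grid[:, 0, :] = GROUND                             # ground plane at height 0
grid[4:8, 1:6, 4:8] = BUILDING                     # a small building volume

# One-hot encode to (1, L, D, H, W) and run a single forward pass.
one_hot = F.one_hot(grid, num_classes=8).permute(3, 0, 1, 2).unsqueeze(0).float()
net = VoxelToSceneNet(num_labels=8)
style = torch.randn(1, 64)               # a sampled style code
scene_features = net(one_hot, style)     # 3D representation, computed once
```

Because the representation is produced in a single pass, changing the viewpoint afterwards requires only re-rendering, not re-running the network.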
Potential applications include architectural design, where rapid prototyping of buildings or entire cities is valuable, and game design, where artists or players can construct scenes easily. The system comprises a neural network that processes voxel inputs and style codes to produce 3D representations, which are then rendered into 2D images using techniques such as neural radiance field (NeRF) rendering. This simplifies the complex workflow of traditional 3D design and broadens accessibility for users.
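The application names neural radiance field rendering as one rendering technique; the sketch below shows the generic NeRF compositing step for a single camera ray, with densities and colors assumed to have been sampled from the scene representation (the patent's exact renderer is not detailed):

```python
import torch

def composite_ray(densities, colors, deltas):
    """Generic NeRF-style alpha compositing along one camera ray.

    densities: (N,) nonnegative densities at N samples along the ray
    colors:    (N, 3) RGB at each sample
    deltas:    (N,) distance between consecutive samples
    """
    alphas = 1.0 - torch.exp(-densities * deltas)   # opacity of each segment
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0
    )
    weights = alphas * trans
    return (weights.unsqueeze(-1) * colors).sum(dim=0)  # rendered pixel RGB

# Example: render one pixel from 64 samples spaced 0.05 units apart.
pixel = composite_ray(torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.05))
```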