US20250239003
2025-07-24
Physics
G06T15/20
The patent application describes a method and system for generating three-dimensional (3D) objects using a multi-view stereo (MVS) neural reconstruction network. This process involves creating a feature volume from images taken from multiple viewpoints of a subject and refining this volume through score distillation sampling (SDS) fine-tuning to produce a detailed 3D object. The system can also receive text prompts from users and generate images of the subject from various perspectives using a multi-view diffusion model.
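As an illustration of what building a feature volume from multiple viewpoints can look like, the following minimal sketch projects voxel centers into each view, samples per-view features, and aggregates them. The summary does not state how the application aggregates views; the mean-and-variance aggregation, the nearest-pixel sampling, and every name here (`build_feature_volume`, `projections`, `grid_points`) are assumptions made for illustration.

```python
import numpy as np

def build_feature_volume(feature_maps, projections, grid_points):
    """Aggregate per-view image features at 3D grid points (illustrative only).

    feature_maps: (V, H, W, C) feature maps, one per view
    projections:  (V, 3, 4) camera projection matrices (assumed known)
    grid_points:  (N, 3) voxel centers in world coordinates
    Returns an (N, 2C) array: mean and variance of the features across views.
    """
    V, H, W, C = feature_maps.shape
    n = grid_points.shape[0]
    homog = np.concatenate([grid_points, np.ones((n, 1))], axis=1)  # (N, 4)
    samples = np.zeros((V, n, C))
    valid = np.zeros((V, n), dtype=bool)
    for v in range(V):
        uvw = homog @ projections[v].T            # (N, 3) homogeneous pixels
        depth = uvw[:, 2]
        in_front = depth > 1e-6
        safe_depth = np.where(in_front, depth, 1.0)
        cols = np.round(uvw[:, 0] / safe_depth).astype(int)
        rows = np.round(uvw[:, 1] / safe_depth).astype(int)
        inside = in_front & (cols >= 0) & (cols < W) & (rows >= 0) & (rows < H)
        # Nearest-pixel lookup; bilinear sampling is a common alternative.
        samples[v, inside] = feature_maps[v, rows[inside], cols[inside]]
        valid[v] = inside
    count = np.maximum(valid.sum(axis=0), 1)[:, None]   # views seeing each voxel
    mask = valid[:, :, None]
    mean = (samples * mask).sum(axis=0) / count
    var = (((samples - mean) ** 2) * mask).sum(axis=0) / count
    return np.concatenate([mean, var], axis=1)

# Toy setup: two views sharing one hypothetical pinhole camera placed 2 units away.
K = np.array([[4.0, 0.0, 4.0], [0.0, 4.0, 4.0], [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])
projections = np.stack([K @ Rt, K @ Rt])
feature_maps = np.stack([np.full((8, 8, 2), 1.0), np.full((8, 8, 2), 3.0)])
axis = np.linspace(-0.5, 0.5, 4)
grid_points = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
volume = build_feature_volume(feature_maps, projections, grid_points)
```

In this toy setup every voxel is visible in both views, so the first half of each row holds the cross-view mean and the second half the cross-view variance. A real MVS network would use learned feature extractors and a 3D network on top of such a volume.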
This disclosure pertains to the field of 3D object generation, particularly from text inputs and two-dimensional (2D) media. The growing popularity of image generation technologies, both open-source and proprietary, has increased the demand for converting 2D media into 3D models. These technologies allow for the creation of images that match natural language descriptors, thanks to extensive training on diverse datasets.
The method involves generating a feature volume using an MVS neural reconstruction network from images that capture different viewpoints of a subject. SDS fine-tuning is then applied to this feature volume to create a 3D object. A non-volatile computer-readable medium can store instructions for executing these operations. The system includes a 3D object engine capable of generating images from text prompts, constructing feature volumes, and refining them into 3D models.
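The SDS fine-tuning step can be illustrated with a toy example. In score distillation sampling as commonly described, a rendered image is noised, a diffusion model predicts that noise, and the difference between predicted and injected noise is used as a gradient on the 3D parameters. The sketch below is not the application's implementation: the renderer is the identity, the "diffusion model" is a hypothetical closed-form predictor (`toy_noise_predictor`) that knows a single target image, and the weighting `1 - alpha_bar` is one assumed choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_noise_predictor(x_t, alpha_bar, target):
    # Hypothetical stand-in for a pretrained diffusion model: the exact noise
    # estimate for a data distribution concentrated on one target image.
    return (x_t - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

def sds_step(theta, target, lr=0.05):
    """One SDS-style update on parameters theta (identity renderer assumed)."""
    alpha_bar = rng.uniform(0.1, 0.9)             # random noise level
    eps = rng.standard_normal(theta.shape)        # injected noise
    rendered = theta                              # real systems render a view here
    x_t = np.sqrt(alpha_bar) * rendered + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    weight = 1.0 - alpha_bar                      # assumed weighting w(t)
    grad = weight * (eps_hat - eps)               # d(rendered)/d(theta) is identity
    return theta - lr * grad

target = rng.uniform(0.0, 1.0, size=(8, 8))       # what the "model" prefers
theta = np.zeros((8, 8))                          # initial parameters
initial_error = np.abs(theta - target).mean()
for _ in range(200):
    theta = sds_step(theta, target)
final_error = np.abs(theta - target).mean()
```

Because the toy predictor is exact, each update pulls `theta` toward `target`. In an actual pipeline the gradient would flow through a differentiable renderer into the feature volume, and the predictor would be a text-conditioned or multi-view diffusion model.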
The accompanying drawings illustrate various embodiments of the described systems and methods. They include schematic diagrams of system configurations, process flows for generating 3D objects, and structural diagrams of the computer systems involved. The drawings are not necessarily to scale; they emphasize the underlying principles and give visual context for the described embodiments.
The detailed description elaborates on the components and processes of the invention. It explains that functional blocks may be realized by hardware or software components configured to perform the specified functions. It also covers concepts such as diffusion models, which generate new data by learning to reverse a gradual noising process, and multilayer perceptrons (MLPs), which are used to render feature volumes into final outputs.
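To make the role of an MLP in rendering more concrete, the sketch below decodes feature vectors sampled along one camera ray into density and color, then alpha-composites them into a pixel. The two-layer architecture, the random weights, the softplus and sigmoid activations, and the compositing formula are assumptions borrowed from common neural rendering practice, not details taken from the application.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_decode(features, w1, b1, w2, b2):
    """Map (S, C) feature vectors to non-negative density and RGB in [0, 1]."""
    hidden = np.maximum(features @ w1 + b1, 0.0)        # ReLU hidden layer
    out = hidden @ w2 + b2                              # (S, 4)
    density = np.logaddexp(0.0, out[:, 0])              # softplus keeps it >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:4]))            # sigmoid keeps it in [0, 1]
    return density, rgb

def composite_ray(density, rgb, step):
    """Alpha-composite samples ordered from near to far along a ray."""
    alpha = 1.0 - np.exp(-density * step)
    # Transmittance: fraction of light surviving all earlier samples.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    pixel = (weights[:, None] * rgb).sum(axis=0)
    return pixel, weights.sum()

# Hypothetical inputs: 16 samples along one ray, 8 feature channels each.
features = rng.standard_normal((16, 8))
w1, b1 = rng.standard_normal((8, 16)) * 0.5, np.zeros(16)
w2, b2 = rng.standard_normal((16, 4)) * 0.5, np.zeros(4)
density, rgb = mlp_decode(features, w1, b1, w2, b2)
pixel, opacity = composite_ray(density, rgb, step=0.1)
```

In a trained system the weights would be learned and the features would be interpolated from the feature volume at each sample position; here they are random so that the example stays self-contained.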