Invention Title:

GENERATING 3D MODELS FROM A SINGLE IMAGE

Publication number:

US20250078393

Publication date:

2025-03-06

Section:

Physics

Class:

G06T15/205

Inventors:

Kai Zhang Sunnyvale, CA, United States

Trung Huu Bui San Jose, CA, United States

Feng Liu Beaverton, OR, United States

Jiuxiang Gu Baltimore, MD, United States

Sai Bi San Jose, CA, United States

Hao Tan Santa Clara, CA, United States

Difan Liu San Jose, CA, United States

Yicong Hong Bruce, Australia

Yang Zhou Mountain View, CA, United States

Kalyan K. Sunkavalli Saratoga, CA, United States

Applicant:

Adobe Inc. San Jose, CA, United States

Smart overview of the Invention

Systems and methods are presented for creating a three-dimensional (3D) model from a single input image. The method obtains an input image together with its camera view information, encodes the image to derive two-dimensional (2D) features, and decodes those features into 3D features. A 3D model of the depicted object is then generated from the 3D features.

Background

The field of image processing and computer vision is crucial for enabling machines to interpret visual data in a way similar to human perception. Traditional methods like stereopsis and photogrammetry require multiple images from different perspectives to reconstruct 3D structures. However, these techniques depend heavily on having several coherent images, which can be limiting in applications where only a single image is available.

Methodology

The described system uses a machine learning model that encodes an input image to extract 2D features, which are then combined with camera and positional data. The model decodes this combined representation into 3D tokens, which are further processed to generate a 3D model. The 3D model may be represented as a neural radiance field (NeRF), so a single input image yields a representation that can be rendered from viewpoints other than the original one.
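To make the NeRF representation mentioned above more concrete, the following is a minimal sketch of how a radiance field is typically turned into a pixel color: samples along a camera ray are alpha-composited using the standard volume rendering quadrature. It is not taken from the patent. The function name `render_ray` and its arguments are hypothetical, and the densities and colors are passed in directly rather than predicted by a network.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray (standard NeRF-style quadrature).

    densities: volume density (sigma) at each sample, front to back
    colors:    (r, g, b) triple at each sample
    deltas:    distance covered by each sample segment
    All names are illustrative assumptions, not the patent's terminology.
    """
    transmittance = 1.0  # fraction of light still unblocked at this sample
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha            # light left for later samples
    return pixel, transmittance


# Example: empty space in front of a dense red sample.
pixel, remaining = render_ray(
    densities=[0.0, 50.0],
    colors=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    deltas=[0.1, 0.1],
)
```

Because the first sample has zero density it contributes nothing, so the pixel is dominated by the red sample behind it. In a full system the densities and colors would come from the generated 3D representation queried at points along the ray.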

Implementation

The apparatus includes an image encoder, a feature decoder, and a 3D model generator. The encoder uses a transformer architecture with self-attention layers, while the decoder incorporates cross-attention layers. The 3D model generator may use a NeRF representation to produce the 3D model. The system is designed to run on servers that can handle the required computation and exchange data with user devices over a network.
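The cross-attention step in the decoder can be illustrated with a small sketch in which a set of 3D query tokens attends over the 2D image feature tokens produced by the encoder. This is a simplified, single-head version with no learned projections, written under the assumption that queries, keys, and values are plain lists of floats. The names `cross_attention`, `queries`, `keys`, and `values` are hypothetical and do not come from the patent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (single head, no projections).

    queries: tokens that gather information (here: hypothetical 3D tokens)
    keys, values: tokens that are attended to (here: 2D image features)
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Example: one 3D query token attending over two image feature tokens.
image_keys = [[1.0, 0.0], [0.0, 1.0]]
image_values = [[2.0, 0.0], [0.0, 4.0]]
tokens_3d = cross_attention([[0.0, 0.0]], image_keys, image_values)
```

Self-attention in the encoder follows the same computation with queries, keys, and values all derived from the same token sequence. A production transformer would add learned projection matrices, multiple heads, residual connections, and normalization, none of which are shown here.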

Applications

This technology allows users to upload or select an image through an interface, whereupon the system generates a 3D model of the depicted object. The generated models can be viewed from various angles and downloaded as mesh files for further use. This capability is particularly useful in fields like virtual reality, gaming, and digital content creation where realistic 3D models are crucial.
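As an illustration of the mesh download step, the sketch below serializes a triangle mesh to the Wavefront OBJ text format. The source does not say which mesh file format is used, so OBJ is an assumption chosen for its simplicity, and the function name `mesh_to_obj` is hypothetical.

```python
def mesh_to_obj(vertices, faces):
    """Serialize a mesh to Wavefront OBJ text.

    vertices: list of (x, y, z) coordinates
    faces:    list of vertex-index tuples, 0-based
    OBJ face indices are 1-based, so each index is shifted by one.
    """
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


# Example: a single triangle.
obj_text = mesh_to_obj(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    faces=[(0, 1, 2)],
)
```

The resulting text can be written to a `.obj` file and opened in common 3D tools. Extracting the mesh itself from a NeRF-style representation is a separate step that is not shown here.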