US20250061650
2025-02-20
Physics
G06T17/00
The patent application details an advanced image processing system designed to integrate three-dimensional (3D) models with text prompts to produce detailed images. This system utilizes a depth map derived from the 3D model and combines it with a textual description to generate an output image. The approach is particularly aimed at creating visually accurate and aesthetically pleasing images by merging geometric data with descriptive text inputs.
Image processing, especially digital image processing, involves using computers to edit or synthesize images through algorithms or processing networks. The field has grown significantly, impacting areas like photography, video processing, and computer vision. Traditional text-to-image models often struggle to accurately depict scenes with specific 3D characteristics due to the limitations of conveying visual details solely through text.
The disclosed system offers a novel method for generating images by combining 3D scene geometry with user-provided text prompts. The process involves creating a depth map from the 3D model and using it alongside the text input in an image generation model. This method ensures that the output image accurately reflects the user's intended scene, leveraging both geometric precision and textual guidance.
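The depth-map step described above can be sketched as a simple z-buffer projection of the 3D scene geometry. The function below is an illustrative reconstruction under stated assumptions, not the claimed implementation; all names (`depth_map_from_points`, the orthographic camera model) are assumptions for the sketch.

```python
import numpy as np

def depth_map_from_points(points, width, height):
    """Render an orthographic depth map from a 3D point cloud.

    points: (N, 3) array of (x, y, z), with x and y in [0, 1] mapping onto
    the image plane and z the distance from the camera. Nearer points
    overwrite farther ones at each pixel (z-buffering).
    """
    depth = np.full((height, width), np.inf)
    cols = np.clip((points[:, 0] * (width - 1)).astype(int), 0, width - 1)
    rows = np.clip((points[:, 1] * (height - 1)).astype(int), 0, height - 1)
    for r, c, z in zip(rows, cols, points[:, 2]):
        if z < depth[r, c]:       # keep the nearest surface at each pixel
            depth[r, c] = z
    depth[np.isinf(depth)] = 0.0  # background pixels get depth 0
    return depth

# Three sample points; two project to the same pixel, so the nearer wins.
pts = np.array([[0.0, 0.0, 2.0],
                [0.0, 0.0, 1.0],
                [1.0, 1.0, 3.0]])
dm = depth_map_from_points(pts, 4, 4)
```

In the disclosed method, a depth map like `dm` would then be supplied, together with the text prompt, as conditioning input to the image generation model.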
The system employs machine learning techniques, specifically artificial neural networks (ANNs), to refine image outputs. During training, these networks adjust the weights on connections between nodes so that each node's output contributes more accurately to the final prediction. This adaptive learning process ensures that the generated images align closely with both the geometric input from 3D models and the descriptive input from text prompts.
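The weight-adjustment idea can be illustrated with a minimal sketch: a single linear node trained by gradient descent to minimize mean-squared error. The training data, learning rate, and function names here are illustrative assumptions, not details from the application.

```python
import numpy as np

def train_node(x, y, lr=0.1, steps=500):
    """Fit one linear node (weighted sum plus bias) by gradient descent."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=x.shape[1])
    b = 0.0
    for _ in range(steps):
        pred = x @ w + b                  # node output: weighted sum of inputs
        err = pred - y                    # prediction error on each sample
        w -= lr * (x.T @ err) / len(y)    # move weights down the MSE gradient
        b -= lr * err.mean()
    return w, b

# Recover a known linear rule y = 2*x0 - x1 + 0.5 from noisy-free samples.
x = np.random.default_rng(1).uniform(-1, 1, size=(100, 2))
y = 2 * x[:, 0] - x[:, 1] + 0.5
w, b = train_node(x, y)
```

The same principle, repeated across many nodes and layers, is what lets the full network learn to reconcile geometric conditioning with textual guidance.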