US20250061650
2025-02-20
Physics
G06T17/00
The patent application details an advanced image processing system designed to integrate three-dimensional (3D) models with text prompts to produce detailed images. This system utilizes a depth map derived from the 3D model and combines it with a textual description to generate an output image. The approach is particularly aimed at creating visually accurate and aesthetically pleasing images by merging geometric data with descriptive text inputs.
Image processing, especially digital image processing, involves using computers to edit or synthesize images through algorithms or processing networks. The field has grown significantly, impacting areas like photography, video processing, and computer vision. Traditional text-to-image models often struggle to accurately depict scenes with specific 3D characteristics due to the limitations of conveying visual details solely through text.
The disclosed system offers a novel method for generating images by combining 3D scene geometry with user-provided text prompts. The process involves creating a depth map from the 3D model and using it alongside the text input in an image generation model. This method ensures that the output image accurately reflects the user's intended scene, leveraging both geometric precision and textual guidance.
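The depth-map step described above can be sketched as a simple z-buffer projection of the 3D scene geometry. The function below is an illustrative reconstruction under stated assumptions, not the claimed implementation; all names (`depth_map_from_points`, the orthographic camera model) are assumptions for the sketch.

```python
import numpy as np

def depth_map_from_points(points, width, height):
    """Render an orthographic depth map from a 3D point cloud.

    points: (N, 3) array of (x, y, z), with x and y in [0, 1] mapping onto
    the image plane and z the distance from the camera. Nearer points
    overwrite farther ones at each pixel (z-buffering).
    """
    depth = np.full((height, width), np.inf)
    cols = np.clip((points[:, 0] * (width - 1)).astype(int), 0, width - 1)
    rows = np.clip((points[:, 1] * (height - 1)).astype(int), 0, height - 1)
    for r, c, z in zip(rows, cols, points[:, 2]):
        if z < depth[r, c]:       # keep the nearest surface at each pixel
            depth[r, c] = z
    depth[np.isinf(depth)] = 0.0  # background pixels get depth 0
    return depth

# Three sample points; two project to the same pixel, so the nearer wins.
pts = np.array([[0.0, 0.0, 2.0],
                [0.0, 0.0, 1.0],
                [1.0, 1.0, 3.0]])
dm = depth_map_from_points(pts, 4, 4)
```

In the disclosed method, a depth map like `dm` would then be supplied, together with the text prompt, as conditioning input to the image generation model.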
The system employs machine learning techniques, specifically artificial neural networks (ANNs), to refine image outputs. During training, these networks adjust the weights on connections between nodes so that each node's output contributes more accurately to the final prediction. This adaptive learning process ensures that the generated images align closely with both the geometric input from 3D models and the descriptive input from text prompts.
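The weight-adjustment idea can be illustrated with a minimal sketch: a single linear node trained by gradient descent to minimize mean-squared error. The training data, learning rate, and function names here are illustrative assumptions, not details from the application.

```python
import numpy as np

def train_node(x, y, lr=0.1, steps=500):
    """Fit one linear node (weighted sum plus bias) by gradient descent."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=x.shape[1])
    b = 0.0
    for _ in range(steps):
        pred = x @ w + b                  # node output: weighted sum of inputs
        err = pred - y                    # prediction error on each sample
        w -= lr * (x.T @ err) / len(y)    # move weights down the MSE gradient
        b -= lr * err.mean()
    return w, b

# Recover a known linear rule y = 2*x0 - x1 + 0.5 from noisy-free samples.
x = np.random.default_rng(1).uniform(-1, 1, size=(100, 2))
y = 2 * x[:, 0] - x[:, 1] + 0.5
w, b = train_node(x, y)
```

The same principle, repeated across many nodes and layers, is what lets the full network learn to reconcile geometric conditioning with textual guidance.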