US20250078353
2025-03-06
Physics
G06T11/60
The disclosed systems and methods enhance images using a machine learning model guided by text input. The system accesses a digital input image and creates a masked image by removing a specified region. A machine learning model then regenerates the masked region according to the text input, producing an enhanced image that aligns with the given prompt.
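The masking step described above can be illustrated with a minimal sketch. It assumes a rectangular region and an image stored as nested lists of RGB tuples; the function name `make_masked_image`, the `(top, left, height, width)` region layout, and the fill colour are hypothetical choices made for illustration and are not taken from the disclosure.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def make_masked_image(
    image: Image,
    region: Tuple[int, int, int, int],
    fill: Pixel = (0, 0, 0),
) -> Tuple[Image, Mask]:
    """Blank out a rectangular region and return (masked_image, mask).

    `region` is (top, left, height, width) in pixels (assumed layout).
    The mask is True where pixels were removed and must be regenerated.
    """
    top, left, height, width = region
    rows, cols = len(image), len(image[0])
    if not (0 <= top and 0 <= left and top + height <= rows and left + width <= cols):
        raise ValueError("region must lie inside the image")

    mask = [
        [top <= r < top + height and left <= c < left + width for c in range(cols)]
        for r in range(rows)
    ]
    # Copy the image, replacing removed pixels with the placeholder colour.
    masked = [
        [fill if mask[r][c] else image[r][c] for c in range(cols)]
        for r in range(rows)
    ]
    return masked, mask
```

Returning the mask alongside the masked image keeps the removed region explicit, so a later stage does not have to infer it from the placeholder colour.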
Image generation using AI and machine learning is complex due to the need for algorithms that can accurately replicate real-world nuances like lighting, texture, and perspective. Additionally, expanding images beyond their original borders while maintaining semantic integrity poses significant computational challenges. Existing systems often struggle with these tasks, particularly when maintaining photorealism and efficiency across various styles.
The invention offers technological improvements to overcome these challenges. It includes a non-transitory computer-readable medium storing instructions for regenerating image regions from text inputs. The instructions generate a masked image and provide it, along with a text prompt, to a machine learning model trained on a diverse set of images. The model generates an enhanced image that replicates existing pixel values and integrates newly generated segments based on the text input.
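One way to picture this flow is the sketch below, which passes a masked image and a prompt to a generator and then composites the result. The `Generator` callable interface and `dummy_generator` are hypothetical stand-ins, not the trained model described in the disclosure; the sketch also assumes that pixels outside the mask are copied unchanged and only masked pixels come from the generator.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]
# Hypothetical interface: (masked_image, mask, prompt) -> generated image of equal size.
Generator = Callable[[Image, Mask, str], Image]


def enhance(masked: Image, mask: Mask, prompt: str, generate: Generator) -> Image:
    """Keep unmasked pixels as-is and take masked pixels from the generator."""
    generated = generate(masked, mask, prompt)
    if len(generated) != len(masked) or any(
        len(g_row) != len(m_row) for g_row, m_row in zip(generated, masked)
    ):
        raise ValueError("generator output must match the input size")
    return [
        [generated[r][c] if mask[r][c] else masked[r][c] for c in range(len(masked[r]))]
        for r in range(len(masked))
    ]


def dummy_generator(masked: Image, mask: Mask, prompt: str) -> Image:
    """Stand-in only: paints a flat grey derived from the prompt length."""
    shade = len(prompt) % 256
    return [[(shade, shade, shade) for _ in row] for row in masked]
```

Compositing after generation guarantees that the original pixels survive exactly, whatever the generator returns for the unmasked area.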
The system accepts several inputs, including the input image, the masked region, and the text prompt, to generate enhanced images. Users can specify masked regions or provide text inputs through a graphical user interface. The model can also extend an image's dimensions, vertically or horizontally, by replacing masked regions with newly generated segments.
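Extending an image can be framed as the same masking problem on a larger canvas. The sketch below is one possible reading, not the disclosed implementation: the function name `extend_canvas`, its padding parameters, and the fill colour are assumptions. It places the original image on an enlarged canvas and marks every added pixel as masked, ready to be regenerated.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def extend_canvas(
    image: Image,
    top: int = 0,
    bottom: int = 0,
    left: int = 0,
    right: int = 0,
    fill: Pixel = (0, 0, 0),
) -> Tuple[Image, Mask]:
    """Pad the image and return (canvas, mask); mask is True on added pixels."""
    if min(top, bottom, left, right) < 0:
        raise ValueError("padding must be non-negative")
    rows, cols = len(image), len(image[0])
    new_rows, new_cols = rows + top + bottom, cols + left + right

    # Start fully masked, then copy the original pixels into place.
    canvas = [[fill] * new_cols for _ in range(new_rows)]
    mask = [[True] * new_cols for _ in range(new_rows)]
    for r in range(rows):
        for c in range(cols):
            canvas[top + r][left + c] = image[r][c]
            mask[top + r][left + c] = False
    return canvas, mask
```

For example, `extend_canvas(image, right=256)` would widen the image by 256 columns, while `top` and `bottom` would change its height.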
The machine learning model is trained using deep learning techniques alongside large language models. It includes sub-models for generating embeddings and for generating enhanced images. The architecture uses overlapping techniques when masking regions and generating extensions from the provided prompts, with the aim of producing high-quality output that adheres to user specifications.
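The summary does not say what the overlapping techniques consist of. One possible reading, sketched below purely as an assumption, is that a wide extension is generated in steps, with each generation window overlapping a band of already-known columns so the new segment has context to match. The function name `extension_windows` and its parameters are hypothetical.

```python
from typing import List, Tuple


def extension_windows(
    known_width: int, target_width: int, window: int, overlap: int
) -> List[Tuple[int, int]]:
    """Plan column ranges (start, end) for extending an image to the right.

    Each window begins `overlap` columns inside the already-known area and
    ends at most `window` columns later, so every step adds new columns
    while reusing some existing ones as context (assumed scheme).
    """
    if not 0 < overlap < window:
        raise ValueError("overlap must be positive and smaller than window")
    if known_width < overlap:
        raise ValueError("known area must be at least as wide as the overlap")
    if target_width < known_width:
        raise ValueError("target width must not be smaller than known width")

    windows: List[Tuple[int, int]] = []
    filled = known_width  # columns [0, filled) are known so far
    while filled < target_width:
        start = filled - overlap
        end = min(start + window, target_width)
        windows.append((start, end))
        filled = end
    return windows
```

Under this scheme, extending an 8-column image to 20 columns with `window=8` and `overlap=2` takes two steps, covering columns 6 to 14 and then 12 to 20.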