Invention Title:

SYSTEMS AND METHODS FOR IMAGE GENERATION WITH MACHINE LEARNING MODELS

Publication number:

US20250078353

Publication date:

2025-03-06

Section:

Physics

Class:

G06T11/60

Inventors:

Aditya RAMESH San Francisco, CA, United States

Prafulla DHARIWAL San Francisco, CA, United States

Alexander NICHOL San Francisco, CA, United States

Assignee:

OpenAI Opco, LLC San Francisco, CA, United States

Applicant:

OpenAI Opco, LLC San Francisco, CA, United States

Smart overview of the Invention

The disclosed systems and methods use machine learning models to enhance images according to text inputs. The process accesses a digital input image and creates a masked image by removing a specified region from it. A machine learning model then regenerates the masked region, conditioned on the text input, producing an enhanced image that aligns with the given prompt.
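The masking step can be illustrated with a minimal Python sketch. Purely for illustration, it assumes that an image is a grid of grayscale integers and that the specified region is a rectangle. The names `make_box_mask` and `apply_mask` are hypothetical and do not come from the patent, which does not prescribe a mask shape or data format here.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a grayscale image as rows of integer intensities.
# None marks a pixel that has been removed by the mask.
Image = List[List[Optional[int]]]
Mask = List[List[bool]]


def make_box_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Mask:
    """Return a mask that is True inside box = (top, left, bottom, right).

    Bottom and right bounds are exclusive, as in Python slicing.
    """
    top, left, bottom, right = box
    return [[top <= r < bottom and left <= c < right for c in range(width)]
            for r in range(height)]


def apply_mask(image: Image, mask: Mask) -> Image:
    """Create the masked image: every pixel under the mask is removed (set to None)."""
    return [[None if masked else px for px, masked in zip(row, mask_row)]
            for row, mask_row in zip(image, mask)]


image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = make_box_mask(3, 3, (0, 1, 2, 3))  # remove the top-right 2x2 block
masked = apply_mask(image, mask)
print(masked)  # [[10, None, None], [40, None, None], [70, 80, 90]]
```

The input image itself is left untouched; `apply_mask` builds a new grid, so the original pixel values remain available for later steps.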

Technical Challenges

Generating images with AI and machine learning is complex because the algorithms must accurately reproduce real-world nuances such as lighting, texture, and perspective. Extending an image beyond its original borders while preserving its semantic integrity poses further computational challenges. Existing systems often struggle with these tasks, particularly in maintaining photorealism and efficiency across different visual styles.

Proposed Solutions

The invention addresses these challenges through several technical improvements. The disclosure includes a non-transitory computer-readable medium storing instructions for regenerating image regions from text inputs. The instructions generate a masked image and provide it, together with a text prompt, to a machine learning model trained on a diverse set of images. The model then generates an enhanced image that replicates pixel values from the input image and integrates new image segments generated from the text input.
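The combination of replicated pixel values and newly generated segments can be sketched as a compositing step. In this hypothetical Python sketch, `enhance` and the `Model` interface are illustrative assumptions, and `constant_fill_model` is a placeholder that ignores the prompt. It stands in for the trained, text-conditioned model described above and is not that model.

```python
from typing import Callable, List, Optional

Image = List[List[Optional[int]]]
Mask = List[List[bool]]
# Hypothetical model interface: takes the masked image, the mask, and the text
# prompt, and returns a full-size generated image (no None values).
Model = Callable[[Image, Mask, str], List[List[int]]]


def enhance(masked: Image, mask: Mask, prompt: str, model: Model) -> List[List[int]]:
    """Fill the masked region with model output; copy all other pixels unchanged."""
    generated = model(masked, mask, prompt)
    return [[gen if is_masked else orig
             for orig, gen, is_masked in zip(row_o, row_g, row_m)]
            for row_o, row_g, row_m in zip(masked, generated, mask)]


def constant_fill_model(masked: Image, mask: Mask, prompt: str) -> List[List[int]]:
    """Placeholder standing in for a trained generator; it ignores the prompt.

    A real text-conditioned model would synthesize content matching `prompt`.
    """
    return [[255 for _ in row] for row in masked]


masked = [[10, None],
          [30, None]]
mask = [[False, True],
        [False, True]]
result = enhance(masked, mask, "a bright sky", constant_fill_model)
print(result)  # [[10, 255], [30, 255]]
```

Taking generated values only where the mask is set guarantees that unmasked pixels are copied exactly, whatever the generator outputs elsewhere.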

System Capabilities

The system accepts several inputs, including the input image, the masked region, and the text prompt, and uses them to generate enhanced images. Users interact with it through a graphical user interface in which they can specify masked regions or enter text prompts. The model can also extend an image's dimensions by replacing masked regions with newly generated segments, so the image can be enlarged vertically, horizontally, or in both directions.
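Extending an image's dimensions can be framed as padding the canvas and masking the new area, so that the same regeneration step fills it. The following Python sketch assumes that framing. `extend_canvas` is a hypothetical helper, and padding with empty pixels is an illustrative choice rather than the patent's stated procedure.

```python
from typing import List, Optional, Tuple

Image = List[List[Optional[int]]]
Mask = List[List[bool]]


def extend_canvas(image: Image, top: int = 0, bottom: int = 0,
                  left: int = 0, right: int = 0) -> Tuple[Image, Mask]:
    """Pad the image with empty pixels and return (canvas, mask).

    The mask is True wherever the canvas has no source pixel, i.e. exactly
    the new region a generator would be asked to fill.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    canvas: Image = []
    mask: Mask = []
    for r in range(height + top + bottom):
        src_r = r - top  # row in the original image; may be out of range
        canvas_row: List[Optional[int]] = []
        mask_row: List[bool] = []
        for c in range(width + left + right):
            src_c = c - left  # column in the original image; may be out of range
            inside = 0 <= src_r < height and 0 <= src_c < width
            canvas_row.append(image[src_r][src_c] if inside else None)
            mask_row.append(not inside)
        canvas.append(canvas_row)
        mask.append(mask_row)
    return canvas, mask


canvas, mask = extend_canvas([[1, 2], [3, 4]], top=1, right=1)
for row in canvas:
    print(row)
# [None, None, None]
# [1, 2, None]
# [3, 4, None]
```

Because the returned mask marks only the added border, a regeneration step that respects the mask leaves the original pixels unchanged.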

Implementation Details

The machine learning model is trained using deep learning techniques in combination with large language models. It includes sub-models for generating embeddings and for generating enhanced images. The architecture supports efficient image enhancement by applying overlapping techniques when masking regions and generating extensions from the provided prompts, which helps keep the output high in quality and consistent with user specifications.
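The summary does not detail the overlapping technique. One common way to generate a wide extension, offered here only as an assumed interpretation, is to fill it in successive windows that each overlap content already present, so the model always sees some existing context next to the region it must generate. The helper `extension_windows` and its parameters are hypothetical.

```python
from typing import List, Tuple


def extension_windows(orig_width: int, target_width: int,
                      window: int, overlap: int) -> List[Tuple[int, int, int]]:
    """Plan left-to-right generation windows for horizontal outpainting.

    Each entry is (start, context_end, end): columns [start, context_end) are
    already filled and serve as unmasked context, and columns
    [context_end, end) are masked and must be generated in that step.
    """
    if not 0 < overlap < window:
        raise ValueError("require 0 < overlap < window")
    plan = []
    filled = orig_width
    while filled < target_width:
        start = max(0, filled - overlap)          # reach back into filled content
        end = min(start + window, target_width)   # never run past the target
        plan.append((start, filled, end))
        filled = end
    return plan


print(extension_windows(512, 1024, window=512, overlap=128))
# [(384, 512, 896), (768, 896, 1024)]
```

Larger overlaps give the model more context per step at the cost of more steps, since each window generates at most `window - overlap` new columns.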