Invention Title:

GENERATING IMAGES USING SEQUENCES OF GENERATIVE NEURAL NETWORKS

Publication number:

US20240249456

Publication date:

2024-07-25

Section:

Physics

Class:

G06T11/60

Inventors:

David James FLEET Toronto, Canada

Mohammad Norouzi Richmond Hill, Canada

William Chan Toronto, Canada

Chitwan Saharia Toronto, Canada

Jonathan Ho New York, NY, United States

Yi Li Oakville, Canada

Jay Ha Whang Austin, TX, United States

Saurabh Saxena Mississauga, Canada

Applicant:

Google LLC Mountain View, CA, United States

Drawings (4 of 20): Drawings 01 to 04 (images not reproduced)

Smart overview of the Invention

Methods and systems are described for generating images using a sequence of generative neural networks. The process begins by receiving an input text prompt, which is a sequence of text tokens in a natural language. A text encoder neural network processes the prompt to produce a set of contextual embeddings that represent the meaning of the text. The sequence of generative neural networks then processes these embeddings to produce a final output image that depicts the scene described by the input prompt.
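The prompt-encoding step can be illustrated with a toy stand-in for the text encoder neural network. This is a minimal sketch, not the patented encoder: the whitespace tokenizer, the hash-based token embeddings, the `EMBED_DIM` width and the neighbor averaging that makes each embedding depend on its context are all hypothetical choices made for illustration. A real system would use a trained text encoder neural network.

```python
import hashlib

EMBED_DIM = 4  # hypothetical embedding width, kept tiny for readability

def token_embedding(token: str, dim: int = EMBED_DIM) -> list[float]:
    """Deterministic toy embedding: map a token's SHA-256 digest to `dim` values in [-1, 1)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 128.0 - 1.0 for i in range(dim)]

def encode_prompt(prompt: str, window: int = 1) -> list[list[float]]:
    """Toy stand-in for the text encoder neural network.

    Returns one embedding per token. Each embedding averages the token's own
    vector with those of its neighbors within `window` positions, so the same
    word gets different embeddings in different contexts.
    """
    tokens = prompt.lower().split()
    if not tokens:
        raise ValueError("prompt must contain at least one token")
    raw = [token_embedding(t) for t in tokens]
    contextual = []
    for i in range(len(raw)):
        lo, hi = max(0, i - window), min(len(raw), i + window + 1)
        neighbors = raw[lo:hi]
        contextual.append([sum(col) / len(neighbors) for col in zip(*neighbors)])
    return contextual

embeddings = encode_prompt("a corgi riding a bicycle")
print(len(embeddings), len(embeddings[0]))  # 5 4
```

The two occurrences of "a" in the example prompt receive different embeddings because their neighbors differ, which is the property the word "contextual" refers to.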

Neural Network Processing Steps

Image generation proceeds through the sequence of generative neural networks. An initial generative neural network processes the contextual embeddings to produce an initial output image at a low resolution. Each subsequent generative neural network processes both the contextual embeddings and the output image of the preceding network to produce an output image at a higher resolution. Because every stage is conditioned on the text as well as on the previous image, image quality improves progressively, and later stages can correct artifacts introduced by earlier, lower-resolution stages. The output image of the last network in the sequence is the final output image.
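The cascade can be sketched as a loop over stages in which the first stage sees only the embeddings and every later stage sees both the embeddings and the previous stage's image. This sketch assumes square grayscale images stored as nested lists. `base_generator`, `super_resolution_stage`, their noise and smoothing rules, the nearest-neighbor upsampling and the sizes 8, 16 and 32 are all hypothetical placeholders for trained generative neural networks. Only the data flow between stages is meant to mirror the description above.

```python
import random

# Hypothetical representations: square grayscale image as rows of floats,
# contextual embeddings as a list of vectors.
Image = list[list[float]]
Embeddings = list[list[float]]

def base_generator(embeddings: Embeddings, size: int, seed: int = 0) -> Image:
    """Stand-in for the initial generative network (low-resolution output).

    Toy rule: seeded uniform noise shifted by the mean embedding value.
    A real system would sample from a trained generative model instead.
    """
    rng = random.Random(seed)
    values = [x for vec in embeddings for x in vec]
    bias = sum(values) / len(values)
    return [[rng.random() + bias for _ in range(size)] for _ in range(size)]

def upsample_nearest(image: Image, factor: int) -> Image:
    """Nearest-neighbor upsampling by an integer factor."""
    return [[px for px in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def super_resolution_stage(image: Image, embeddings: Embeddings, factor: int) -> Image:
    """Stand-in for a subsequent network conditioned on the previous image AND the embeddings.

    Toy rule: upsample, blend each pixel with its 3x3 neighborhood mean
    (crude smoothing in place of learned artifact removal), then add a
    small embedding-dependent offset.
    """
    up = upsample_nearest(image, factor)
    n = len(up)
    values = [x for vec in embeddings for x in vec]
    bias = 0.01 * sum(values) / len(values)
    out = []
    for r in range(n):
        row = []
        for c in range(n):
            patch = [up[rr][cc]
                     for rr in range(max(0, r - 1), min(n, r + 2))
                     for cc in range(max(0, c - 1), min(n, c + 2))]
            row.append(0.5 * up[r][c] + 0.5 * sum(patch) / len(patch) + bias)
        out.append(row)
    return out

def generate(embeddings: Embeddings, base_size: int = 8, factors=(2, 2)) -> list[Image]:
    """Run the cascade and return every stage's image; the last one is the final output."""
    images = [base_generator(embeddings, base_size)]
    for factor in factors:
        images.append(super_resolution_stage(images[-1], embeddings, factor))
    return images

emb = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]  # hypothetical contextual embeddings
stages = generate(emb)
print([len(img) for img in stages])  # [8, 16, 32]
```

Passing `embeddings` into every stage, not only the first, reflects the description that each subsequent network is conditioned on the text as well as on the previous image.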

Flexibility in Input Types

Although the primary focus is on text prompts, the system can accept other types of conditioning input. These include a noise sample drawn from a noise distribution, an existing image, an audio signal describing a scene, or a combination of these. Images can therefore be generated from diverse data sources, not only from textual descriptions, and the same cascade of generative neural networks can produce a high-resolution output image for each input type.
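One way to picture the flexible conditioning is a dispatcher that maps each supported input type onto the same kind of embedding sequence, so the downstream cascade does not need to know where its conditioning came from. Everything here is hypothetical: the dataclass names, the two-value summary vector per token, noise draw, image row or audio frame, and the rule that a combination of inputs is encoded by concatenation are illustrative assumptions, not the patent's encoders.

```python
import random
from dataclasses import dataclass

Embeddings = list[list[float]]

@dataclass
class TextPrompt:
    tokens: list[str]

@dataclass
class NoiseSample:
    num_vectors: int
    seed: int = 0

@dataclass
class ImageInput:
    pixels: list[list[float]]  # grayscale rows

@dataclass
class AudioInput:
    samples: list[float]
    frame: int = 4  # hypothetical frame length in samples

def encode(conditioning) -> Embeddings:
    """Map any supported conditioning input to a list of 2-value embedding vectors."""
    if isinstance(conditioning, TextPrompt):
        # One vector per token: token length and a character-code bucket.
        return [[float(len(t)), float(sum(map(ord, t)) % 7)] for t in conditioning.tokens]
    if isinstance(conditioning, NoiseSample):
        # Seeded draws from a standard normal distribution.
        rng = random.Random(conditioning.seed)
        return [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
                for _ in range(conditioning.num_vectors)]
    if isinstance(conditioning, ImageInput):
        # One vector per image row: mean and maximum intensity.
        return [[sum(row) / len(row), max(row)] for row in conditioning.pixels]
    if isinstance(conditioning, AudioInput):
        # One vector per frame: mean amplitude and mean energy.
        s, f = conditioning.samples, conditioning.frame
        frames = [s[i:i + f] for i in range(0, len(s), f)]
        return [[sum(fr) / len(fr), sum(x * x for x in fr) / len(fr)] for fr in frames]
    if isinstance(conditioning, (list, tuple)):
        # A combination of inputs: concatenate the embeddings of each part.
        return [vec for part in conditioning for vec in encode(part)]
    raise TypeError(f"unsupported conditioning input: {type(conditioning).__name__}")

combo = [TextPrompt(["a", "red", "car"]), ImageInput([[0.0, 1.0], [2.0, 4.0]])]
print(len(encode(combo)))  # 5
```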

Advantages of Cascading Generative Neural Networks

The described system has several advantages, particularly for producing high-resolution images that accurately reflect their textual descriptions. Because the sequence of generative neural networks builds the image up in stages, starting from a low resolution, it reduces the computational burden compared with generating a high-resolution image directly. The cascade also improves image quality, because later networks can reduce distortions and artifacts that arise in the lower-resolution outputs of earlier networks. Overall, the method improves both the quality and the efficiency of image generation.
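A rough way to see the computational saving is to count pixels per stage. Assuming, for illustration only, a 64×64 base image followed by two 4× super-resolution stages, the base network works on 1/256 of the pixels of the final 1024×1024 image. Pixel count is only a crude proxy for cost (real cost depends on the architecture of each network and on how many sampling steps it runs), so the sketch below is arithmetic, not a benchmark.

```python
from fractions import Fraction

def stage_resolutions(base: int, factors: list[int]) -> list[int]:
    """Side length of the (square) image produced by each stage of the cascade."""
    sizes = [base]
    for factor in factors:
        sizes.append(sizes[-1] * factor)
    return sizes

def pixel_counts(sizes: list[int]) -> list[int]:
    """Number of pixels per stage for square images."""
    return [side * side for side in sizes]

sizes = stage_resolutions(64, [4, 4])  # assumed configuration, for illustration
counts = pixel_counts(sizes)
base_fraction = Fraction(counts[0], counts[-1])
print(sizes)          # [64, 256, 1024]
print(counts)         # [4096, 65536, 1048576]
print(base_fraction)  # 1/256
```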