Invention Title:

PERSONALIZED TEXT-TO-IMAGE DIFFUSION MODEL

Publication number:

US20240296596

Publication date:
Section:

Physics

Class:

G06T11/00

Inventors:

Applicant:

Smart overview of the Invention

Methods and systems are provided for training a personalized text-to-image diffusion model that generates images from text inputs. Given a general text input, the model produces variable instances of an object class; when the input includes a unique identifier, it generates a specific subject instance. This allows both general and precise image generation based on user-defined criteria.
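The two input modes described above can be sketched as a simple prompt builder. This is an illustrative sketch only; the prompt template and the example identifier "sks" are assumptions, not taken from the publication.

```python
def build_prompt(object_class, unique_identifier=None):
    """Build a text input for the model.

    Without an identifier, the prompt names only the object class, so the
    model produces variable instances of that class. With an identifier,
    the prompt targets the specific subject instance bound to it.
    """
    if unique_identifier is None:
        return f"a photo of a {object_class}"
    return f"a photo of a {unique_identifier} {object_class}"


# General input -> variable class instances:
print(build_prompt("dog"))          # a photo of a dog
# Input with a unique identifier -> a specific subject instance:
print(build_prompt("dog", "sks"))   # a photo of a sks dog
```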

Training Process

The training involves creating a custom image dataset from images that each depict a specific subject instance. Each image is analyzed to determine its object class, and a sequence of tokens is generated based on predetermined token frequencies. These token sequences serve as unique identifiers for the subject instances and are used, together with the images, to train the text-to-image model, adjusting its parameters to improve its ability to generate the desired images from a given input.

Model Architecture

The text-to-image model is a diffusion model comprising a low-resolution diffusion component and a super-resolution component. During training, outputs are generated from noise conditioned on both the object class and the unique identifier, and the model is refined through iterative parameter updates driven by loss functions that measure the difference between generated images and the corresponding images in the dataset.
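A single training step of the kind described above can be sketched as a noise-prediction loss. This is a simplified, generic diffusion formulation, not the publication's exact method; the cosine schedule and the model's call signature are assumptions.

```python
import numpy as np


def diffusion_training_loss(model, x0, text_embedding, rng):
    """One simplified training-step loss for a noise-prediction diffusion model.

    A random timestep is drawn, the clean image x0 is mixed with Gaussian
    noise, and the model (conditioned on the text embedding, which here
    stands in for the object class plus unique identifier) predicts that
    noise. The loss is the mean squared error between predicted and true
    noise, i.e. the difference driving the iterative parameter updates.
    """
    t = rng.uniform(0.0, 1.0)                 # continuous timestep in [0, 1]
    alpha = np.cos(0.5 * np.pi * t)           # illustrative cosine schedule
    sigma = np.sin(0.5 * np.pi * t)
    eps = rng.standard_normal(x0.shape)       # true noise
    x_t = alpha * x0 + sigma * eps            # noised image
    eps_hat = model(x_t, t, text_embedding)   # conditioned noise prediction
    return float(np.mean((eps_hat - eps) ** 2))
```

In a full system this loss would be computed for the low-resolution component and, analogously, for the super-resolution component.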

Advantages

  • Increased value of image data through unique identifiers that distinguish different subject instances.
  • Higher quality image outputs by optimizing reconstruction and prior preservation losses.
  • Ability to produce diverse images across various contexts, poses, and lighting conditions.
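The second advantage above combines two loss terms. A common way to weight them, shown here as a hypothetical formulation rather than the publication's exact objective, is a reconstruction term on the subject images plus a weighted prior-preservation term on generic class images:

```python
import numpy as np


def combined_loss(eps_hat_inst, eps_inst, eps_hat_prior, eps_prior,
                  lambda_prior=1.0):
    """Reconstruction loss plus weighted prior-preservation loss.

    eps_hat_inst / eps_inst: predicted vs. true noise for subject images.
    eps_hat_prior / eps_prior: predicted vs. true noise for generic images
    of the same object class; this term keeps the model from forgetting
    the class while it learns the specific subject.
    """
    recon = np.mean((eps_hat_inst - eps_inst) ** 2)
    prior = np.mean((eps_hat_prior - eps_prior) ** 2)
    return float(recon + lambda_prior * prior)
```

Raising `lambda_prior` trades subject fidelity for class diversity, which is how the two losses can be jointly optimized.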

Technological Applications

The trained model can be applied in various technological scenarios, including generating realistic images from user instructions, creating artistic renditions, and customizing depictions of existing objects. Because it trains effectively on small datasets while minimizing processing resources, the system is suitable for deployment across many domains.