Invention Title:

METHOD AND APPARATUS WITH VISUAL MEDIUM GENERATION

Publication number:

US20250356464

Section:

Physics

Class:

G06T5/60

Smart overview of the Invention

The invention pertains to a processor-implemented method and apparatus for generating visual media. It obtains prompts that specify different levels of image quality and uses them to generate multiple visual media depicting the same content at those quality levels. Generation relies on a visual medium generation model trained with a loss function that measures the difference between the generated output's image quality and the quality level specified by the prompt.
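As a rough illustration of the prompt structure described above, the sketch below pairs one content description with several quality levels to yield multiple prompts for the same content. The function name and the specific level strings are hypothetical, not taken from the publication.

```python
# Hypothetical quality levels; the publication does not fix specific wording.
QUALITY_LEVELS = ["low quality", "medium quality", "high quality"]

def build_quality_prompts(content: str) -> list[str]:
    """Return one prompt per quality level, all describing the same content."""
    return [f"{content}, {level}" for level in QUALITY_LEVELS]

# Each prompt would drive the generation model to produce the same scene
# at a different image quality.
prompts = build_quality_prompts("a photo of a red bicycle")
```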

Technological Context

Visual medium generation technology uses computer systems to create images or videos and is applicable across numerous fields. It encompasses machine learning models such as generative adversarial networks (GANs), transformer-based models, and diffusion models. These technologies are integral to 2D and 3D modeling, rendering, animation, and other computer graphics applications, and the generated media can in turn serve as training data for visual medium-related models.

Methodology

The method involves fine-tuning a pre-trained generative model using a loss function that measures the discrepancy between the generated media's quality and the prompt-specified quality. Prompts can contain information about multiple image quality elements, allowing for varied quality levels across different prompts. Training data is generated by applying these prompts to produce visual media, which is then used to train a visual medium-based model.
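The fine-tuning objective described above can be sketched as a conventional generator loss plus a quality-discrepancy term. The exact loss form and the scalar quality scores below are assumptions for illustration; the publication only specifies that the loss measures the gap between the output's quality and the prompt-specified quality.

```python
def quality_loss(measured_quality: float, prompt_quality: float) -> float:
    """Squared-error penalty between the quality measured on the generated
    medium and the quality level the prompt asked for (both assumed to be
    scalar scores here)."""
    return (measured_quality - prompt_quality) ** 2

def total_loss(base_loss: float, measured_quality: float,
               prompt_quality: float, weight: float = 1.0) -> float:
    """Combine the generator's usual training loss with the quality term;
    `weight` balancing the two terms is an assumed hyperparameter."""
    return base_loss + weight * quality_loss(measured_quality, prompt_quality)
```

In practice both terms would be differentiable tensors so the discrepancy can be backpropagated through the generative model during fine-tuning.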

Model Training

Training of the visual medium-based model includes using the prompts to generate ground-truth (GT) data for image quality evaluation. The trained model can produce comparative evaluation data or improve the image quality of an input based on a prompt. An image quality improvement prompt transforms a medium from one quality level to another, with the source and target levels determined by comparing the relative quality specified in two prompts.
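One way to read the comparative-evaluation step is that media generated at known quality levels are paired up, and each pair is labeled by which prompt specified the higher level. The sketch below is an assumed realization of that GT-pair construction; the function names and integer level encoding are hypothetical.

```python
def comparative_label(level_a: int, level_b: int) -> int:
    """1 if the first medium's prompt specified higher quality,
    -1 if the second did, 0 on a tie."""
    return (level_a > level_b) - (level_a < level_b)

def make_gt_pairs(media: list[tuple[str, int]]) -> list[tuple[str, str, int]]:
    """Given (medium, quality_level) samples generated from quality prompts,
    emit every pair with a relative-superiority label as GT data."""
    pairs = []
    for i, (med_a, lvl_a) in enumerate(media):
        for med_b, lvl_b in media[i + 1:]:
            pairs.append((med_a, med_b, comparative_label(lvl_a, lvl_b)))
    return pairs
```

A model trained on such pairs could then score which of two inputs has higher quality, or be prompted to move an input from a lower level to a higher one.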

Apparatus Configuration

The apparatus comprises one or more processors configured to process prompts and generate the corresponding visual media. The processors fine-tune a pre-trained generative model using a loss function that measures the difference between the generated output's quality and the quality level specified in the prompt. They also generate training data for visual medium-based models, supporting image quality evaluation, comparative evaluation, and image quality improvement.