Invention Title:

EDITING DIGITAL IMAGES BASED ON MASKS

Publication number:

US20250384601

Section:

Physics

Class:

G06T11/60

Smart overview of the Invention

The described techniques involve an image combination system that enhances digital image editing using masks and machine learning models. The system receives two digital images, namely a first (source) image and a second (reference) image, along with a prompt describing the edit and a mask indicating a portion of the reference image. A machine learning model, such as a diffusion model, identifies features from the masked portion of the reference image and incorporates them into the source image to create an edited version of the source image, which is then displayed to the user through a user interface.
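
As a rough illustration of this interface, the following Python sketch models the described inputs and outputs. The names (EditRequest, DiffusionEditor, combine_images) are hypothetical, since the summary does not specify an API, and the compositing step is only a stand-in for the machine learning model.

```python
# A minimal sketch of the described inputs and outputs, assuming hypothetical
# names (EditRequest, DiffusionEditor, combine_images); the compositing below
# is only a stand-in for the patent's machine learning model.
from dataclasses import dataclass

import numpy as np


@dataclass
class EditRequest:
    source: np.ndarray     # first (source) digital image, H x W x 3
    reference: np.ndarray  # second (reference) digital image, H x W x 3
    prompt: str            # text prompt describing the desired edit
    mask: np.ndarray       # binary mask over the reference image, H x W


class DiffusionEditor:
    """Placeholder for the machine learning model (e.g., a diffusion model)."""

    def edit(self, request: EditRequest) -> np.ndarray:
        # A real model would identify features inside the masked region of the
        # reference image and incorporate them into the source image. Here the
        # masked region is simply composited over the source as a stand-in.
        region = request.mask[..., None].astype(bool)
        return np.where(region, request.reference, request.source)


def combine_images(source, reference, prompt, mask, model=None):
    """Return an edited version of `source` for display in a user interface."""
    model = model or DiffusionEditor()
    return model.edit(EditRequest(source, reference, prompt, mask))
```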

Traditional digital image editing techniques often suffer from visual inaccuracies and inefficiencies, largely because they rely on extensive training over text-based commands. Such methods require significant computational resources and can struggle with precision when users provide vague text inputs. The described approach instead uses masks to guide the editing process, sidestepping some of these limitations by allowing the features to be transferred between images to be specified directly.

The image combination system uses a diffusion model to determine how to merge features from the reference image into the source image. The model analyzes attention values derived from the inputs, including the prompt and the mask, to decide which features to incorporate and which to retain. The process replaces vectors associated with features of the source image, such as texture or shape, with the corresponding vectors from the reference image, guided by the prompt.
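
The summary does not spell out the attention mechanism, but the vector-replacement idea can be illustrated roughly as follows. In this hedged sketch, masked_attention_swap is a hypothetical helper (not the patented algorithm): for tokens covered by the mask, the source image's key and value vectors are swapped for the corresponding reference-image vectors before standard scaled dot-product attention is applied.

```python
# Illustrative sketch only: feature vectors for tokens that fall inside the
# mask are taken from the reference image rather than the source image
# before attention is computed.
import numpy as np


def masked_attention_swap(q_src, k_src, v_src, k_ref, v_ref, token_mask):
    """q_src, k_*, v_*: (T, d) arrays; token_mask: (T,) bool array that is
    True where the reference image's feature vector should replace the
    source image's vector."""
    k = np.where(token_mask[:, None], k_ref, k_src)
    v = np.where(token_mask[:, None], v_ref, v_src)

    # Standard scaled dot-product attention over the mixed key/value set.
    scores = q_src @ k.T / np.sqrt(q_src.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # attended features blending source and reference
```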

An example scenario illustrates the process: a user provides a source image of a dog, a reference image of a cat, a prompt like "dog with cat's tail," and a mask over the cat's tail. The diffusion model identifies the cat's tail as the feature to incorporate, modifying the dog's image accordingly. This method allows for precise editing without extensive training on text commands, as the mask directly specifies the feature boundary.
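
For concreteness, here is a hypothetical invocation of the combine_images sketch above for this scenario; the images and the tail mask are synthetic placeholders, not real data.

```python
import numpy as np

dog = np.zeros((256, 256, 3), dtype=np.uint8)      # source image (dog)
cat = np.full((256, 256, 3), 128, dtype=np.uint8)  # reference image (cat)

tail_mask = np.zeros((256, 256), dtype=np.uint8)
tail_mask[180:240, 40:120] = 1                     # user-drawn mask over the cat's tail

edited = combine_images(dog, cat, "dog with cat's tail", tail_mask)
print(edited.shape)  # (256, 256, 3): edited image shown to the user in the UI
```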

This approach offers several advantages over conventional methods. By using masks, it simplifies the task of specifying features, reducing the need for complex text descriptions and making the process faster and less resource-intensive. Because a user can simply draw a mask over a feature, the approach improves precision and supports customization, enhancing the overall efficiency and quality of digital image editing.