US20250095254
2025-03-20
Physics
G06T11/60
The patent application describes a method for prompt-based image editing, which automates the process of modifying digital images based on user instructions. This method involves accessing an image, selecting an area to edit, and providing an editing prompt. The image is transformed into visual noise to create a latent representation. A noise image is predicted from this latent and subtracted to refine it, producing an updated latent. This updated latent is combined with a noisy representation of the image to generate a masked latent, which applies edits only to the selected area.
The invention falls under systems and methods for automatic digital image editing using prompts. It aims to provide a more efficient alternative to manual editing, which can be labor-intensive and time-consuming. By using prompts, this method simplifies the editing process while maintaining high-quality results, addressing shortcomings of previous automated techniques that often yield unrealistic outcomes.
The method includes several steps: accessing an image, selecting an area for editing, receiving a prompt, and transforming the image into a latent form through visual noise. A noise image is predicted from this latent and the prompt, then subtracted from the latent to refine it. The method generates a noisy representation of the image and combines it with the updated latent to create a masked latent. This masked latent applies the edits requested by the prompt only to the selected area of the image.
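The combination step described above can be sketched as a per-element blend. This is a minimal illustration, not the claimed implementation: the function name `blend_masked_latent`, the use of nested Python lists for latents, and the convention that mask value 1.0 marks the selected area are all assumptions made for the example.

```python
from typing import List

Grid = List[List[float]]


def blend_masked_latent(updated: Grid, noisy_original: Grid, mask: Grid) -> Grid:
    """Keep the updated latent inside the mask and the noisy original outside it.

    Assumes all three grids have the same shape and that mask values are
    1.0 inside the selected area and 0.0 elsewhere (an assumed convention).
    """
    return [
        [m * u + (1.0 - m) * n for u, n, m in zip(u_row, n_row, m_row)]
        for u_row, n_row, m_row in zip(updated, noisy_original, mask)
    ]


# Toy 2x2 example: only the top-left cell is selected for editing.
updated = [[0.9, 0.9], [0.9, 0.9]]
noisy_original = [[0.1, 0.1], [0.1, 0.1]]
mask = [[1.0, 0.0], [0.0, 0.0]]
masked_latent = blend_masked_latent(updated, noisy_original, mask)
```

In this toy case only the selected cell takes its value from the updated latent, which is how the edit stays confined to the chosen area.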
The process repeats a denoising loop over multiple timesteps, where each iteration predicts a noise image and refines the latent based on that prediction. The loop typically runs between 15 and 40 times, adjusting the noise level according to a schedule, and uses diffusion models trained on datasets of original and edited images. The denoising process is intended to preserve quality and realism in the edited images.
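A rough sketch of such a loop follows, under stated assumptions. The linear noise schedule, the flat-list latent, the function names, and especially `toy_predictor` (a stand-in for a trained diffusion model, which this summary does not specify) are hypothetical and chosen only to make the control flow concrete.

```python
import random
from typing import Callable, List

Latent = List[float]
NoisePredictor = Callable[[Latent, str, int], Latent]


def noise_schedule(num_steps: int) -> List[float]:
    """Linearly decreasing noise levels from 1.0 down to 0.0 (an assumed schedule)."""
    return [1.0 - i / (num_steps - 1) for i in range(num_steps)]


def run_denoising_loop(
    original: Latent,
    mask: List[float],
    prompt: str,
    predict_noise: NoisePredictor,
    num_steps: int = 30,
    seed: int = 0,
) -> Latent:
    if not 15 <= num_steps <= 40:
        raise ValueError("the summary describes roughly 15 to 40 iterations")
    rng = random.Random(seed)
    # Start from pure noise, standing in for the noised latent of the image.
    latent = [rng.gauss(0.0, 1.0) for _ in original]
    for t, sigma in enumerate(noise_schedule(num_steps)):
        # Predict noise from the current latent and the prompt, then subtract it.
        predicted = predict_noise(latent, prompt, t)
        updated = [x - e for x, e in zip(latent, predicted)]
        # Noisy representation of the original at this step's noise level.
        noisy_original = [o + sigma * rng.gauss(0.0, 1.0) for o in original]
        # Masked latent: updated values inside the mask, noisy original outside.
        latent = [
            m * u + (1.0 - m) * n
            for m, u, n in zip(mask, updated, noisy_original)
        ]
    return latent


def toy_predictor(latent: Latent, prompt: str, t: int) -> Latent:
    # Hypothetical stand-in for a trained model: ignores the prompt and
    # treats half the distance to a fixed target value as noise.
    target = 1.0
    return [0.5 * (x - target) for x in latent]


original = [0.2, 0.4, 0.6, 0.8]
mask = [1.0, 1.0, 0.0, 0.0]
result = run_denoising_loop(original, mask, "make it brighter", toy_predictor)
```

Because the assumed schedule ends at a noise level of zero, the unmasked entries of `result` match the original values, while the masked entries are driven entirely by the predictor.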
The invention can be implemented via non-transitory computer-readable storage media or computing devices that execute stored instructions. It supports localized edits indicated by prompts and can generate binary masks for precise editing areas. The system also allows for resolution adjustments and pixel melding processes to blend edits seamlessly into the original image. Results can be saved, displayed, or transmitted to external devices.
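The binary mask and blending ideas can be illustrated with a small sketch. The summary does not say how the mask is generated or how pixel melding works, so the rectangular mask, the box-blur feathering, and the alpha blend below are assumptions standing in for those steps, and all function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def rect_mask(height: int, width: int, top: int, left: int,
              bottom: int, right: int) -> Grid:
    """Binary mask: 1.0 inside the rectangle [top, bottom) x [left, right), else 0.0."""
    return [
        [1.0 if top <= y < bottom and left <= x < right else 0.0
         for x in range(width)]
        for y in range(height)
    ]


def feather(mask: Grid, radius: int = 1) -> Grid:
    """Soften mask edges with a box blur so the edit fades into its surroundings."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                mask[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def meld(original: Grid, edited: Grid, alpha: Grid) -> Grid:
    """Per-pixel blend: alpha 1.0 takes the edited pixel, 0.0 keeps the original."""
    return [
        [a * e + (1.0 - a) * o for o, e, a in zip(o_row, e_row, a_row)]
        for o_row, e_row, a_row in zip(original, edited, alpha)
    ]


# Toy 5x5 grayscale example: edit the central 3x3 region.
mask = rect_mask(5, 5, 1, 1, 4, 4)
soft = feather(mask)
original = [[0.0] * 5 for _ in range(5)]
edited = [[1.0] * 5 for _ in range(5)]
melded = meld(original, edited, soft)
```

Feathering trades a hard boundary for a gradual transition, which is one common way to avoid visible seams between edited and untouched pixels.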