US20240420389
2024-12-19
Physics
G06T11/20
The patent application describes systems and methods for generating tile-able patterns from text prompts with generative AI models. A trained generation prior model converts the text prompt into a latent vector, and the prior is constrained to the distribution of tile-able patterns. An image generation model then uses the latent vector to produce an output image that contains the elements named in the prompt and can be repeated seamlessly.
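The data flow described here can be sketched as three stages chained together. The sketch below is only an illustration of the wiring, not the application's implementation: the class `TileablePatternPipeline` and the three `toy_*` functions are hypothetical stand-ins, and real components would be trained neural networks. The separate text encoder stage is an assumption taken from the system description, which says a text encoder may be used.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Image = List[List[float]]


@dataclass
class TileablePatternPipeline:
    """Hypothetical wiring of the three stages; all names are illustrative."""
    text_encoder: Callable[[str], Vector]   # prompt -> prompt embedding
    prior: Callable[[Vector], Vector]       # prompt embedding -> latent vector
    image_model: Callable[[Vector], Image]  # latent vector -> image

    def generate(self, prompt: str) -> Image:
        embedding = self.text_encoder(prompt)
        latent = self.prior(embedding)
        return self.image_model(latent)


def toy_text_encoder(prompt: str) -> Vector:
    # Placeholder: fold character codes into a fixed-size vector.
    dim = 4
    vec = [0.0] * dim
    for k, ch in enumerate(prompt):
        vec[k % dim] += ord(ch) / 255.0
    return vec


def toy_prior(embedding: Vector) -> Vector:
    # Placeholder: a trained prior would map the embedding into the
    # region of latent space associated with tile-able patterns.
    norm = sum(abs(v) for v in embedding) or 1.0
    return [v / norm for v in embedding]


def toy_image_model(latent: Vector, size: int = 8) -> Image:
    # Placeholder: builds a periodic image so that it tiles trivially.
    n = len(latent)
    return [[latent[(x + y) % n] for x in range(size)] for y in range(size)]


pipeline = TileablePatternPipeline(toy_text_encoder, toy_prior, toy_image_model)
pattern = pipeline.generate("red roses on a white background")
```

Keeping each stage behind a plain callable makes it possible to swap in a different prior or image model without touching the rest of the pipeline.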
Recent advancements in generative AI have significantly impacted fields such as digital art and design by enabling text-to-image generation. Traditional models like Generative Adversarial Networks (GANs) and newer diffusion models, such as Denoising Diffusion Probabilistic Models (DDPMs), have been used for this purpose. However, controlling the style and aesthetics of generated images using text alone can be challenging, especially when creating complex, seamless patterns.
The disclosed method encodes the text prompt into a prompt embedding, which a generation prior model processes to produce a latent vector. The latent vector is sampled from a learned cluster of tile-able images in the embedding space. An image generation model then creates the final image from the latent vector using a circular convolution operation, in which the convolution wraps around the image borders instead of padding them, so that the output repeats without seams.
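To make the circular convolution step concrete, here is a minimal pure-Python sketch of a 2D convolution with wrap-around indexing. It is an illustration under stated assumptions rather than the application's implementation: the function names `circular_conv2d` and `roll` are hypothetical, the image is a single-channel list of lists, and the operation is written as cross-correlation (no kernel flip), as is common in deep learning layers.

```python
from typing import List

Grid = List[List[float]]


def circular_conv2d(image: Grid, kernel: Grid) -> Grid:
    """Apply `kernel` to `image`, wrapping indices at the borders.

    Pixels past the right edge are read from the left edge (and likewise
    top/bottom), so the output has the same size as the input and treats
    the image as one period of an infinitely repeated pattern.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy = (y + i - kh // 2) % h  # wrap vertically
                    xx = (x + j - kw // 2) % w  # wrap horizontally
                    acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def roll(image: Grid, dy: int, dx: int) -> Grid:
    """Cyclically shift `image` down by `dy` rows and right by `dx` columns."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)]
            for y in range(h)]
```

Because the indices wrap, filtering a cyclically shifted image gives the same result as shifting the filtered image. The layer therefore has no notion of where the image border lies, which is why it does not introduce a seam at the edges.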
Conventional generative models often fail to produce repeatable patterns without visible seams. The described system addresses this limitation by restricting generation to the distribution of tile-able patterns and applying circular convolution in the image generation model, so that opposite edges of the generated pattern line up when the image is repeated. This allows users to generate detailed, seamlessly tile-able patterns quickly, which supports creative workflows across a range of applications.
The system comprises at least one processor and a memory storing instructions for executing the process. It includes both a generation prior model and an image generation model, which may be implemented as diffusion models. A text encoder may be used to convert text prompts into embeddings, and a binary classifier can filter datasets to select training data. A user supplies the text prompt as input, and the system returns a tile-able pattern based on that prompt.
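The dataset filtering step can be sketched as a function that keeps only the images a binary classifier accepts. The summary does not say how the classifier works, so the sketch below substitutes a simple hand-written heuristic, `heuristic_is_tileable`, which compares the pixel difference across the wrap-around seam with the typical difference between neighboring interior pixels. That heuristic, its `tolerance` parameter, and every function name are assumptions made for illustration; a trained classifier would take its place.

```python
from typing import Callable, List

Image = List[List[float]]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def seam_difference(image: Image) -> float:
    """Mean absolute difference across the wrap-around seams."""
    h, w = len(image), len(image[0])
    horizontal = [abs(image[y][w - 1] - image[y][0]) for y in range(h)]
    vertical = [abs(image[h - 1][x] - image[0][x]) for x in range(w)]
    return _mean(horizontal + vertical)


def interior_difference(image: Image) -> float:
    """Mean absolute difference between adjacent interior pixels."""
    h, w = len(image), len(image[0])
    horizontal = [abs(image[y][x + 1] - image[y][x])
                  for y in range(h) for x in range(w - 1)]
    vertical = [abs(image[y + 1][x] - image[y][x])
                for y in range(h - 1) for x in range(w)]
    return _mean(horizontal + vertical)


def heuristic_is_tileable(image: Image, tolerance: float = 1.5) -> bool:
    # Stand-in for a trained binary classifier (assumption): accept the
    # image when the seam is no rougher than the interior, within tolerance.
    return seam_difference(image) <= tolerance * interior_difference(image)


def filter_tileable(dataset: List[Image],
                    classifier: Callable[[Image], bool]) -> List[Image]:
    """Keep only the images that the classifier labels as tile-able."""
    return [img for img in dataset if classifier(img)]
```

Passing the classifier in as a callable keeps `filter_tileable` independent of how the tile-able decision is made, so the heuristic can be replaced by a learned model without changing the filtering code.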