US20250157093
2025-05-15
Physics
G06T11/00
Systems and methods are provided for generating self-consistent synthetic imagery based on a textual prompt. These approaches tackle the challenge of producing consistent character images across different contexts, a common limitation in existing text-to-image generative models. This innovation is particularly beneficial in creative fields such as book illustration, brand crafting, comic creation, presentation development, and webpage design where visual consistency is essential.
Current techniques often depend on multiple pre-existing images or involve labor-intensive manual processes, struggling to maintain consistent character identity, especially for novel or imaginary characters. These methods typically do not generalize well to new characters due to being trained on specific datasets. This limitation restricts their versatility and applicability.
The disclosure presents a fully automated solution for consistent character generation using only a text prompt. Implementations iteratively customize a pre-trained text-to-image model with images generated by the model itself as training data. Images are clustered based on visual cohesion, and the model is trained on these selected images until a convergence metric is satisfied, enhancing character identity consistency.
This method does not require pre-existing images, allowing for the generation of consistent and diverse images from just a text description. It is highly versatile and applicable to a wide range of characters and contexts, being fully automated and domain-agnostic. This eliminates the need for manual interventions or ad hoc solutions.
A computer-implemented method generates self-consistent synthetic imagery by processing a textual prompt with a machine-learned image generation model through multiple update iterations. In each iteration, synthetic images are generated, and visually cohesive subsets are selected for training. Cohesion is assessed using embeddings in a latent space, clustering them to evaluate cohesion measures, ensuring training focuses on the most consistent images.