Invention Title:

UTILIZING A GENERATIVE NEURAL NETWORK TO INTERACTIVELY CREATE AND MODIFY DIGITAL IMAGES BASED ON NATURAL LANGUAGE FEEDBACK

Publication number:

US20250078200

Publication date:
Section:

Physics

Class:

G06T3/10

Inventors:

Applicant:

Smart overview of the Invention

The application introduces a system that leverages a generative neural network to create and modify digital images through natural language feedback. The framework supports an interactive, multi-round image-generation process in which each round aligns the output with user-provided text. By conditioning the generative neural network on language feedback, the system performs both text-to-image generation and guided image modification, ensuring that generated images carry semantically meaningful features corresponding to the input text.
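The multi-round interaction described above can be sketched as a simple loop in which each round of feedback refines the previous result. This is an illustrative toy, not the patented implementation: `generate` and `edit` are hypothetical stand-ins for the generative network, and the "image" is represented as a plain record so the feedback-accumulation pattern is visible.

```python
def generate(prompt):
    # Stand-in for text-to-image generation: a real system would run
    # a generator network conditioned on the prompt's embedding.
    return {"prompt": prompt, "edits": []}

def edit(image, feedback):
    # One round of guided modification: apply new natural-language
    # feedback while keeping the history of earlier edits intact.
    return dict(image, edits=image["edits"] + [feedback])

image = generate("a red bird on a branch")
for feedback in ["make the bird blue", "add snow to the branch"]:
    image = edit(image, feedback)
```

The point of the pattern is that each round consumes the prior state plus new feedback, rather than regenerating from scratch.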

Background Challenges

Traditional image generation systems face several limitations, including inefficiency and inflexibility in handling natural language inputs. These systems often struggle with accurately interpreting complex language instructions, resulting in images that fail to meet user expectations. Additionally, conventional systems typically handle only single-round tasks and rely on predefined input sequences, limiting their adaptability to real-world scenarios where ongoing refinement is needed.

Innovative Approach

The proposed system employs a deep learning framework known as TiGAN, which facilitates interactive image generation using natural language feedback. It uses a CLIP model to embed text in a joint text-image embedding space, and from these embeddings it derives style vectors that condition the image generation process. This approach keeps generated images consistent with the semantic content of the input text and preserves those features across iterative rounds of refinement.
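The text-to-style-vector path might be sketched as follows. This is a minimal illustration under loud assumptions: the "encoder" is a deterministic bag-of-words hash standing in for a real CLIP text encoder, the dimensions (512, 256) are arbitrary, and the mapping network is a single random linear layer rather than TiGAN's learned mapping.

```python
import hashlib
import numpy as np

TEXT_DIM, STYLE_DIM = 512, 256  # hypothetical sizes, not from the patent

def encode_text(prompt):
    # Stand-in for a CLIP text encoder: hash tokens into a fixed-size
    # vector and L2-normalize, mimicking CLIP's unit-norm embeddings.
    v = np.zeros(TEXT_DIM)
    for tok in prompt.lower().split():
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % TEXT_DIM
        v[idx] += 1.0
    return v / np.linalg.norm(v)

# A mapping network (here just one random linear layer) turns the text
# embedding into a style vector that conditions the generator.
rng = np.random.default_rng(0)
W = rng.normal(size=(STYLE_DIM, TEXT_DIM)) / np.sqrt(TEXT_DIM)

def style_vector(prompt):
    return W @ encode_text(prompt)

s = style_vector("a red bird perched on a branch")
```

In a real system the generator would consume `s` at each layer; here the sketch only shows how text features flow into the style space.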

Image Manipulation and Training

The system also supports text-guided image manipulation, allowing users to modify generated images based on new textual descriptions. Using style transformation generators, it updates only the dimensions of the style vector relevant to the requested change, preserving earlier edits. Training incorporates contrastive learning to strengthen the semantic alignment between generated images and textual feedback, improving both the accuracy and the efficiency of the overall process.
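The two mechanisms above, selective style-vector updates and contrastive alignment, might be sketched as follows. Both functions are illustrative assumptions rather than the patented method: `apply_feedback` shows a masked update that touches only the relevant dimensions, and `contrastive_loss` is a generic InfoNCE-style objective of the kind commonly used for image-text alignment.

```python
import numpy as np

def apply_feedback(style, delta, relevant_dims):
    # Update only the dimensions tied to the requested change, leaving
    # the remaining dimensions (and hence earlier edits) untouched.
    updated = style.copy()
    updated[relevant_dims] = updated[relevant_dims] + delta[relevant_dims]
    return updated

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    # InfoNCE-style objective: matched image/text pairs (the diagonal)
    # are pulled together, mismatched pairs pushed apart.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = (img @ txt.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(np.diag(probs)))          # diagonal = matches
```

A matched batch should score a lower loss than a mismatched one, which is what drives the generator toward text-consistent outputs during training.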

Technical Advancements

Addressing the shortcomings of conventional systems, this interactive image generation framework improves both accuracy and flexibility. By using a pre-trained text encoder to produce feature vectors for the input text, it ensures that generated images accurately reflect user commands. The method not only improves the precision of image outputs but also reduces the number of user interaction rounds and the computational resources required.