US20250265751
2025-08-21
Physics
G06T11/60
The patent application describes a system that uses artificial intelligence to generate collages of visual content. It addresses a limitation of current AI-based collage systems, in which users must manually select templates and adjust images. The proposed system automates these steps with generative models, so users get a finished collage without choosing a template or positioning images themselves.
The process begins with users uploading images via a client device interface. Captions are generated for these images, which are then used to construct a prompt for a generative language model. This model extracts a theme from the captions. A second prompt is created using this theme to instruct a text-to-image model to generate a background image with placeholders.
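The two-prompt flow described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the prompt wording, the function names, and the `language_model` callable (any function that maps a prompt string to a reply string) are all assumptions introduced here, as is the choice to request one placeholder per uploaded image.

```python
from typing import Callable, Sequence


def build_theme_prompt(captions: Sequence[str]) -> str:
    """First prompt: ask a language model for one theme shared by the captions."""
    caption_lines = "\n".join(f"- {caption}" for caption in captions)
    return (
        "The following captions describe images a user uploaded:\n"
        f"{caption_lines}\n"
        "Reply with a short theme (a few words) that fits all of them."
    )


def build_background_prompt(theme: str, num_placeholders: int) -> str:
    """Second prompt: ask a text-to-image model for a background with placeholders."""
    return (
        f"A collage background on the theme '{theme}', containing "
        f"{num_placeholders} empty rectangular frames as placeholders for photos."
    )


def plan_background(
    captions: Sequence[str],
    language_model: Callable[[str], str],  # hypothetical: prompt in, reply out
) -> str:
    """Chain the two steps: captions -> theme -> text-to-image prompt."""
    theme = language_model(build_theme_prompt(captions)).strip()
    # Assumption: one placeholder is requested per uploaded image.
    return build_background_prompt(theme, len(captions))
```

Passing the model in as a plain callable keeps the sketch independent of any particular model provider; in practice `language_model` would wrap whichever generative language model the system uses.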
Once the background image is created, the system identifies placeholders within it. The uploaded images are then fitted into these placeholders, resulting in a collage image. This image is sent back to the client device and displayed on the user interface, providing an automated and streamlined collage creation experience.
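The summary does not say how placeholders are identified or how images are fitted into them. As one possible sketch, assume the background has already been reduced to a binary mask in which placeholder pixels are 1. The code below finds the bounding box of each connected region and computes a "cover" fit (scale, then center-crop) for an image of a given size. Both function names and the mask representation are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def find_placeholders(mask: List[List[int]]) -> List[Box]:
    """Return the bounding box of each 4-connected region of 1s in the mask."""
    if not mask:
        return []
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited placeholder pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x + 1, y + 1
            while queue:
                cx, cy = queue.popleft()
                left, top = min(left, cx), min(top, cy)
                right, bottom = max(right, cx + 1), max(bottom, cy + 1)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def cover_fit(img_w: int, img_h: int, box: Box) -> Tuple[float, Box]:
    """Scale factor and crop rectangle (in scaled-image coordinates) such that
    the image fills the box completely while keeping its aspect ratio."""
    box_w, box_h = box[2] - box[0], box[3] - box[1]
    scale = max(box_w / img_w, box_h / img_h)
    scaled_w, scaled_h = round(img_w * scale), round(img_h * scale)
    # Center the crop window inside the scaled image.
    off_x, off_y = (scaled_w - box_w) // 2, (scaled_h - box_h) // 2
    return scale, (off_x, off_y, off_x + box_w, off_y + box_h)
```

With an imaging library, each uploaded image would be resized by `scale`, cropped to the returned rectangle, and pasted at the placeholder's top-left corner. Non-rectangular or rotated placeholders would need more than a bounding box.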
The system combines deep learning, image processing, and post-processing techniques. Its main components are an image caption model, a large language model (LLM), and a large visual model (LVM); these may together form a large multimodal model (LMM). The caption model generates the captions, the LLM extracts the theme, and the LVM creates the background image.
This approach reduces manual input: users generate creative collages simply by uploading images. Because the system infers a contextual theme from the captions, the resulting collages better represent the uploaded images. According to the application, this improves user productivity and also makes more efficient use of computing resources during collage generation.