Invention Title:

AI-BASED VISUAL CONTENT COLLAGE GENERATION

Publication number:

US20250265751

Publication date:

2025-08-21

Section:

Physics

Class:

G06T11/60

Inventors:

Sumithra BHAKTHAVATSALAM 🇺🇸 Kirkland, WA, United States

Gaurav Vinayak TENDOLKAR 🇺🇸 Reston, VA, United States

Mayura Vijayendra BISINEER 🇺🇸 Cupertino, CA, United States

Aryan SINGH 🇮🇳 Noida, India

Prashant GUPTA 🇮🇳 Auraiya, India

Akshiv BALUJA 🇮🇳 Delhi, India

Assignee:

Microsoft Technology Licensing, LLC 🇺🇸 Redmond, WA, United States

Applicant:

Microsoft Technology Licensing, LLC 🇺🇸 Redmond, WA, United States

Smart overview of the Invention

The patent application describes a system that uses artificial intelligence to generate collages of visual content. It addresses a shortcoming of current AI-based collage systems, in which users must still select templates and adjust images manually. The proposed system automates both steps, using generative models to produce the collage without that manual work.

Technical Process

The process begins when a user uploads images through an interface on a client device. The system generates captions for the images and uses those captions to construct a prompt for a generative language model, which extracts a theme from them. A second prompt is then built from this theme, instructing a text-to-image model to generate a background image that contains placeholders.
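The two prompts described above can be sketched as plain string builders. This is a minimal illustration, not the wording used in the application: the function names `build_theme_prompt` and `build_background_prompt`, the prompt text, and the idea of requesting one placeholder per uploaded image are all assumptions made for the example.

```python
from typing import Sequence


def build_theme_prompt(captions: Sequence[str]) -> str:
    """First prompt: ask a language model for one theme shared by the captions."""
    if not captions:
        raise ValueError("at least one caption is required")
    # Number the captions so the model sees them as a clear list.
    numbered = "\n".join(
        f"{i}. {caption.strip()}" for i, caption in enumerate(captions, start=1)
    )
    return (
        "The following captions describe images a user uploaded for a collage.\n"
        f"{numbered}\n"
        "Reply with a short theme (a few words) that fits all of them."
    )


def build_background_prompt(theme: str, num_placeholders: int) -> str:
    """Second prompt: ask a text-to-image model for a background with empty slots."""
    if num_placeholders < 1:
        raise ValueError("num_placeholders must be positive")
    # Hypothetical wording; the application does not publish its prompt text here.
    return (
        f"A decorative collage background on the theme '{theme.strip()}', "
        f"containing {num_placeholders} empty rectangular frames "
        "where photos will be placed later."
    )
```

In this sketch the theme returned by the language model for the first prompt would be passed straight into `build_background_prompt`, so the two model calls stay decoupled and each prompt can be tested without calling a model.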

Collage Creation

Once the background image is created, the system identifies the placeholders within it. The uploaded images are then fitted into these placeholders to produce a collage image. The collage is sent back to the client device and displayed in the user interface, so the user receives a finished collage without doing any layout work.
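The summary does not say how placeholders are identified or how images are fitted into them. As one possible sketch, assume the placeholders have already been reduced to a boolean mask (true where a pixel belongs to a placeholder): the code below finds the bounding box of each connected region, then computes a scale-and-center-crop ("cover") fit for an image of a given size. The names `find_placeholders` and `cover_fit` and the mask representation are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def find_placeholders(mask: List[List[bool]]) -> List[Box]:
    """Return bounding boxes of 4-connected truthy regions, in scan order."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited placeholder pixel.
            left, top, right, bottom = x, y, x + 1, y + 1
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx + 1)
                top, bottom = min(top, cy), max(bottom, cy + 1)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def cover_fit(img_w: int, img_h: int, box: Box) -> Tuple[float, Box]:
    """Scale factor and centered crop (in scaled-image pixels) that fill the box."""
    box_w, box_h = box[2] - box[0], box[3] - box[1]
    # Use the larger ratio so the scaled image covers the box in both dimensions.
    scale = max(box_w / img_w, box_h / img_h)
    scaled_w, scaled_h = round(img_w * scale), round(img_h * scale)
    crop_left = (scaled_w - box_w) // 2
    crop_top = (scaled_h - box_h) // 2
    return scale, (crop_left, crop_top, crop_left + box_w, crop_top + box_h)
```

A cover fit crops the image rather than letterboxing it, which keeps each placeholder fully filled. Whether the described system crops, pads, or distorts images is not stated in the summary.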

System Components

The system combines deep learning, image processing, and post-processing techniques. Its components are an image caption model, a large language model (LLM), and a large visual model (LVM), which may together form a large multimodal model (LMM). These components generate the captions, extract the theme, and create the background image, respectively.
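One way to picture how these components relate is as three interchangeable callables behind a single pipeline object. The sketch below is an assumption about structure only: `CollagePipeline`, its field names, and the inline prompt strings are hypothetical, and the stub lambdas stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CollagePipeline:
    caption_model: Callable[[bytes], str]   # image bytes -> caption text
    language_model: Callable[[str], str]    # prompt -> theme text
    image_model: Callable[[str], bytes]     # prompt -> background image bytes

    def generate_background(self, images: Sequence[bytes]) -> bytes:
        # Step 1: caption every uploaded image.
        captions = [self.caption_model(image) for image in images]
        # Step 2: ask the language model for a theme covering all captions.
        theme = self.language_model("Suggest one theme for: " + "; ".join(captions))
        # Step 3: ask the image model for a themed background with empty frames.
        return self.image_model(
            f"Collage background, theme: {theme}, {len(images)} empty frames"
        )


# Stub models so the wiring can be exercised without any real model.
pipeline = CollagePipeline(
    caption_model=lambda image: f"photo of {len(image)} bytes",
    language_model=lambda prompt: "test theme",
    image_model=lambda prompt: prompt.encode("utf-8"),
)
background = pipeline.generate_background([b"abc", b"defgh"])
```

Because each stage is just a callable, a single multimodal model could back more than one field, which loosely mirrors the LMM arrangement mentioned above.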

User Benefits

This approach reduces manual input: users generate a collage simply by uploading images. Because the system infers a contextual theme from the captions, the resulting collage better represents the uploaded images. According to the application, this both improves user productivity and makes more efficient use of computing resources during collage generation.