Invention Title:

GENERATIVE PIPELINE USING SINGLE SELFIE INPUT

Publication number:

US20250238902

Publication date:

2025-07-24

Section:

Physics

Class:

G06T5/60

Inventors:

Ma'ayan Mishin Shuvi 🇮🇱 Givatayim, Israel

Daniil Ostashev 🇬🇧 London, United Kingdom

Aleksandr Belskikh 🇬🇧 London, United Kingdom

Konstantin Gudkov 🇺🇸 Playa Vista, CA, United States

Igor Filippov 🇳🇱 Amsterdam, Netherlands

Lucas Anton Christoph Deecke 🇬🇧 London, United Kingdom

Dmitrii Smoliakov Dubai, United Arab Emirates

Applicant:

Snap Inc. 🇺🇸 Santa Monica, CA, United States

Smart overview of the Invention

The described technology uses a generative machine learning pipeline to create augmented reality content from a single selfie. Neural networks transform the selfie into a latent identity representation. Diffusion models then combine this representation with a text condition and a pose template to generate an intermediate image. The intermediate image undergoes enhancement and restoration to produce a final output image, which is displayed on a client device.
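The stage order in this overview can be sketched as a chain of function calls. The following is a minimal illustration only, not the applicant's implementation: the `Stages` container, the function names, and the toy stand-ins (which operate on small grids of floats rather than real images or trained models) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy stand-ins: an "image" is a grid of floats in [0, 1]; a latent is a flat vector.
Image = List[List[float]]
Latent = List[float]


@dataclass
class Stages:
    """Hypothetical stage interfaces; the names are illustrative, not from the patent."""
    encode_identity: Callable[[Image], Latent]       # stands in for the neural-network encoder
    generate: Callable[[Latent, str, Image], Image]  # stands in for the diffusion-based generator
    enhance: Callable[[Image], Image]
    restore: Callable[[Image], Image]


def run_pipeline(selfie: Image, text_condition: str,
                 pose_template: Image, stages: Stages) -> Image:
    """Run the stages in the order given in the overview."""
    identity = stages.encode_identity(selfie)
    intermediate = stages.generate(identity, text_condition, pose_template)
    return stages.restore(stages.enhance(intermediate))


# Placeholder implementations so the sketch runs without any ML library.
def toy_encode(selfie: Image) -> Latent:
    # One number per row: the row mean.
    return [sum(row) / len(row) for row in selfie]


def toy_generate(identity: Latent, text_condition: str, pose_template: Image) -> Image:
    # Blend the pose grid with the mean of the identity vector.
    # The text condition is accepted but ignored by this toy version.
    bias = sum(identity) / len(identity)
    return [[0.5 * p + 0.5 * bias for p in row] for row in pose_template]


def toy_enhance(image: Image) -> Image:
    # Brighten slightly, capping at 1.0.
    return [[min(1.0, 1.1 * p) for p in row] for row in image]


def toy_restore(image: Image) -> Image:
    # Clamp every value into [0, 1].
    return [[max(0.0, min(1.0, p)) for p in row] for row in image]


TOY_STAGES = Stages(toy_encode, toy_generate, toy_enhance, toy_restore)

if __name__ == "__main__":
    selfie = [[0.2, 0.4], [0.6, 0.8]]
    pose = [[0.0, 1.0], [1.0, 0.0]]
    print(run_pipeline(selfie, "a knight in armor", pose, TOY_STAGES))
```

Passing the stages in as callables keeps the ordering logic separate from the models themselves, so a real encoder or generator could be substituted without changing `run_pipeline`.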

Background

In recent years, digital images have become integral to daily life due to the widespread availability of portable devices, increased storage capacity, and improved network connectivity. These advancements have enabled users worldwide to capture and share images easily. However, processing these images, especially under varying conditions like lighting or movement, poses significant computational challenges.

Detailed Description

The system enhances user experiences by enabling devices to perform complex image processing tasks efficiently. It uses machine learning techniques to generate augmented reality content from selfies. The process transforms the input selfie into a latent identity representation using neural networks, then combines that representation with a text condition and a pose template through diffusion models to create enriched media content.

Practical Applications

This technology is particularly beneficial for messaging systems on mobile devices, which are often limited by power and resources. By optimizing image processing, the system reduces latency and power consumption, making it feasible for real-time applications. The infrastructure supports creating and sharing interactive media that includes 3D content or AR effects, enhancing user interaction through various messaging platforms.

Networked Environment

The system operates within a networked environment where interaction clients on user devices communicate with server systems over the Internet. These interactions involve exchanging multimedia data and commands, facilitated by APIs that connect clients with server-side functionalities. The architecture supports various operations like media augmentation and overlays, providing users with dynamic and engaging augmented reality experiences.
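As a rough illustration of a client exchanging multimedia data and commands with server-side functionality through an API, the sketch below encodes a request as JSON and dispatches it to a handler. The message fields, the `apply_overlay` command, and all function names are hypothetical and chosen only for this example; the source does not specify a wire format or any API names.

```python
import base64
import json
from typing import Callable, Dict

# Registry mapping command names to server-side handlers (hypothetical).
HANDLERS: Dict[str, Callable[[bytes, dict], dict]] = {}


def handler(name: str):
    """Decorator that registers a server-side function under a command name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


def build_request(command: str, media: bytes, params: dict) -> str:
    """Client side: package a command and binary media into a JSON string."""
    return json.dumps({
        "command": command,
        "media": base64.b64encode(media).decode("ascii"),
        "params": params,
    })


@handler("apply_overlay")
def apply_overlay(media: bytes, params: dict) -> dict:
    # Placeholder: a real server would augment the media here.
    return {"status": "ok", "bytes_received": len(media),
            "overlay": params.get("overlay")}


def handle_request(raw: str) -> str:
    """Server side: decode the request and route it to the matching handler."""
    message = json.loads(raw)
    fn = HANDLERS.get(message["command"])
    if fn is None:
        return json.dumps({"status": "error", "reason": "unknown command"})
    media = base64.b64decode(message["media"])
    return json.dumps(fn(media, message.get("params", {})))


if __name__ == "__main__":
    request = build_request("apply_overlay", b"fake-image-bytes", {"overlay": "hat"})
    print(handle_request(request))
```

In this sketch the network transport is omitted: `handle_request` is called directly with the string the client built, which is enough to show the request and response shapes.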