Invention Title:

SYSTEMS AND METHODS FOR ENHANCED IMAGE GENERATION

Publication number:

US20250272808

Section:

Physics

Class:

G06T5/77

Smart overview of the Invention

The patent application describes a system for enhanced image generation that manipulates images by transferring motion from one image to another while preserving the original appearance features. This system uses a three-dimensional (3D) flow field to perform spatial transformations, allowing for the creation of a warped image that mimics the target motion. The technology is particularly useful in applications such as face reenactment, animation, and video synthesis, where maintaining the visual identity of the source object is crucial.
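The core operation described here is a flow-based spatial transformation: each output pixel is fetched from a source location displaced by a per-pixel flow field. A minimal sketch of that idea (function name and nearest-neighbor sampling are our simplifications, not the patent's method):

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp `image` (H, W, C) using a per-pixel `flow` (H, W, 2)
    of (dy, dx) offsets; nearest-neighbor sampling keeps the sketch short."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Clamp sampled coordinates to the image bounds.
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return image[src_y, src_x]

src = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
warped = warp_with_flow(src, np.zeros((4, 4, 2)))  # zero flow -> identity warp
```

A learned 3D flow would additionally carry depth displacements; the projection of that flow onto the image plane reduces to the 2D lookup shown here.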

Technological Background

Generative artificial intelligence (AI) techniques are commonly used in image synthesis to create realistic images. Traditional methods like image warping, style-based generative adversarial networks (GANs), and volumetric 3D head reconstruction each have limitations. Image warping struggles with pose variations, GANs often miss fine details, and volumetric methods can produce rigid results. The described system aims to overcome these challenges by integrating both 2D and 3D methodologies, enhancing facial expression transfer and head pose variation handling.

System Components

The image generation model includes several components: motion estimation, image warping, and image refinement. It utilizes adaptive instance normalization (AdaIN) for feature modulation and a U-shaped network (UNet)-based architecture for refining images. The model employs a cyclic warp loss technique to improve motion estimation accuracy, ensuring realistic rendering of facial details while preventing unwanted background motion.
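AdaIN itself is a well-known, simple operation: the content features are re-normalized so their per-channel statistics match those of a style (here, motion/appearance) signal. A self-contained sketch:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: re-normalize each channel of
    `content` (C, H, W) to match the per-channel mean/std of `style`."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sigma = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_sigma = style.std(axis=(1, 2), keepdims=True)
    return s_sigma * (content - c_mu) / (c_sigma + eps) + s_mu

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(8, 16, 16))
style = rng.normal(3.0, 2.0, size=(8, 16, 16))
out = adain(content, style)  # out now carries style's channel statistics
```

In the described model, the modulating statistics would come from learned projections of the target-motion features rather than a raw style image.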

Implementation Details

  • The system creates an image generation model through training that involves pre-processing, 3D warping, and image refinement stages.
  • In pre-processing, it separates the source foreground from the background and estimates 3D morphable model (3DMM) parameters for the target motion.
  • The 3D warping stage computes flow fields to warp source features into target motion, transforming them into a 2D warped image.
  • The refinement stage uses a TransUNet architecture to enhance facial details while maintaining source identity.
  • The inpainting stage composites the refined foreground onto the background and fills any remaining gaps, again using a TransUNet.
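The stages above form a straightforward pipeline. The skeleton below is a hypothetical orchestration of that flow; every `nets[...]` entry stands in for a learned network from the description, with identity stand-ins so the sketch runs end to end:

```python
import numpy as np

def generate(source_img, target_params, nets):
    """Hypothetical end-to-end pipeline mirroring the stages listed above."""
    fg, bg = nets["segment"](source_img)             # pre-processing: split fg/bg
    flow = nets["motion"](fg, target_params)         # flow field from 3DMM params
    warped = nets["warp"](fg, flow)                  # 3D warping -> 2D warped image
    refined = nets["refine"](warped)                 # TransUNet-style refinement
    return nets["inpaint"](refined, bg)              # composite + fill gaps

# Identity stand-ins for the learned components.
nets = {
    "segment": lambda img: (img, np.zeros_like(img)),
    "motion": lambda fg, p: np.zeros(fg.shape[:2] + (2,)),
    "warp": lambda fg, flow: fg,
    "refine": lambda x: x,
    "inpaint": lambda fg, bg: fg + bg,
}
out = generate(np.ones((64, 64, 3)), {"pose": None}, nets)
```

The point of the skeleton is the data flow: only the foreground is warped and refined, which is what keeps the background static.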

Training Process

The training process occurs in two phases. Initially, the 3D warping and inpainting networks are independently pre-trained to ensure effective deformation generation and background restoration. In the second phase, end-to-end training optimizes the entire model using cyclic warp loss for accurate motion estimation. This approach minimizes differences between synthesized images and source images, ensuring precise expression transfer and improved performance in challenging face reenactment tasks.
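A cycle-consistency penalty of this kind can be stated compactly: warping the source into the target motion and then back should reproduce the source. The loss below is our illustrative reading of "cyclic warp loss", not the patent's exact formulation; the shift functions are toy warps chosen to be exact inverses:

```python
import numpy as np

def cyclic_warp_loss(source, forward_warp, backward_warp):
    """Hypothetical cyclic warp loss: flowing source -> target -> source
    should reconstruct the source; penalize the mean absolute difference."""
    reconstructed = backward_warp(forward_warp(source))
    return float(np.abs(source - reconstructed).mean())

# Toy warps: circular shifts that invert each other exactly.
shift_right = lambda img: np.roll(img, 1, axis=1)
shift_left = lambda img: np.roll(img, -1, axis=1)
img = np.random.default_rng(1).random((32, 32))
loss = cyclic_warp_loss(img, shift_right, shift_left)  # -> 0.0
```

During end-to-end training, the two warps would be the learned source-to-target and target-to-source flows, so driving this loss to zero forces the estimated motion fields to be mutually consistent.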