Invention Title:

Method and Apparatus for Generating Reenacted Image

Publication number:

US20250316112

Publication date:

Section:

Physics

Class:

G06V40/168

Inventors:

Assignee:

Applicant:

Smart Overview of the Invention

The method for generating a reenacted image proceeds in several stages. First, landmarks are extracted from both a driver image and a target image; these landmarks locate the essential facial features in each image. A driver feature map is then generated from the pose and expression of the first face, which appears in the driver image. Concurrently, a target feature map is generated from the style of the second face, which appears in the target image, and a pose-normalized version of this target feature map is produced to align the facial orientations.
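
To make the data flow concrete, the following is a minimal PyTorch sketch of this extraction stage. Everything in it is an illustrative assumption rather than the patent's architecture: SimpleEncoder, the tensor shapes, and the 1x1-convolution stand-in for pose normalization are all hypothetical, and landmark extraction is abstracted into preprocessing.

```python
import torch
import torch.nn as nn

class SimpleEncoder(nn.Module):
    """Toy convolutional encoder standing in for the driver/target encoders."""
    def __init__(self, in_ch: int = 3, out_ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Landmark extraction is abstracted away; assume the images below were
# already cropped and aligned using the extracted landmarks.
driver_img = torch.randn(1, 3, 64, 64)   # contains the first face
target_img = torch.randn(1, 3, 64, 64)   # contains the second face

driver_encoder = SimpleEncoder()   # encodes pose and expression of the driver
target_encoder = SimpleEncoder()   # encodes style of the target

driver_feat = driver_encoder(driver_img)   # driver feature map, (1, 64, 32, 32)
target_feat = target_encoder(target_img)   # target feature map, (1, 64, 32, 32)

# Stand-in for pose normalization: a learned 1x1 conv over the target map.
# The actual mechanism (e.g. landmark-guided warping) is not detailed here.
pose_normalizer = nn.Conv2d(64, 64, kernel_size=1)
pose_normalized_feat = pose_normalizer(target_feat)
```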

Technical Process

The reenacted image is created by merging information from the driver and target images. A mixed feature map is produced by integrating the driver feature map with the target feature map, so that the result retains the stylistic attributes of the target face while adopting the pose and expression of the driver face. In the final step, the mixed feature map is used together with the pose-normalized target feature map to generate the reenacted image.
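
As one way to picture the mixing step, the sketch below applies ordinary scaled dot-product attention between the two feature maps from the previous sketch. This is an assumed formulation for illustration only; the patent's image attention unit is not necessarily implemented this way.

```python
import torch

def mix_feature_maps(driver_feat: torch.Tensor,
                     target_feat: torch.Tensor) -> torch.Tensor:
    """Blend target content into driver positions via dot-product attention."""
    b, c, h, w = driver_feat.shape
    q = driver_feat.flatten(2).transpose(1, 2)   # queries: (b, h*w, c)
    k = target_feat.flatten(2).transpose(1, 2)   # keys:    (b, h*w, c)
    v = k                                        # values share the keys here
    attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

mixed_feat = mix_feature_maps(driver_feat, target_feat)
# The decoder would consume mixed_feat together with pose_normalized_feat
# to synthesize the reenacted image (see the apparatus sketch below).
```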

Apparatus Components

The apparatus that carries out this method comprises several cooperating components. A landmark transformer extracts facial landmarks; a first encoder generates the driver feature map and a second encoder generates the target feature map; an image attention unit combines these maps into the mixed feature map; and a decoder uses the mixed and pose-normalized feature maps to produce the final reenacted image.
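
Putting the pieces together, here is one hypothetical wiring of these components, reusing SimpleEncoder and mix_feature_maps from the sketches above. Every submodule, including the decoder and the pose normalizer, is a stand-in chosen for brevity, and the landmark transformer is again abstracted into preprocessing.

```python
import torch
import torch.nn as nn

class ReenactmentApparatus(nn.Module):
    """Illustrative end-to-end wiring; every submodule is a stand-in."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.driver_encoder = SimpleEncoder(out_ch=ch)    # first encoder
        self.target_encoder = SimpleEncoder(out_ch=ch)    # second encoder
        self.pose_normalizer = nn.Conv2d(ch, ch, kernel_size=1)
        # Decoder upsamples the concatenated mixed + pose-normalized maps
        # back to an RGB image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * ch, ch, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(ch, 3, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, driver_img: torch.Tensor,
                target_img: torch.Tensor) -> torch.Tensor:
        d = self.driver_encoder(driver_img)    # pose/expression features
        t = self.target_encoder(target_img)    # style features
        tn = self.pose_normalizer(t)           # pose-normalized target map
        mixed = mix_feature_maps(d, t)         # image attention unit stand-in
        return self.decoder(torch.cat([mixed, tn], dim=1))

reenacted = ReenactmentApparatus()(torch.randn(1, 3, 64, 64),
                                   torch.randn(1, 3, 64, 64))
```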

Background and Innovation

Traditional facial landmark analysis often fails to separate a face's inherent appearance from its emotional expression, so a static trait such as naturally high eyebrows can be misclassified as surprise. The disclosed method addresses this limitation by processing pose, expression, and style information along distinct paths, improving accuracy in emotion classification and related applications.

Implementation and Connectivity

A system implementing this method typically includes multiple terminals connected through a server over communication networks such as LTE, Wi-Fi, or Bluetooth. This infrastructure supports data exchange between devices for operations such as video calls or message transmission that use reenacted images; the server manages the connections and relays the necessary data between terminals.
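
As a rough illustration of the relay role described above, the following sketch pipes bytes between two connected terminals over plain TCP using only the Python standard library. The host, port, and two-terminal session model are assumptions for the example; a real deployment would run over LTE/Wi-Fi/Bluetooth transports with proper session management.

```python
import socket
import threading

def relay(src: socket.socket, dst: socket.socket) -> None:
    """Forward bytes (e.g. reenacted-image frames) from one terminal to another."""
    while chunk := src.recv(4096):
        dst.sendall(chunk)

def serve(host: str = "0.0.0.0", port: int = 9000) -> None:
    """Accept two terminals and pipe their streams through the server."""
    with socket.create_server((host, port)) as server:
        a, _ = server.accept()
        b, _ = server.accept()
        threading.Thread(target=relay, args=(a, b), daemon=True).start()
        relay(b, a)

if __name__ == "__main__":
    serve()
```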