Invention Title:

NEURAL HEAD AVATAR CONSTRUCTION FROM AN IMAGE

Publication number:

US20240404174

Publication date:

2024-12-05

Section:

Physics

Class:

G06T15/08

Inventors:

Jan Kautz Lexington, MA, United States

Shalini De Mello San Francisco, CA, United States

Sifei Liu Santa Clara, CA, United States

Umar Iqbal San Jose, CA, United States

Xueting Li Sunnyvale, CA, United States

Koki Nagano Playa Vista, CA, United States

Applicant:

NVIDIA Corporation Santa Clara, CA, United States


Smart overview of the Invention

Systems and methods are introduced for animating a source portrait image using motion, such as head pose and facial expression, taken from a target image. Unlike traditional systems, this approach constructs an implicit 3D head avatar from a single-view portrait image, capturing photo-realistic details both within and beyond the face region. The resulting avatar can be animated immediately, without any further optimization at inference time.

Technical Approach

The system uses three processing branches, each producing a tri-plane. One tri-plane represents the coarse 3D geometry of the head avatar, one the detailed appearance of the source image, and one the expression of the target image. Combining these tri-planes and applying volumetric rendering generates an image with the identity of the source image and the expression and pose of the target image. Once trained, the system reconstructs and animates a 3D head avatar efficiently in a single forward pass.
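The combination step can be pictured with the common tri-plane formulation: a 3D point is projected onto three axis-aligned feature planes, the bilinearly sampled features are aggregated, and density and color values along a camera ray are alpha-composited. The sketch below is a minimal illustration under stated assumptions, not the patented implementation. Summing features across planes and across the three branches, the NeRF-style compositing quadrature, and the helper names bilinear, query_triplane, fuse_branches and composite are all assumptions. The decoder that would turn fused features into density and color is omitted.

```python
import math
from typing import List, Sequence, Tuple

Plane = List[List[List[float]]]        # plane[row][col] -> feature vector of C floats
TriPlane = Tuple[Plane, Plane, Plane]  # (xy, xz, yz) planes


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature plane at normalized coordinates u, v in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    # Map [-1, 1] onto index space and clamp to the grid (align-corners convention).
    x = min(max((u + 1.0) * 0.5 * (w - 1), 0.0), w - 1.0)
    y = min(max((v + 1.0) * 0.5 * (h - 1), 0.0), h - 1.0)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    return [
        (1 - tx) * (1 - ty) * plane[y0][x0][k]
        + tx * (1 - ty) * plane[y0][x1][k]
        + (1 - tx) * ty * plane[y1][x0][k]
        + tx * ty * plane[y1][x1][k]
        for k in range(len(plane[0][0]))
    ]


def query_triplane(tp: TriPlane, p: Sequence[float]) -> List[float]:
    """Project point p = (x, y, z) onto the three planes and sum the sampled features."""
    x, y, z = p
    f_xy = bilinear(tp[0], x, y)
    f_xz = bilinear(tp[1], x, z)
    f_yz = bilinear(tp[2], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def fuse_branches(branches: Sequence[TriPlane], p: Sequence[float]) -> List[float]:
    """Assumed fusion: sum the canonical, appearance and expression tri-plane features."""
    feats = [query_triplane(tp, p) for tp in branches]
    return [sum(vals) for vals in zip(*feats)]


def composite(sigmas: Sequence[float], colors: Sequence[Sequence[float]],
              deltas: Sequence[float]) -> List[float]:
    """NeRF-style quadrature: C = sum_i T_i * alpha_i * c_i, T_i = prod_{j<i} (1 - alpha_j)."""
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for sigma, rgb, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        out = [o + weight * c for o, c in zip(out, rgb)]
        transmittance *= 1.0 - alpha
    return out
```

In a full renderer, fuse_branches would be evaluated at every sample along each ray, decoded to (sigma, color), and passed to composite. Summation is only one plausible fusion choice; concatenation followed by a learned decoder would be an equally reasonable alternative.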

Implementation Details

The method receives a source image depicting a subject with a specific expression and extracts a 3D shape from it to create an identity surface associated with that subject. In the identity surface, the original expression can be replaced with a different one, such as a neutral expression. The source image is also processed to compute an appearance representation, which is combined with the identity surface to generate an output image in which the subject has the new expression.
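Replacing the source expression with a neutral one is easy to picture with a linear 3D morphable model, in which vertex positions are a mean shape plus identity and expression offsets: V = mean + sum_k alpha_k * S_k + sum_m beta_m * E_m. Assuming such a model, and assuming the coefficients alpha and beta have already been estimated from the source image (not shown), neutralizing the face amounts to zeroing the expression coefficients while keeping the identity ones. The function names morph, neutralize and swap_expression and the toy two-vertex basis are hypothetical. Models such as FLAME also include pose-dependent articulation, which this sketch omits.

```python
from typing import List, Sequence

Vec3 = List[float]
Basis = Sequence[Sequence[float]]  # one 3D offset per vertex


def morph(mean: Sequence[Sequence[float]], shape_basis: Sequence[Basis],
          expr_basis: Sequence[Basis], alpha: Sequence[float],
          beta: Sequence[float]) -> List[Vec3]:
    """Linear 3DMM: V = mean + sum_k alpha_k * S_k + sum_m beta_m * E_m, per vertex."""
    verts = [list(v) for v in mean]
    terms = list(zip(alpha, shape_basis)) + list(zip(beta, expr_basis))
    for coeff, basis in terms:
        for i, offset in enumerate(basis):
            for d in range(3):
                verts[i][d] += coeff * offset[d]
    return verts


def neutralize(mean, shape_basis, expr_basis, alpha) -> List[Vec3]:
    """Keep the identity coefficients and zero the expression ones (neutral face)."""
    return morph(mean, shape_basis, expr_basis, alpha, [0.0] * len(expr_basis))


def swap_expression(mean, shape_basis, expr_basis, alpha, beta_target) -> List[Vec3]:
    """Keep the source identity but apply expression coefficients from a target."""
    return morph(mean, shape_basis, expr_basis, alpha, beta_target)
```

Passing expression coefficients estimated from a target image to swap_expression would give the source identity with the target expression, which loosely mirrors the role of the expression branch.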

System Architecture

The framework includes three neural branches, handling coarse geometry reconstruction, detailed appearance capture, and expression modification, respectively. A canonical branch reconstructs the coarse 3D geometry of the head, in a neutral expression, in a volumetric format. An appearance branch maps pixel values from the portrait image onto their corresponding positions in 3D space. An expression branch modifies the head avatar's expression using a 3D morphable model.
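The appearance branch's mapping of pixel values to 3D positions can be illustrated with standard pinhole back-projection: a pixel (u, v) with depth d maps to ((u - cx) * d / fx, (v - cy) * d / fy, d) in camera coordinates. This is a sketch under stated assumptions, not the described system's method. The source does not say how per-pixel depth is obtained (an estimated depth map, or depth rendered from the coarse canonical geometry, are plausible options), and the intrinsics fx, fy, cx, cy and the helpers unproject and lift_image are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Pinhole back-projection of pixel (u, v) at the given depth into camera space."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_image(image: Sequence[Sequence[tuple]], depth_map: Sequence[Sequence[float]],
               fx: float, fy: float, cx: float, cy: float) -> List[Tuple[Point3, tuple]]:
    """Pair each pixel's color with its 3D position (hypothetical appearance lifting)."""
    points = []
    for v, row in enumerate(image):
        for u, rgb in enumerate(row):
            d = depth_map[v][u]
            if d > 0:  # skip pixels without a valid depth value
                points.append((unproject(u, v, d, fx, fy, cx, cy), rgb))
    return points
```

The lifted (position, color) pairs could then be splatted into a feature volume or tri-plane; that step is left out here because the overview does not describe it.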

Applications and Flexibility

The system is applicable to video conferencing, computer games, virtual reality (VR), and augmented reality (AR), where it can efficiently synthesize realistic, high-fidelity portrait images that match a given identity and motion. It generalizes well to unseen identities without test-time optimization, performing 3D reconstruction and animation while capturing intricate details in the portrait images.