US20240404174
2024-12-05
Physics
G06T15/08
Systems and methods are described for animating a source portrait image with motion, such as head pose and facial expression, taken from a target image. Unlike traditional systems, this approach constructs an implicit 3D head avatar from a single-view portrait image, capturing photo-realistic details both within and beyond the face region. The avatar can be animated immediately, without further optimization at inference time.
The system uses three processing branches, each producing a tri-plane: one represents the coarse 3D geometry of the head avatar, one the detailed appearance of the source image, and one the expression of the target image. The tri-planes are combined and volume-rendered to generate an image with the desired identity, expression, and pose. Once trained, the system reconstructs and animates a 3D head avatar efficiently in a single forward pass.
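The tri-plane fusion and volumetric rendering step can be illustrated with a minimal, pure-Python sketch. It assumes that each branch's tri-plane consists of three axis-aligned feature grids (XY, XZ, YZ), that features are bilinearly sampled and summed across planes and across branches, and that a small decoder turns the fused feature into a density and an RGB color. The summation rule, the `decode` function, and the toy `constant_triplane` inputs are hypothetical stand-ins for learned components; only the alpha compositing in `render_ray` follows the standard emission-absorption volume-rendering quadrature.

```python
import math
from typing import List, Sequence, Tuple

# A plane is a res x res grid of feature vectors (length C); a tri-plane
# holds three axis-aligned planes: XY, XZ and YZ.
Plane = List[List[List[float]]]
TriPlane = Tuple[Plane, Plane, Plane]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample `plane` at continuous coordinates u, v in [-1, 1]."""
    res = len(plane)
    # Map [-1, 1] onto grid indices [0, res - 1], clamping points outside the box.
    x = min(max((u + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    y = min(max((v + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative here
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    fx, fy = x - x0, y - y0
    return [
        (1 - fx) * (1 - fy) * plane[y0][x0][k]
        + fx * (1 - fy) * plane[y0][x1][k]
        + (1 - fx) * fy * plane[y1][x0][k]
        + fx * fy * plane[y1][x1][k]
        for k in range(len(plane[0][0]))
    ]


def sample_triplane(tp: TriPlane, p: Sequence[float]) -> List[float]:
    """Project point p onto the XY, XZ and YZ planes and sum the three features."""
    x, y, z = p
    feats = (bilinear(tp[0], x, y), bilinear(tp[1], x, z), bilinear(tp[2], y, z))
    return [sum(vals) for vals in zip(*feats)]


def fused_feature(branches: Sequence[TriPlane], p: Sequence[float]) -> List[float]:
    """Assumed fusion rule: element-wise sum of every branch's tri-plane feature."""
    return [sum(vals) for vals in zip(*(sample_triplane(tp, p) for tp in branches))]


def decode(feature: Sequence[float]) -> Tuple[float, List[float]]:
    """Hypothetical decoder: channel 0 -> density, channels 1-3 -> RGB."""
    sigma = max(feature[0], 0.0)                              # non-negative density
    rgb = [1.0 / (1.0 + math.exp(-f)) for f in feature[1:4]]  # sigmoid into [0, 1]
    return sigma, rgb


def render_ray(branches, origin, direction, near=0.0, far=2.0, n_samples=32):
    """Emission-absorption volume rendering of one ray through the fused field."""
    delta = (far - near) / n_samples
    transmittance, color = 1.0, [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta                          # midpoint of each interval
        p = [o + t * d for o, d in zip(origin, direction)]
        sigma, rgb = decode(fused_feature(branches, p))
        alpha = 1.0 - math.exp(-sigma * delta)                # opacity of this interval
        weight = transmittance * alpha
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color


def constant_triplane(res: int, value: Sequence[float]) -> TriPlane:
    """Toy tri-plane whose every cell holds the same feature vector."""
    def plane() -> Plane:
        return [[list(value) for _ in range(res)] for _ in range(res)]
    return (plane(), plane(), plane())


# Three toy branches (canonical, appearance, expression) with 4-channel features.
branches = [constant_triplane(4, [1.0, 0.0, 0.0, 0.0]) for _ in range(3)]
pixel = render_ray(branches, origin=[0.0, 0.0, -1.0], direction=[0.0, 0.0, 1.0])
```

With a dense, uncolored toy field the ray becomes almost fully opaque, so the rendered pixel is close to the decoder's mid-gray output of 0.5 per channel.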
The method receives a source image depicting a subject with a particular expression and extracts a 3D shape from it to create an identity surface for that subject. The subject's original expression can be replaced with another, such as a neutral expression. The source image is also processed to compute an appearance representation, which is integrated with the identity surface to generate an output image showing the new expression.
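Replacing the source expression can be sketched under the assumption that the extracted 3D shape is parameterized by a linear 3D morphable model, S = S_mean + B_id * alpha + B_exp * beta, where alpha holds identity coefficients and beta holds expression coefficients. Zeroing beta yields a neutral identity surface, and substituting another subject's beta re-targets the expression while keeping the source identity. The helper names, the two-vertex toy model, and all coefficient values below are illustrative assumptions, not data from the source.

```python
from typing import List, Sequence

Vector = List[float]               # flattened vertex coordinates [x0, y0, z0, x1, ...]
Basis = Sequence[Sequence[float]]  # one basis vector per coefficient


def add_scaled(base: Sequence[float], basis: Basis, coeffs: Sequence[float]) -> Vector:
    """Return base + sum_i coeffs[i] * basis[i]."""
    out = list(base)
    for c, b in zip(coeffs, basis):
        for k, v in enumerate(b):
            out[k] += c * v
    return out


def morphable_shape(mean, id_basis, exp_basis, alpha, beta) -> Vector:
    """Linear 3DMM: S = S_mean + B_id * alpha + B_exp * beta."""
    return add_scaled(add_scaled(mean, id_basis, alpha), exp_basis, beta)


def identity_surface(mean, id_basis, exp_basis, alpha) -> Vector:
    """Replace the source expression with a neutral one by zeroing beta."""
    return morphable_shape(mean, id_basis, exp_basis, alpha, [0.0] * len(exp_basis))


# Toy model: two vertices, one identity basis vector, one expression basis vector.
mean = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
id_basis = [[0.0, 1.0, 0.0, 0.0, 1.0, 0.0]]
exp_basis = [[0.0, 0.0, 1.0, 0.0, 0.0, -1.0]]
alpha_src, beta_src, beta_tgt = [2.0], [0.5], [-1.0]  # made-up fitted/target coefficients

source = morphable_shape(mean, id_basis, exp_basis, alpha_src, beta_src)
neutral = identity_surface(mean, id_basis, exp_basis, alpha_src)
# -> [0.0, 2.0, 0.0, 1.0, 2.0, 0.0]
animated = morphable_shape(mean, id_basis, exp_basis, alpha_src, beta_tgt)
# -> [0.0, 2.0, -1.0, 1.0, 2.0, 1.0]  (source identity, target expression)
```

Keeping identity and expression in separate linear subspaces is what makes the swap a matter of replacing one coefficient vector.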
The framework includes three neural branches, handling coarse geometry reconstruction, detailed appearance capture, and expression modification, respectively. A canonical branch reconstructs the coarse 3D geometry of the head in a volumetric representation with a neutral expression. An appearance branch maps pixel values from the portrait image to their corresponding positions in 3D space. An expression branch modifies the head avatar's expression using a 3D morphable model (3DMM).
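In the described system, the appearance branch learns the mapping from pixels to 3D positions. As a purely geometric analogue, the sketch below lifts each pixel to camera space with a pinhole camera, X = d * K^-1 [u, v, 1]^T, assuming a per-pixel depth d is already available (for example, rendered from the identity surface) and assuming intrinsics `fx`, `fy`, `cx`, `cy`. This substitutes classical unprojection for the learned branch; it is not the patented method, and `unproject_pixels` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
Point = Tuple[float, float, float]


def unproject_pixels(
    image: Sequence[Sequence[RGB]],
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Tuple[Point, RGB]]:
    """Lift each pixel (u, v) with depth d to X = d * K^-1 [u, v, 1]^T."""
    points: List[Tuple[Point, RGB]] = []
    for v, (rgb_row, depth_row) in enumerate(zip(image, depth)):
        for u, (rgb, d) in enumerate(zip(rgb_row, depth_row)):
            if d <= 0.0:
                continue  # no surface behind this pixel (e.g. background)
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append(((x, y, d), rgb))  # pixel color attached to a 3D position
    return points


image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
depth = [[1.0, 2.0],
         [0.0, 1.0]]
points = unproject_pixels(image, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
# -> [((-0.5, -0.5, 1.0), (255, 0, 0)), ((1.0, -1.0, 2.0), (0, 255, 0)),
#     ((0.5, 0.5, 1.0), (255, 255, 255))]
```

Pixels without a valid depth are dropped, so in this sketch only regions covered by the surface receive color.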
Because it synthesizes realistic portrait images matching a given identity and motion efficiently and with high fidelity, the system is applicable to video conferencing, computer games, virtual reality (VR), and augmented reality (AR). It performs 3D reconstruction and animation while capturing intricate details in portrait images, and it generalizes to unseen identities without test-time optimization.