Invention Title:

METHOD OF GENERATING FULLBODY ANIMATABLE PERSON AVATAR FROM SINGLE IMAGE OF PERSON, COMPUTING DEVICE AND COMPUTER-READABLE MEDIUM IMPLEMENTING THE SAME

Publication number:

US20250209712

Publication date:

Section:

Physics

Class:

G06T13/40

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent describes a method for creating a fullbody animatable avatar from a single image using advanced AI techniques. The process begins by obtaining an image of a person's body, a parametric body model with pose and shape parameters, and the camera parameters used during image capture. This model, together with the camera parameters, serves as the basis for a texturing function that maps each pixel of the image to texture coordinates in a texture space covering the whole body, including parts not visible in the image. The person's RGB texture is then sampled according to this mapping, producing a map of sampled pixels.
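As a rough illustration, the sketch below shows how such a texturing function could be used to scatter image pixels into a texture and record which texels were observed. It assumes a PyTorch setting; the function name sample_rgb_texture, the pixel_uv input, and the texture resolution are hypothetical stand-ins for the patent's texturing function, not its actual implementation.

    import torch

    def sample_rgb_texture(image, pixel_uv, texture_size=256):
        """Scatter image pixels into a UV texture map.

        image:     (3, H, W) RGB image of the person.
        pixel_uv:  (H, W, 2) texture coordinates in [0, 1] assigned to each
                   image pixel by the texturing function (in the patent these
                   come from the fitted parametric body model and the camera
                   parameters; here they are simply an input).
        Returns a (4, T, T) tensor: the sampled RGB texture plus a one-channel
        map of sampled pixels marking texels that received a value.
        """
        _, H, W = image.shape
        T = texture_size
        rgb_tex = torch.zeros(3, T, T)
        sampled = torch.zeros(1, T, T)

        # Convert continuous UV coordinates to integer texel indices.
        u = (pixel_uv[..., 0] * (T - 1)).long().clamp(0, T - 1).reshape(-1)
        v = (pixel_uv[..., 1] * (T - 1)).long().clamp(0, T - 1).reshape(-1)

        # Write each visible image pixel into its texel (later pixels overwrite
        # earlier ones that land on the same texel).
        rgb_tex[:, v, u] = image.reshape(3, -1)
        sampled[:, v, u] = 1.0
        return torch.cat([rgb_tex, sampled], dim=0)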

Technical Methodology

The person's image is first passed through a trained encoder-generator network to generate a texture for the visible parts of the body. The sampled RGB texture, the map of sampled pixels, and the generated texture are then concatenated to form a neural texture, and the texture regions unseen in the input image are inpainted with a diffusion-based model. Finally, the parametric body model is modified to place the avatar in new poses, and a trained neural renderer translates the rasterized image of the re-posed avatar into the final rendering, yielding realistic animations.
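The sketch below wires these stages together at a high level in PyTorch. The three sub-modules are passed in as black boxes, and their interfaces (argument order, the mask convention) are assumptions made for illustration; they are not the patented architectures themselves.

    import torch
    import torch.nn as nn

    class AvatarPipeline(nn.Module):
        """Illustrative wiring of the described stages; the sub-modules are
        placeholders, not the architectures claimed in the patent."""

        def __init__(self, encoder_generator, inpainter, neural_renderer):
            super().__init__()
            self.encoder_generator = encoder_generator  # StyleGAN2-like encoder-generator
            self.inpainter = inpainter                  # diffusion-based inpainting model
            self.neural_renderer = neural_renderer      # ResNet-block neural renderer

        def forward(self, image, rgb_texture, sampled_map, rasterized_avatar):
            # 1. Generate a texture for the body parts visible in the image.
            generated_texture = self.encoder_generator(image)

            # 2. Concatenate the RGB texture, the map of sampled pixels and the
            #    generated texture channel-wise to form the neural texture.
            neural_texture = torch.cat(
                [rgb_texture, sampled_map, generated_texture], dim=1)

            # 3. Inpaint the texture regions unseen in the input image.
            completed_texture = self.inpainter(neural_texture, 1.0 - sampled_map)

            # 4. Translate the rasterized image of the (re-posed) avatar into
            #    the final rendering.
            return self.neural_renderer(rasterized_avatar, completed_texture)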

Innovative Aspects

Key innovations include the use of a diffusion-based inpainting model and neural rendering to improve photo-realism and animation quality. The method addresses the difficulty of rendering body parts that are unobserved in the input image, and it simplifies avatar creation by requiring only a single image rather than a video sequence. It builds on established generative architectures such as StyleGAN2 for feature extraction and texture generation, further improving the realism of the resulting avatars.

Technical Components

  • The encoder-generator network uses a StyleGAN2-based architecture to compress images into feature vectors and generate textures.
  • A Denoising Diffusion Probabilistic Model (DDPM) with a U-Net backbone inpaints the unseen texture regions (a sampling sketch follows this list).
  • The neural renderer uses a pipeline built from ResNet blocks to render avatars, trained on multi-view image sets.
  • Training is a multi-stage process that uses loss functions such as LPIPS and an adversarial loss to optimize the network parameters.
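A common way to use a DDPM for this kind of inpainting is to run the reverse (denoising) process from pure noise while repeatedly pasting back a correspondingly noised copy of the known texels; the sketch below follows that generic scheme in PyTorch. The unet(x, t) call signature, the mask convention, and the noise-schedule handling are assumptions; the patent's conditional diffusion model may condition on the partial texture differently.

    import torch

    @torch.no_grad()
    def ddpm_inpaint_texture(unet, partial_tex, known_mask, betas):
        """Fill unseen texture regions with a DDPM reverse process.

        unet:        denoising network eps = unet(x_t, t) (U-Net; hypothetical signature).
        partial_tex: (B, C, T, T) texture, valid where known_mask == 1.
        known_mask:  (B, 1, T, T) 1 for texels observed in the input image.
        betas:       (N,) noise schedule of the diffusion process.
        """
        alphas = 1.0 - betas
        alpha_bar = torch.cumprod(alphas, dim=0)

        x = torch.randn_like(partial_tex)  # start the unknown region from pure noise
        for t in reversed(range(len(betas))):
            # Noise the known texels to the current level and paste them in.
            noise = torch.randn_like(partial_tex)
            known_t = alpha_bar[t].sqrt() * partial_tex + (1 - alpha_bar[t]).sqrt() * noise
            x = known_mask * known_t + (1 - known_mask) * x

            # One standard DDPM reverse step driven by the predicted noise.
            t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
            eps = unet(x, t_batch)
            mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
            x = mean if t == 0 else mean + betas[t].sqrt() * torch.randn_like(x)

        # Keep the observed texels exactly as given; return the inpainted texture.
        return known_mask * partial_tex + (1 - known_mask) * x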

Training Process

Training proceeds in two stages. First, the encoder-generator network and the neural renderer are trained on multi-view images to minimize the loss functions described above. The diffusion-based inpainting model is then trained in a second stage: incomplete textures sampled from different viewpoints are merged, Gaussian noise is added, and the model learns to iteratively refine them through conditional diffusion. This two-stage training yields high-quality avatars with reduced reconstruction error and more consistent rendering.
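Condensed below is what the two training stages could look like in PyTorch. The use of the lpips package, the non-saturating adversarial formulation, and the loss weighting are assumptions made for illustration; only the general structure (stage one optimizes the encoder-generator and renderer with LPIPS and adversarial losses on multi-view renders, stage two trains the diffusion inpainter on noised merged textures) follows the description above.

    import torch
    import torch.nn.functional as F
    import lpips  # perceptual-loss package (pip install lpips); its use here is assumed

    perceptual = lpips.LPIPS(net="vgg")

    def stage1_loss(rendered, ground_truth, discriminator):
        """Stage 1: losses for the encoder-generator network and neural renderer,
        computed on renders of multi-view training images."""
        loss_lpips = perceptual(rendered, ground_truth).mean()
        # Non-saturating adversarial loss on the rendered image (illustrative form).
        loss_adv = F.softplus(-discriminator(rendered)).mean()
        return loss_lpips + 0.1 * loss_adv  # the weighting is illustrative

    def stage2_loss(unet, merged_partial_tex, alpha_bar):
        """Stage 2: conditional-diffusion-style training of the inpainting model
        on merged incomplete textures."""
        b = merged_partial_tex.shape[0]
        t = torch.randint(0, alpha_bar.shape[0], (b,), device=merged_partial_tex.device)
        a = alpha_bar[t].view(b, 1, 1, 1)
        noise = torch.randn_like(merged_partial_tex)          # add Gaussian noise
        noisy = a.sqrt() * merged_partial_tex + (1 - a).sqrt() * noise
        eps_pred = unet(noisy, t)                             # U-Net predicts the noise
        return F.mse_loss(eps_pred, noise)                    # standard DDPM objective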