US20250209712
2025-06-26
Physics
G06T13/40
The patent describes a method for creating full-body animatable avatars from a single image using an encoder-generator network, a diffusion-based inpainting model, and a neural renderer. The process begins by obtaining an image of a person's body together with a parametric body model, comprising pose and shape parameters, and the camera parameters used during image capture. The model defines a texturing function that maps each pixel in the image to coordinates in a texture space covering the whole body, including parts not visible in the image. The person's RGB texture is sampled according to this mapping, yielding a partial texture and a map of which pixels were sampled.
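As a rough illustration of this texture-sampling step, the sketch below scatters the visible image pixels into a UV texture map and records which texels were observed. The tensor shapes, the nearest-texel scatter, and the assumption that a per-pixel UV map and visibility mask have already been rasterized from the posed body model are illustrative choices, not details given in the patent.

```python
import torch

def sample_rgb_texture(image, uv_map, visibility, tex_size=256):
    """Scatter visible image pixels into UV texture space (sketch).

    image:      (H, W, 3) float RGB image of the person.
    uv_map:     (H, W, 2) per-pixel UV coordinates in [0, 1], assumed to come
                from rasterizing the posed parametric body model with the
                estimated camera parameters.
    visibility: (H, W) bool mask of pixels covered by the body model.
    Returns the sampled RGB texture and a sampled-pixel map marking which
    texels received at least one image pixel.
    """
    texture = torch.zeros(tex_size, tex_size, 3)
    sampled = torch.zeros(tex_size, tex_size, 1)

    uv = uv_map[visibility]                      # (N, 2) UVs of visible pixels
    rgb = image[visibility]                      # (N, 3) their colors
    # Convert continuous UVs to integer texel indices (nearest texel).
    ij = (uv * (tex_size - 1)).long().clamp(0, tex_size - 1)
    texture[ij[:, 1], ij[:, 0]] = rgb            # write colors into texture
    sampled[ij[:, 1], ij[:, 0]] = 1.0            # mark texels as observed
    return texture, sampled
```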
The person's image is then passed through a trained encoder-generator network to generate a texture of the visible body parts. The sampled RGB texture, the sampled-pixel map, and the generated texture are concatenated to form a neural texture, and the texture regions unseen in the input image are inpainted using a diffusion-based model. Finally, a rasterized image of the avatar, re-posed via modified parametric model parameters, is translated by a trained neural renderer, enabling realistic animations in new poses.
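The following sketch shows how these pieces might fit together. The module names (`encoder_generator`, `inpaint_diffusion`), their signatures, and the channel-first layout are hypothetical placeholders, since the patent summary does not specify interfaces.

```python
import torch

def build_neural_texture(image, rgb_tex, sampled_map,
                         encoder_generator, inpaint_diffusion):
    """Assemble the neural texture and inpaint unseen regions (sketch).

    image:       (3, H, W) input photo of the person.
    rgb_tex:     (3, T, T) sampled RGB texture in channel-first layout.
    sampled_map: (1, T, T) map of which texels were observed.
    Both networks are placeholder callables standing in for the trained
    models described in the patent; their signatures are assumptions.
    """
    # Encoder-generator predicts a texture of the visible body parts.
    generated_tex = encoder_generator(image.unsqueeze(0))   # (1, C, T, T)
    neural_tex = torch.cat(
        [rgb_tex.unsqueeze(0),        # sampled RGB texture
         sampled_map.unsqueeze(0),    # sampled-pixel map
         generated_tex], dim=1)       # generated visible-part texture
    # Diffusion-based model fills texels never observed in the image.
    unseen = (sampled_map == 0).unsqueeze(0)
    return inpaint_diffusion(neural_tex, mask=unseen)
```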
Key innovations include the use of diffusion-based inpainting and neural rendering to improve photo-realism and animation quality. The method addresses the poor rendering quality typical of unobserved body parts and simplifies avatar creation by working from a single image rather than a video sequence. StyleGAN2-style architectures are used for feature extraction and texture generation, further enhancing the realism of the generated avatars.
Training proceeds in two stages. First, the encoder-generator network and the neural renderer are trained on multi-view images to minimize the loss functions. Second, the diffusion-based inpainting model is integrated: incomplete textures sampled from different viewpoints are merged, Gaussian noise is added, and the result is iteratively refined through conditional diffusion learning. This two-stage scheme yields high-quality avatars with reduced reconstruction error and improved rendering consistency.
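One stage-two training step might look like the DDPM-style sketch below: Gaussian noise is added to the target texture at a random timestep, and a denoiser conditioned on the merged partial texture learns to predict that noise. The noise schedule, the concatenation-based conditioning, and the `unet` callable are assumptions for illustration, not details from the patent.

```python
import torch
import torch.nn.functional as F

def stage2_diffusion_step(unet, merged_partial_tex, full_tex, alphas_cumprod):
    """One conditional-diffusion training step for texture inpainting.

    merged_partial_tex: (B, C, T, T) incomplete texture merged from views.
    full_tex:           (B, C, T, T) complete target texture (supervision).
    alphas_cumprod:     (num_steps,) cumulative noise schedule (assumed).
    unet is a placeholder denoiser taking (noisy+condition, timestep).
    """
    # Sample a random diffusion timestep per example.
    t = torch.randint(0, len(alphas_cumprod), (full_tex.shape[0],))
    noise = torch.randn_like(full_tex)                  # Gaussian noise
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    # Forward diffusion: noise the target texture at timestep t.
    noisy = a.sqrt() * full_tex + (1 - a).sqrt() * noise
    # Condition the denoiser on the merged partial texture by concatenation.
    pred = unet(torch.cat([noisy, merged_partial_tex], dim=1), t)
    return F.mse_loss(pred, noise)                      # standard DDPM loss
```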