Invention Title:

UNSUPERVISED VOLUMETRIC ANIMATION

Publication number:

US20250356569

Publication date:

2025-11-20

Section:

Physics

Class:

G06T13/40

Inventors:

Kyle Olszewski 🇺🇸 Los Angeles, CA, United States

Menglei Chai 🇺🇸 Los Angeles, CA, United States

Sergey Tulyakov 🇺🇸 Santa Monica, CA, United States

Hsin-Ying Lee 🇺🇸 San Jose, CA, United States

Jian Ren 🇺🇸 Hermosa Beach, CA, United States

Aliaksandr Siarohin 🇺🇸 Los Angeles, CA, United States

Ivan Skorokhodov 🇺🇸 Los Angeles, CA, United States

Willi Menapace 🇺🇸 Santa Monica, CA, United States

Applicant:

Snap Inc. 🇺🇸 Santa Monica, CA, United States

Smart overview of the Invention

The unsupervised volumetric animation (UVA) technique creates 3D animations of non-rigid deformable objects from single-view RGB videos, without requiring annotations. It combines a 3D autodecoder framework with a keypoint estimator, coupled through a differentiable perspective-n-point (PnP) algorithm, so that the model learns the object's 3D geometry and decomposes it into meaningful parts for animation. The approach supports 3D segmentation, keypoint estimation, novel view synthesis, and animation from only a small number of input images or videos.

Technical Approach

The UVA system utilizes a canonical voxel generator to create a volumetric representation of non-rigid objects as sets of moving rigid parts. A 2D keypoint predictor estimates the pose of these parts in an image frame, while a volumetric skinning algorithm maps the canonical object volume into a deformed volume representing the object's current pose. This process enables rendering of the deformed object as an image, supporting both video and still image inputs.
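As an illustration of the skinning step, the following is a minimal sketch of linear blend skinning applied to sample points of a canonical volume. It is not taken from the patent: the function name `linear_blend_skinning`, the array shapes, the canonical-to-deformed direction of the mapping, and the assumption that each point carries one weight per rigid part are all illustrative choices.

```python
import numpy as np


def linear_blend_skinning(points, weights, rotations, translations):
    """Move canonical points into a deformed pose (hypothetical helper).

    points:       (N, 3) sample positions in the canonical volume
    weights:      (N, K) per-point blend weights, each row summing to 1
    rotations:    (K, 3, 3) rotation of each rigid part
    translations: (K, 3) translation of each rigid part
    """
    # Apply every part's rigid transform to every point: shape (K, N, 3).
    per_part = np.einsum("kij,nj->kni", rotations, points)
    per_part = per_part + translations[:, None, :]
    # Blend the K candidate positions of each point with its weights: (N, 3).
    return np.einsum("nk,kni->ni", weights, per_part)
```

A point whose weight is concentrated on a single part moves rigidly with that part, while a point with mixed weights lands between the positions the individual parts would give it. This is how a set of rigid parts can approximate a non-rigid deformation.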

Keypoint Prediction

The 2D keypoint predictor employs a convolutional neural network to detect the 2D projections of the moving parts; each predicted 2D keypoint corresponds to a 3D keypoint in a canonical space. These canonical 3D keypoints are learnable and shared among all objects in a dataset. A differentiable PnP algorithm takes the resulting 2D-3D correspondences and recovers each part's pose, linking the 2D observations to the 3D representation.
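To make the pose-recovery step concrete, here is a minimal sketch of how a rigid pose can be recovered from 2D-3D keypoint correspondences. It uses the classical direct linear transform (DLT), which is a stand-in and not the differentiable PnP algorithm of the patent. The names `project` and `pnp_dlt` are hypothetical, the 2D keypoints are assumed to be in normalized image coordinates (camera intrinsics already removed), and at least six non-coplanar keypoints are assumed.

```python
import numpy as np


def project(points_3d, R, t):
    """Pinhole projection of (N, 3) points to normalized image coordinates."""
    cam = points_3d @ R.T + t
    return cam[:, :2] / cam[:, 2:3]


def pnp_dlt(points_3d, points_2d):
    """Estimate (R, t) from N >= 6 non-coplanar 2D-3D correspondences.

    points_3d: (N, 3) canonical keypoints
    points_2d: (N, 2) observed keypoints in normalized image coordinates
    """
    n = points_3d.shape[0]
    X = np.hstack([points_3d, np.ones((n, 1))])  # homogeneous, (N, 4)
    x = points_2d[:, 0:1]
    y = points_2d[:, 1:2]
    zeros = np.zeros_like(X)
    # Each correspondence gives two linear constraints on the 3x4 matrix P:
    #   x * (P3 . X) - P1 . X = 0   and   y * (P3 . X) - P2 . X = 0
    A = np.vstack([
        np.hstack([-X, zeros, x * X]),
        np.hstack([zeros, -X, y * X]),
    ])
    # The null vector of A holds the entries of P up to scale and sign.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    # Fix the sign so that the rotation part has positive determinant.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P
    # Project the 3x3 block onto the nearest rotation and undo the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t
```

With noise-free correspondences this recovers the pose exactly up to floating-point error. A training pipeline would need a formulation that propagates gradients back to the predicted keypoints, which this NumPy version does not do.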

Volumetric Rendering

The volumetric renderer uses the deformed density and radiance produced by volumetric skinning to render animation images. Its inputs are the canonical density and radiance of the object, together with the poses and Linear Blend Skinning (LBS) weights of the moving parts. Because the part poses are inputs to the skinning step, changing them renders the same canonical object in a new pose.
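The rendering step can be illustrated with the standard emission-absorption quadrature used in volume rendering. The source does not state which compositing rule the renderer uses, so this formula is an assumption. The sketch composites density and radiance samples along a single ray, and the function name `render_ray` and its argument layout are hypothetical.

```python
import numpy as np


def render_ray(densities, colors, deltas):
    """Composite samples along one ray, ordered from near to far.

    densities: (S,) non-negative density at each sample
    colors:    (S, 3) RGB radiance at each sample
    deltas:    (S,) distance between consecutive samples
    Returns the rendered RGB color and the accumulated opacity.
    """
    # Opacity contributed by each sample segment.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Fraction of light that reaches each sample without being absorbed.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    sample_weights = transmittance * alphas
    return sample_weights @ colors, sample_weights.sum()
```

In a full pipeline the densities and colors would be sampled from the deformed volume, so the same canonical object yields a different image for each set of part poses.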

Applications and Significance

Because UVA learns object geometry and part decomposition without supervision, it has applications in augmented reality (AR), virtual reality (VR), and social media. The framework supports various object categories without extensive labeled data and can create animatable avatars from limited visual inputs, which is useful for creative tasks in these domains.