Invention Title:

ROBUST FACIAL ANIMATION FROM VIDEO USING NEURAL NETWORKS

Publication number:

US20240355028

Section:

Physics

Class:

G06T13/40

Overview of the Invention

The patent application describes a system for generating real-time facial animation for 3D avatars from video input. A camera captures video of the user's face, and neural networks translate the observed facial movements into avatar animation. The system selects an animation detail level based on user preferences and device capabilities, trading fidelity for performance where resources are limited.
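
As an illustration of how such a selection might be implemented, the Python sketch below chooses the highest detail level that fits a device's per-frame time budget, capped by the user's preference. The level definitions, cost figures, and function names are hypothetical, not taken from the application.

```python
from enum import Enum

class DetailLevel(Enum):
    LOW = 0     # expression weights only
    MEDIUM = 1  # weights plus head pose
    HIGH = 2    # weights, head pose, and dense landmarks

def select_detail_level(preferred, target_fps, cost_ms):
    """Return the highest detail level that respects both the user's
    preference and the device's per-frame time budget."""
    budget_ms = 1000.0 / target_fps
    feasible = [lvl for lvl in DetailLevel
                if lvl.value <= preferred.value and cost_ms[lvl] <= budget_ms]
    return max(feasible, key=lambda lvl: lvl.value, default=DetailLevel.LOW)

# Example: a device targeting 30 FPS (about 33 ms per frame).
cost_ms = {DetailLevel.LOW: 8.0, DetailLevel.MEDIUM: 18.0, DetailLevel.HIGH: 40.0}
print(select_detail_level(DetailLevel.HIGH, 30.0, cost_ms))  # DetailLevel.MEDIUM
```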

Technical Approach

A fully convolutional network first identifies candidate face regions in each video frame, and a convolutional neural network then refines these regions into precise bounding boxes. From each bounding box, a convolutional network with an overloaded output, one that produces several outputs at once, extracts facial expression weights, head pose, and facial landmarks, and these outputs drive the 3D avatar animation. For each subsequent video frame, the system checks whether a face is still present within the existing bounding box and updates the animation accordingly.
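
One way to read the overloaded-output network is as a shared convolutional backbone feeding several output heads. The PyTorch sketch below illustrates that structure under assumed dimensions (a 64x64 face crop, 52 expression weights, a 6-DoF head pose, and 68 2D landmarks); none of these figures or layer sizes come from the application itself.

```python
import torch
import torch.nn as nn

class OverloadedOutputNet(nn.Module):
    """One network, three outputs: expression weights, head pose, landmarks.
    Layer sizes and output dimensions are illustrative assumptions."""
    def __init__(self, n_weights=52, n_landmarks=68):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.weights_head = nn.Linear(64, n_weights)        # blendshape weights
        self.pose_head = nn.Linear(64, 6)                   # rotation + translation
        self.landmark_head = nn.Linear(64, n_landmarks * 2) # (x, y) per landmark

    def forward(self, face_crop):
        features = self.backbone(face_crop)
        return (torch.sigmoid(self.weights_head(features)),  # weights in [0, 1]
                self.pose_head(features),
                self.landmark_head(features))

net = OverloadedOutputNet()
weights, pose, landmarks = net(torch.randn(1, 3, 64, 64))
print(weights.shape, pose.shape, landmarks.shape)  # (1, 52), (1, 6), (1, 136)
```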

Implementation Variations

Several variations make the method more robust: the detection process is reset when no face is found in subsequent frames, and additional neural networks can be applied for increased detail. The system can also recognize specific conditions, such as a protruding tongue, through specialized sub-models that improve animation accuracy.
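
The reset behavior can be pictured as a small per-frame loop. In the hypothetical Python sketch below, detect_faces, refine_box, face_in_box, and extract stand in for the networks described in the application; specialized sub-models such as a tongue detector would sit inside extract.

```python
def animate_stream(frames, detect_faces, refine_box, face_in_box, extract):
    """Per-frame loop with the reset behavior described above: keep the
    current bounding box while a face is still present inside it, and
    restart detection from scratch when it is not."""
    box = None
    for frame in frames:
        if box is None or not face_in_box(frame, box):
            candidates = detect_faces(frame)     # proposal stage (FCN)
            if not candidates:
                box = None                       # reset: no face this frame
                continue
            box = refine_box(frame, candidates)  # refinement stage (CNN)
        yield extract(frame, box)  # expression weights, head pose, landmarks
```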

System Components

The described system comprises a memory storing instructions and a processing device that executes them. The operations include detecting faces in video frames, refining bounding boxes, and generating 3D avatar animation from the extracted facial data. The level of detail adapts to the available computational resources and user settings.
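
As one illustration of the final step, turning extracted facial data into avatar animation, the NumPy sketch below applies expression weights to a blendshape rig; the rig itself and all array shapes are assumptions, not details from the application.

```python
import numpy as np

def apply_expression(neutral_vertices, blendshape_deltas, weights):
    """Animate a mesh from extracted expression weights: the result is the
    neutral mesh plus a weighted sum of per-blendshape vertex offsets.
    Assumed shapes: neutral (V, 3), deltas (K, V, 3), weights (K,)."""
    return neutral_vertices + np.tensordot(weights, blendshape_deltas, axes=1)

# Toy example: a 4-vertex mesh with 2 blendshapes.
neutral = np.zeros((4, 3))
deltas = np.random.default_rng(0).random((2, 4, 3))
weights = np.array([0.8, 0.1])  # e.g., jaw open strongly, slight smile
print(apply_expression(neutral, deltas, weights).shape)  # (4, 3)
```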

Training and Data

The neural networks are trained on both artificially generated video frames and real, hand-labeled images. Combining the two data sources improves the networks' ability to interpret facial expressions and movements accurately, supporting reliable performance across a range of scenarios.
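
A minimal sketch of such a dual-source training setup follows, assuming a simple 50/50 mix of synthetic and hand-labeled samples per batch; the actual proportions and batch size are not specified in the summary.

```python
import random

def mixed_batches(synthetic, real_labeled, batch_size=32, synthetic_frac=0.5):
    """Yield training batches mixing artificially generated frames with
    hand-labeled real images. The mixing ratio and batch size are
    assumptions, not figures from the application."""
    n_synthetic = int(batch_size * synthetic_frac)
    while True:
        batch = (random.sample(synthetic, n_synthetic) +
                 random.sample(real_labeled, batch_size - n_synthetic))
        random.shuffle(batch)  # interleave the two sources within the batch
        yield batch
```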