US20250078377
2025-03-06
Physics
G06T13/40
The patent application describes a method for body tracking from monocular video: video frames of a human subject's movement are captured and processed into a 3D representation. A pre-trained neural network model analyzes 2D images extracted from the frames to determine the subject's pose. The method emphasizes estimating 3D positions of the upper-body joints, computing per-joint confidence scores, and selecting reliable keypoints to drive accurate animation of a 3D avatar.
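The confidence-scoring and keypoint-selection step might look like the following minimal sketch. The function name, threshold value, and array layout are illustrative assumptions, not taken from the application itself:

```python
import numpy as np

CONF_THRESHOLD = 0.5  # hypothetical cutoff; the application does not specify a value


def select_keypoints(keypoints_3d, confidences, threshold=CONF_THRESHOLD):
    """Keep only joints whose confidence score clears the threshold.

    keypoints_3d: (N, 3) array of estimated 3D joint positions
    confidences:  (N,) per-joint confidence scores in [0, 1]
    Returns the indices of reliable joints and their positions.
    """
    mask = confidences >= threshold
    return np.flatnonzero(mask), keypoints_3d[mask]
```

The selected subset would then be the input used to pose the avatar, with low-confidence joints left to smoothing or interpolation.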
This invention falls within the realm of computer graphics, focusing on tracking body movements without noticeable lag using a single camera feed. It is particularly useful for devices with limited processing capabilities, aiming to improve applications in gaming, virtual reality (VR), augmented reality (AR), and human-computer interaction by providing an accessible alternative to traditional motion capture systems that require specialized hardware.
The approach tackles several challenges inherent in monocular video-based body tracking. These include the difficulty of extrapolating 3D poses from 2D input data due to missing depth information, maintaining real-time performance without lag, and handling partial visibility or self-occlusion where parts of the body are obscured. The method aims to balance computational efficiency with accuracy, making it suitable for mobile or low-end devices.
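The depth-ambiguity problem mentioned above can be made concrete with a simple pinhole-camera projection (this example is illustrative and not part of the application): two distinct 3D points on the same viewing ray project to the identical 2D pixel, so a single frame cannot distinguish their depths.

```python
import numpy as np


def project_pinhole(point_3d, focal=1.0):
    """Project a camera-space 3D point to 2D with an ideal pinhole model."""
    x, y, z = point_3d
    return np.array([focal * x / z, focal * y / z])


# Two different 3D joint positions along the same viewing ray...
near = np.array([0.5, 0.25, 1.0])
far = near * 2.0  # same ray, twice the depth
# ...yield the same 2D observation, which is why monocular 3D pose
# estimation must rely on learned priors rather than geometry alone.
```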
The method includes several enhancements such as temporal smoothing across frames, calibration for camera distortions, and re-detection if confidence scores fall below a threshold. The neural network may use an attention mechanism to focus on keypoints during estimation, and joint positions of the avatar can be scaled to match the human subject's proportions. These features aim to improve tracking accuracy and reliability in various conditions.
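The temporal-smoothing and re-detection behaviors could be combined as in the sketch below. The class name, smoothing scheme (an exponential moving average), and both constants are assumptions for illustration; the application describes the features without prescribing an algorithm:

```python
import numpy as np

SMOOTH_ALPHA = 0.3        # hypothetical weight given to the newest frame
REDETECT_THRESHOLD = 0.3  # hypothetical mean-confidence floor


class PoseSmoother:
    """Exponential moving average over per-frame 3D joint estimates."""

    def __init__(self, alpha=SMOOTH_ALPHA):
        self.alpha = alpha
        self.state = None  # last smoothed pose, (N, 3)

    def update(self, joints_3d, confidences):
        # Signal a full re-detection when tracking confidence collapses.
        if confidences.mean() < REDETECT_THRESHOLD:
            self.state = None
            return None  # caller should rerun the detector from scratch
        if self.state is None:
            self.state = joints_3d
        else:
            # Blend the new estimate with the running state to damp jitter.
            self.state = self.alpha * joints_3d + (1 - self.alpha) * self.state
        return self.state
```

Returning `None` is one simple way to hand control back to the detection stage; an attention-based network or avatar-scaling step, as the application mentions, would sit upstream and downstream of this smoothing loop respectively.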