Invention Title:

METHODS AND SYSTEMS FOR REAL TIME VIDEO DRIVEN HUMAN 3-D POSTURE ESTIMATION

Publication number:

US20250336236

Publication date:

Section:

Physics

Class:

G06V40/23

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The application introduces an approach for real-time, video-driven estimation of human three-dimensional (3-D) posture during physical activities. It addresses limitations of existing methods, which either overlook temporal information or require excessive computation time due to complex processing stages. The proposed system pairs a smartphone camera with an autoencoder-based architecture to achieve accurate posture estimation with a minimal error margin, using only monocular video input from a single low-end mobile device.
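One plausible reading of this architecture is sketched below: an autoencoder learns a latent space of 3-D skeletons, and a second encoder maps video frames into that same space. All class names, layer sizes, feature dimensions, and the 17-joint skeleton are illustrative assumptions, not details taken from the application.

    import torch
    import torch.nn as nn

    class PoseAutoencoder(nn.Module):
        """Autoencoder over flattened 3-D skeleton poses. Layer sizes and
        the 17-joint skeleton are assumptions for this sketch."""

        def __init__(self, num_joints: int = 17, latent_dim: int = 128):
            super().__init__()
            dim = num_joints * 3  # flattened (x, y, z) joint coordinates
            self.encoder = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(),
                                         nn.Linear(512, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                         nn.Linear(512, dim))

        def forward(self, pose_3d: torch.Tensor) -> torch.Tensor:
            # Reconstruct the pose from its latent code.
            return self.decoder(self.encoder(pose_3d))

    class SecondEncoder(nn.Module):
        """Maps per-frame video features into the latent space learned by
        the autoencoder; the 512-d feature input (e.g., from a lightweight
        CNN backbone) is an assumption."""

        def __init__(self, feat_dim: int = 512, latent_dim: int = 128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))

        def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
            return self.net(frame_features)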

Technical Field

This advancement falls within the field of posture estimation technologies, specifically targeting the real-time assessment of human 3-D posture during exercises such as gym workouts and yoga. Proper posture is crucial for preventing musculoskeletal issues and spinal injuries. The system automates posture assessment using machine learning and deep learning, allowing personal mobile devices to perform a task traditionally performed by human trainers.

Background

Conventional methods for estimating 3-D posture from 2-D images often require additional hardware, such as depth sensors or multiple cameras, which complicates the setup. More recent approaches model the entire 3-D body shape using parametric models such as SMPL, but they demand significant memory and processing power while offering lower accuracy than skeleton-based methods. Additionally, techniques that incorporate temporal information often suffer from increased processing time due to dual-state computations.

Methodology

The disclosed system uses a processor-implemented method with several key steps. Training datasets are used to develop a neural network model comprising an autoencoder network and a second encoder: the autoencoder is trained to minimize reconstruction and bone-length consistency losses, while the second encoder refines the latent-space representation of video frames. At inference, the trained model divides real-time video input into clips, detects human presence, and estimates 3-D posture with high precision by leveraging the autoencoder network.
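A minimal sketch of how the two named losses could be combined is given below; the bone list, the variance-over-frames consistency term, and the loss weight are assumptions, since the overview names the two loss types but does not spell out their exact formulation.

    import torch
    import torch.nn.functional as F

    # Hypothetical parent-child joint pairs for a 17-joint skeleton; the
    # application's actual skeleton topology is not given here.
    BONES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
             (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15),
             (15, 16)]

    def bone_lengths(pose: torch.Tensor) -> torch.Tensor:
        # pose: (batch, frames, joints, 3) -> (batch, frames, len(BONES))
        return torch.stack([(pose[..., c, :] - pose[..., p, :]).norm(dim=-1)
                            for p, c in BONES], dim=-1)

    def training_loss(pred: torch.Tensor, target: torch.Tensor,
                      w_bone: float = 0.1) -> torch.Tensor:
        """Reconstruction loss plus a bone-length consistency term. The
        variance-based formulation and the 0.1 weight are assumptions."""
        recon = F.mse_loss(pred, target)          # reconstruction loss
        lengths = bone_lengths(pred)              # bone lengths per frame
        consistency = lengths.var(dim=1).mean()   # rigid bones stay constant
        return recon + w_bone * consistency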

System Components

The system comprises memory storing instructions, input/output interfaces, and hardware processors configured to execute the training and real-time estimation processes. It receives training datasets of annotated videos and trains the autoencoder and the second encoder in sequence. Real-time test videos are then processed by dividing them into clips, detecting human presence, and using the trained model to estimate 3-D postures accurately. This setup enables efficient posture monitoring on readily available mobile hardware.
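The real-time path described above (clip division, human detection, posture estimation) might be wired together as in the following sketch; the clip length and the interfaces of the three callables are assumptions layered on the steps the application names, not its actual API.

    import torch

    CLIP_LEN = 16  # frames per clip; an assumed value

    @torch.no_grad()
    def estimate_postures(frames, person_detector, second_encoder, decoder):
        """Divide a test video into clips, keep clips in which a person is
        detected, and decode 3-D postures from the second encoder's latent
        codes. All three callables are stand-ins for the trained
        components."""
        clips = [frames[i:i + CLIP_LEN]
                 for i in range(0, len(frames), CLIP_LEN)]
        postures = []
        for clip in clips:
            if not person_detector(clip):    # no human present: skip clip
                continue
            latents = second_encoder(clip)   # (clip_len, latent_dim)
            postures.append(decoder(latents))  # (clip_len, num_joints * 3)
        return postures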