US20250384627
2025-12-18
Physics
G06T17/10
The described system and method enable real-time, three-dimensional reconstruction of dynamic scenes, particularly sports events and concerts. Using a two-level parallel computation approach, the system processes multi-view video streams to reconstruct multiple frames and dynamic elements simultaneously. The method leverages distributed processing nodes optimized for parallel execution, improving efficiency in creating interactive 3D experiences.
The system employs a dual-level parallelization strategy. The first level involves frame-level parallelization, where consecutive multi-view frames are processed simultaneously across distributed GPUs. The second level focuses on element-level parallelization, allowing for the independent reconstruction of dynamic elements like humans or objects within the scene. These reconstructed elements are then combined into an aggregated representation, followed by refinement to form a per-frame point cloud.
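The two-level scheme described above can be sketched as nested loops: an outer, frame-level parallel dispatch and an inner, per-element reconstruction whose outputs are merged into a per-frame point cloud. The following is a minimal illustration only; all function and field names are hypothetical, and a thread pool stands in for the distributed GPU nodes the patent describes.

```python
from concurrent.futures import ThreadPoolExecutor

def reconstruct_element(element):
    # Placeholder per-element reconstruction: returns a tiny "point cloud"
    # (list of tagged 3D-ish points) for one dynamic element.
    name, n_points = element
    return [(name, i, i * 0.1) for i in range(n_points)]

def reconstruct_frame(frame):
    # Element-level stage: reconstruct each element of one multi-view frame,
    # then combine the results into an aggregated per-frame point cloud.
    clouds = [reconstruct_element(e) for e in frame["elements"]]
    merged = [pt for cloud in clouds for pt in cloud]
    return frame["index"], merged

def reconstruct_sequence(frames, workers=2):
    # Frame-level stage: consecutive frames are dispatched to separate
    # workers and processed simultaneously.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(reconstruct_frame, frames))
```

In a real deployment the inner stage would also be parallel (one GPU per dynamic element), with a refinement pass over the merged cloud before output.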
Different reconstruction methods are applied to dynamic and static elements within the scene. Dynamic elements, such as human subjects, are initialized using 3D primitives from a fitted parametric model or a dual-branch renderer. The system refines these primitives through processes like pose estimation and skeleton optimization. Static elements, typically background components, are processed using a splatting-based method tailored to their characteristics.
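The split between dynamic and static pipelines amounts to a per-element dispatch. The sketch below is an assumption about how such routing might look, with stubbed-out routines standing in for the parametric-model initialization and the static splatting method; it is not the patented implementation.

```python
def init_dynamic(element):
    # Dynamic elements (e.g. human subjects): initialize 3D primitives from
    # a fitted parametric body model, then refine pose/skeleton (stubbed:
    # a hypothetical 10 primitives per skeleton joint).
    return {"id": element["id"], "method": "parametric+refine",
            "primitives": element.get("joints", 0) * 10}

def init_static(element):
    # Static background: splatting-based reconstruction tuned for
    # non-deforming geometry (stubbed: one primitive per observed point).
    return {"id": element["id"], "method": "static_splatting",
            "primitives": element.get("points", 0)}

def route_element(element):
    # Choose the reconstruction pipeline based on the element's type.
    return init_dynamic(element) if element["dynamic"] else init_static(element)
```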
The method involves several optimization steps for dynamic scenes. For human elements, the system gathers multi-view frames, estimates a 3D pose model, and applies splatting-based reconstruction, fitting a parametric mesh model to refine the skeleton and appearance. For static elements, the method fits a model of 3D primitives to the environment, optionally increases primitive density in regions of interest, and then optimizes appearance parameters using spherical harmonics.
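Spherical-harmonics appearance modeling typically stores a small set of SH coefficients per primitive and evaluates them along the view direction to produce a view-dependent color. A minimal degree-1 evaluation is shown below, using the real-SH constants common in splatting renderers; the coefficient layout is a hypothetical choice, not one specified in the source.

```python
SH_C0 = 0.28209479177387814   # real SH constant for Y_0^0 (DC term)
SH_C1 = 0.4886025119029199    # real SH constant scale for the Y_1^m band

def eval_sh_deg1(coeffs, direction):
    # coeffs: 4 SH coefficients per primitive, each an [r, g, b] triple,
    #         ordered [DC, m=-1, m=0, m=+1].
    # direction: unit-length view direction (x, y, z).
    x, y, z = direction
    # Real SH basis values for degrees 0 and 1, in the sign convention
    # used by common Gaussian-splatting implementations.
    basis = [SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x]
    return [sum(b * c[ch] for b, c in zip(basis, coeffs)) for ch in range(3)]
```

With only the DC coefficient set, the color is view-independent; the degree-1 terms add a smooth directional tint, which is why the patent pairs SH with static background appearance.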
The invention can be implemented as a computer-implemented method or stored on a non-transitory computer-readable medium. It involves identifying elements in an environment, segmenting frames, optimizing models, and refining them to create a unified 3D model for each time frame. Enhancements include caching changes to avoid recomputation, capturing rendering operations as a static computational graph, and redistributing Gaussians to balance computational loads.
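One plausible reading of "redistributing Gaussians to balance computational loads" is a greedy assignment of per-element Gaussian counts across workers (longest-processing-time bin packing). The sketch below is an assumption for illustration; the patent does not specify this mechanism.

```python
def balance_gaussians(element_counts, n_workers):
    # element_counts: {element_name: number_of_gaussians}
    # Greedy LPT heuristic: take elements in descending Gaussian count and
    # assign each to the currently least-loaded worker.
    bins = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    for name, count in sorted(element_counts.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))  # least-loaded worker so far
        bins[i].append(name)
        loads[i] += count
    return bins, loads
```

For example, splitting a 100-Gaussian background and three smaller dynamic elements across two workers yields near-equal loads, which keeps the frame-level parallel stage from stalling on one overloaded node.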