Invention Title:

TRANSFERRING SALIENT DEPTH PROPERTIES FROM LABELED DATA TO UNLABELED DATASETS FOR MONOCULAR DEPTH ESTIMATION

Publication number:

US20260073537

Publication date:

Section:

Physics

Class:

G06T7/50

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent application describes a method and apparatus for training a monocular depth estimation (MDE) network with mixed supervision, in which labeled and unlabeled datasets are used together to improve depth estimation from monocular images. The method obtains a source dataset of images with ground truth depth maps and a target dataset of images without depth labels, then uses both to train the MDE network alongside a pose network through a combination of fully-supervised and self-supervised training.

Field and Related Art

Monocular depth estimation is crucial for applications requiring three-dimensional scene understanding, such as autonomous vehicles and augmented reality. Traditional depth-sensing approaches often rely on additional sensors like LiDAR or stereo cameras, which can be costly and complex. The proposed method aims to address these challenges by estimating depth from single images, reducing reliance on extra hardware while maintaining accuracy.

Summary of the Invention

The invention is a training process for an MDE network that uses mixed supervision. The process begins by obtaining a source dataset of images with corresponding ground truth depth maps, and a target dataset of additional images. The MDE network generates estimated depth maps for both datasets, while a pose network estimates relative poses from the target dataset images. Training combines fully-supervised training on the labeled source data with self-supervised training on the unlabeled target data, improving the network's depth estimation.
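The combination of the two training signals can be sketched as a single weighted loss. This is a minimal illustration, not the loss defined in the application: the L1 (mean absolute difference) terms, the weight `self_sup_weight`, and the function names `mean_abs_diff` and `mixed_supervision_loss` are all assumptions, and the reconstructed target image is taken as a given input rather than computed here.

```python
from typing import List

# A tiny grayscale image or depth map represented as nested lists.
Grid = List[List[float]]


def mean_abs_diff(a: Grid, b: Grid) -> float:
    """Mean absolute difference between two equally sized 2D grids."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def mixed_supervision_loss(
    source_pred_depth: Grid,
    source_gt_depth: Grid,
    target_image: Grid,
    target_reconstruction: Grid,
    self_sup_weight: float = 0.5,  # hypothetical weighting, not from the source
) -> float:
    """Weighted sum of a supervised and a self-supervised term.

    Supervised term: estimated depth vs. ground truth depth (source dataset).
    Self-supervised term: target image vs. its reconstruction from a
    neighbouring view (assumed to be produced elsewhere from the estimated
    depth and relative pose).
    """
    supervised = mean_abs_diff(source_pred_depth, source_gt_depth)
    self_supervised = mean_abs_diff(target_image, target_reconstruction)
    return supervised + self_sup_weight * self_supervised


if __name__ == "__main__":
    loss = mixed_supervision_loss(
        source_pred_depth=[[1.0, 2.0], [3.0, 4.0]],
        source_gt_depth=[[1.0, 2.5], [3.0, 3.5]],
        target_image=[[0.2, 0.4]],
        target_reconstruction=[[0.2, 0.8]],
    )
    print(f"mixed loss: {loss:.3f}")
```

In a real training loop both terms would be computed on batches and backpropagated through the MDE network, with the self-supervised term also updating the pose network; the sketch only shows how the two signals might be merged into one objective.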

Training and Implementation

The training process generates estimated depth maps and relative poses, which are then used to refine the MDE network. Fully-supervised training uses the labeled source dataset, comparing estimated depth maps against ground truth. Self-supervised training uses the target dataset, relying on the estimated depth maps and the relative poses that the pose network estimates between target images. This dual approach allows the network to learn from both types of data, improving its ability to estimate depth from monocular images without extensive sensor setups.
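The summary does not say how depth and relative pose produce a self-supervised signal. One common formulation in self-supervised depth estimation, assumed here for illustration only, is view synthesis: a pixel is back-projected to 3D using its estimated depth, moved into a neighbouring camera frame using the estimated relative pose, and projected back to pixel coordinates. The pinhole intrinsics `(fx, fy, cx, cy)` and the function name `reproject_pixel` are hypothetical.

```python
from typing import List, Optional, Tuple


def reproject_pixel(
    u: float,
    v: float,
    depth: float,
    intrinsics: Tuple[float, float, float, float],
    rotation: List[List[float]],
    translation: List[float],
) -> Optional[Tuple[float, float]]:
    """Map pixel (u, v) in one view to pixel coordinates in another view.

    intrinsics:  (fx, fy, cx, cy) of an assumed pinhole camera.
    rotation:    3x3 rotation matrix of the relative pose.
    translation: 3-vector of the relative pose.
    Returns None if the point lands on or behind the second camera plane.
    """
    fx, fy, cx, cy = intrinsics

    # Back-project the pixel to a 3D point using the estimated depth.
    point = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]

    # Apply the relative pose: q = R * p + t.
    moved = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if moved[2] <= 0.0:
        return None

    # Project the 3D point into the second view.
    return (fx * moved[0] / moved[2] + cx, fy * moved[1] / moved[2] + cy)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    k = (100.0, 100.0, 50.0, 40.0)
    # With no camera motion the pixel maps to itself.
    print(reproject_pixel(10.0, 20.0, 5.0, k, identity, [0.0, 0.0, 0.0]))
    # A sideways translation shifts the pixel horizontally.
    print(reproject_pixel(10.0, 20.0, 5.0, k, identity, [1.0, 0.0, 0.0]))
```

Sampling the neighbouring image at the reprojected coordinates yields a reconstruction of the original view; the difference between the reconstruction and the actual image can then act as a training signal without any ground truth depth.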

Applications and Benefits

The method applies to fields such as autonomous driving and robotics, where accurate depth perception is vital. By reducing dependency on additional sensors, the approach offers a cost-effective solution for depth estimation. Mixed supervision training improves the network's adaptability to new scenes, making it suitable for dynamic environments where sensor-dependent methods might struggle.