US20240135686
2024-04-25
Physics
G06V10/774
A method for training a neural network model is proposed, focusing on augmenting images of objects captured by multiple cameras. The process begins by obtaining an object recognition result from a first neural network model, which processes a first image captured by a first camera from a first viewpoint. This initial recognition result is then converted based on the coordinate systems of the first and second cameras, which capture the same space from distinct angles.
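The conversion step above can be sketched as a standard change of camera coordinates followed by a pinhole projection. This is a minimal illustration, not the patent's disclosed implementation: it assumes the recognition result includes a 3D object point in the first camera's frame, and that the extrinsics between the two cameras (`R_12`, `t_12`) and the second camera's intrinsic matrix (`K2`) are known; all names here are hypothetical.

```python
import numpy as np

def convert_detection(point_cam1, R_12, t_12, K2):
    """Map a 3D object point from camera 1's coordinate frame into
    camera 2's frame and project it onto camera 2's image plane.

    point_cam1 : (3,) 3D point in camera-1 coordinates
    R_12, t_12 : rotation (3x3) and translation (3,) from camera 1 to camera 2
    K2         : camera-2 intrinsic matrix (3x3)
    Returns the pixel coordinates (u, v) in the second camera's image.
    """
    point_cam2 = R_12 @ point_cam1 + t_12   # extrinsic transform between cameras
    uvw = K2 @ point_cam2                   # pinhole projection into camera 2
    return uvw[:2] / uvw[2]                 # dehomogenize to pixel coordinates
```

In practice the same transform would be applied to each corner or keypoint of a detected object so that the whole recognition result is re-expressed in the second camera's view.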
The method includes generating training data by labeling a second image that corresponds to the first image. This second image is captured by the second camera, and the labeling relies on the converted object recognition result, which reflects the perspective of the second camera. This step yields accurate training data without manual annotation, which can improve the performance of a neural network model.
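The auto-labeling step can be sketched as pairing the second camera's image with the converted recognition result. This is a hedged sketch under assumed data shapes (a bounding box in camera-2 pixel coordinates plus a class identifier); the function and field names are illustrative, not the patent's.

```python
def make_training_example(second_image, converted_box, class_id):
    """Attach the converted recognition result to the second camera's
    image as a supervised training label.

    second_image  : the image captured by the second camera (any format)
    converted_box : (x_min, y_min, x_max, y_max) in camera-2 pixel coordinates
    class_id      : integer class label from the first model's recognition
    """
    return {
        "image": second_image,
        "label": {"class_id": class_id, "bbox": tuple(converted_box)},
    }
```

Repeating this over many corresponding image pairs builds a labeled dataset for the second model without any human annotation of the second camera's images.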
Once the training data is generated, it is used to train a second neural network model. This training uses the augmented data derived from both camera inputs, allowing for a more robust learning process that can enhance object recognition across varying viewpoints.
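As a toy stand-in for this training step, the sketch below fits a linear model by gradient descent to auto-generated (feature, target) pairs. It only illustrates the idea of consuming the generated dataset; the actual second model in the patent is a neural network, and a real system would use a deep-learning framework.

```python
import numpy as np

def train_on_generated_data(examples, epochs=100, lr=0.1):
    """Fit a linear regressor to auto-generated training pairs with
    batch gradient descent on a mean-squared-error loss.

    examples : list of (x, y) pairs, x a feature vector, y a scalar target
    Returns the learned weight vector.
    """
    X = np.array([x for x, _ in examples], dtype=float)
    y = np.array([t for _, t in examples], dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
        w -= lr * grad                         # gradient-descent update
    return w
```

The same loop structure (iterate over generated examples, compute a loss against the converted labels, update parameters) carries over directly to training the second neural network model.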
The electronic device designed for this method comprises several components, including a camera, a communication unit, a memory storing instructions, and a processor. The processor executes the instructions to obtain object recognition results, convert those results between the two camera coordinate systems, and generate training data for further model training.
A cloud server can also implement this method, featuring components similar to those of the electronic device. It receives object recognition results and images captured from different viewpoints through its communication unit. The server performs the coordinate conversion and training-data generation, thereby supporting the training of neural network models in a cloud-based environment.