US20240185551
2024-06-06
Physics
G06V10/235
The disclosure concerns image search based on specific objects within an input image rather than on the image as a whole. Users select particular objects, which are then used to generate a query image. This addresses a limitation of traditional image search techniques, which often overlook user intent when the user is looking for specific elements within an image.
Users can interact with the input image to identify and select two or more objects of interest. The application allows for the creation of a query image that highlights these selected objects while excluding others. This process enables a more targeted search experience, ensuring that the results are closely aligned with the user's specific interests rather than being influenced by unrelated parts of the input image.
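The selection step can be illustrated with a minimal sketch. It assumes each detected object already has a boolean mask and that the user selects objects by clicking on them; the names `select_objects`, `masks`, and `clicks` are hypothetical and do not come from the source.

```python
from typing import Dict, List, Tuple

Mask = List[List[bool]]  # True where a pixel belongs to the object (assumed representation)


def select_objects(masks: Dict[str, Mask], clicks: List[Tuple[int, int]]) -> List[str]:
    """Return the names of objects whose mask contains at least one click (x, y)."""
    selected = []
    for name, mask in masks.items():
        for x, y in clicks:
            inside_image = 0 <= y < len(mask) and 0 <= x < len(mask[0])
            if inside_image and mask[y][x]:
                selected.append(name)
                break  # one hit is enough to select this object
    return selected


# Three objects in a 2x2 image; the user clicks on the horse and the jockey.
masks = {
    "horse": [[True, False], [False, False]],
    "jockey": [[False, True], [False, False]],
    "fence": [[False, False], [True, True]],
}
chosen = select_objects(masks, clicks=[(0, 0), (1, 0)])
```

Objects that receive no click, such as the fence here, are left out of the selection and therefore out of the query image.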
A significant feature is the ability to adjust the relative positioning of selected objects within the query image. Users can manually rearrange objects or utilize automated processes to bring objects closer together. This flexibility helps in generating search results that reflect desired physical relationships between objects, such as retrieving images of jockeys riding horses by modifying an existing image of a jockey and a horse standing apart.
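One way to picture the automated repositioning is as a translation of one object's bounding box toward another. The sketch below is an illustration under stated assumptions, not the method described in the source: it handles only horizontal movement, and `Box`, `shift_to_close_gap`, and `target_gap` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def shift_to_close_gap(anchor: Box, moving: Box, target_gap: int = 0) -> Box:
    """Translate `moving` horizontally so its gap to `anchor` equals `target_gap`."""
    if moving.left >= anchor.right:      # moving object sits to the right of the anchor
        dx = (anchor.right + target_gap) - moving.left
    elif moving.right <= anchor.left:    # moving object sits to the left of the anchor
        dx = (anchor.left - target_gap) - moving.right
    else:
        dx = 0                           # boxes already overlap horizontally; leave as is
    return Box(moving.left + dx, moving.top, moving.right + dx, moving.bottom)


# A jockey standing 20 pixels to the right of a horse is moved next to it.
horse = Box(left=0, top=0, right=10, bottom=10)
jockey = Box(left=30, top=0, right=35, bottom=8)
moved = shift_to_close_gap(horse, jockey)
```

A manual rearrangement would supply the offset directly from user input instead of computing it from the gap.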
The system uses machine learning for object detection and segmentation within images. Models, including neural networks, are trained to identify and manipulate objects in images. Training involves supervised and unsupervised learning methods, and the trained model is then applied to new images during inference.
Key terms include "input image," which serves as the basis for searches, and "object," referring to any identifiable entity within an image. An "object mask" isolates a selected object from its background, while a "composite image" merges multiple object masks into a single query image used for the search. These definitions establish how images are processed and searched.
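The relationship between these terms can be sketched in a few lines. The example assumes a grayscale image stored as a grid of integers and a segmentation output stored as a label map with one integer ID per pixel; both representations, and the names `object_mask` and `composite_image`, are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]     # grayscale pixel grid (assumed for brevity)
LabelMap = List[List[int]]  # per-pixel object ID, 0 meaning background (assumed)
Mask = List[List[bool]]


def object_mask(label_map: LabelMap, object_id: int) -> Mask:
    """Build a boolean mask that is True wherever the pixel belongs to `object_id`."""
    return [[label == object_id for label in row] for row in label_map]


def composite_image(image: Image, masks: List[Mask], background: int = 0) -> Image:
    """Copy pixels covered by any mask into a blank canvas; fill the rest with `background`."""
    height, width = len(image), len(image[0])
    query = [[background] * width for _ in range(height)]
    for mask in masks:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    query[y][x] = image[y][x]
    return query


# Input image with three labeled objects; only objects 1 and 3 are selected.
input_image = [[10, 20, 30], [40, 50, 60]]
labels = [[1, 1, 0], [2, 0, 3]]
selected = [object_mask(labels, 1), object_mask(labels, 3)]
query_image = composite_image(input_image, selected)
```

Pixels belonging to unselected objects or to the background are replaced by the fill value, so only the selected objects contribute to the query image.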