US20250029170
2025-01-23
Physics
G06Q30/0639
The patent application describes a system that leverages Augmented Reality (AR) and a Vision-and-Language Model (VLM) powered by multi-modal Artificial Intelligence (AI) to automatically generate in-store product information and navigation guidance. Images captured within a retail venue via smartphones, AR devices, or smart glasses are processed by the VLM to deliver real-time outputs such as product recognition results, product-related information, virtual shopping assistance, and in-store navigation guidance. This system is designed to enhance the shopping experience by giving users immediate access to detailed product data and navigational help within the store environment.
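The image-to-VLM flow described above can be sketched as follows. This is a minimal illustration, not the application's actual implementation: the `vlm_query` callable, the `ProductInsight` fields, and the pipe-delimited answer format are all assumptions standing in for whatever multi-modal model endpoint and output schema the system actually uses.

```python
from dataclasses import dataclass

@dataclass
class ProductInsight:
    name: str   # recognized product name
    aisle: str  # where the VLM believes it sits
    info: str   # short product-related description

def recognize_products(image_bytes: bytes, vlm_query) -> list[ProductInsight]:
    """Send a captured in-store image to a VLM and parse its answer.

    `vlm_query` is any callable (image, prompt) -> str; a real system
    would back it with a multi-modal AI model."""
    prompt = ("List each retail product visible in this image as "
              "'name|aisle|short description', one per line.")
    raw = vlm_query(image_bytes, prompt)
    insights = []
    for line in raw.strip().splitlines():
        name, aisle, info = (part.strip() for part in line.split("|", 2))
        insights.append(ProductInsight(name, aisle, info))
    return insights

# Stub VLM used purely for demonstration.
def fake_vlm(image, prompt):
    return "Oat Cereal|Aisle 4|350 g box, whole-grain\nGreen Tea|Aisle 7|20 bags"

products = recognize_products(b"...jpeg bytes...", fake_vlm)
```

Keeping the model behind a plain callable mirrors the claim's device flexibility: the same parsing logic serves images from smartphones, AR devices, or smart glasses.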
The method involves capturing images within the store, which are pre-processed using machine learning for object boundary detection. These images are then analyzed by the VLM to produce outputs such as product recognition and navigation guidance. The system can operate in dynamic environments, adapting to changes like product restocking or rearrangement. It maintains an up-to-date planogram of the store, reflecting real-time inventory and product placement, which helps guide users through the store more effectively.
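The up-to-date planogram described above could be represented as a simple slot map that is refreshed whenever a new image confirms a product's placement. This is a hedged sketch of one plausible data structure, assuming an (aisle, shelf) slot key and a last-seen timestamp; the application does not specify its internal representation.

```python
from datetime import datetime, timezone

class Planogram:
    """Live map of shelf slot -> product, refreshed as new images arrive."""

    def __init__(self):
        # (aisle, shelf) -> {"product": name, "seen": last confirmation time}
        self.slots = {}

    def update(self, aisle: int, shelf: int, product: str) -> None:
        """Record that `product` was just observed at the given slot,
        overwriting any earlier placement (restocking / rearrangement)."""
        self.slots[(aisle, shelf)] = {
            "product": product,
            "seen": datetime.now(timezone.utc),
        }

    def locate(self, product: str):
        """Return the most recently confirmed slot holding `product`,
        or None if it has not been seen anywhere."""
        hits = [(slot, v["seen"]) for slot, v in self.slots.items()
                if v["product"] == product]
        return max(hits, key=lambda kv: kv[1])[0] if hits else None

pg = Planogram()
pg.update(4, 2, "Oat Cereal")
pg.update(4, 2, "Green Tea")  # shelf rearranged: slot now holds a new product
```

Because each update simply overwrites the slot, the map naturally tracks the dynamic environment (restocking and rearrangement) the summary mentions.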
The system employs crowd-sourcing by collecting images from end-users, such as customers and employees, to map the venue's inventory and layout. Participants may be incentivized through rewards like discounts or virtual points. This collaborative approach not only keeps the store's digital map current but also enhances user engagement by incorporating gamification elements such as AR avatars or characters, making the shopping experience more interactive and enjoyable.
By analyzing crowd-sourced data, the system can provide insights beyond what conventional inventory databases offer. For instance, it can identify misplaced items or detect customer interest in specific products based on traffic patterns around certain shelves. These insights are exported in lightweight data formats such as CSV or XML for efficiency. The system also supports automated checkout through visual recognition, streamlining the shopping experience without relying solely on traditional barcode scanning.
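The misplaced-item insight and its lightweight CSV export might look like the following. The observation tuples, the expected-aisle planogram dict, and the three-column CSV schema are illustrative assumptions; only the general idea (compare sightings against expected placement, emit CSV) comes from the summary.

```python
import csv
import io

def misplaced_items(observations, expected_aisles):
    """observations: (product, aisle) sightings parsed from crowd images;
    expected_aisles: product -> aisle where it should be stocked.
    Returns one row per sighting that contradicts the expected placement."""
    rows = []
    for product, aisle in observations:
        expected = expected_aisles.get(product)
        if expected is not None and expected != aisle:
            rows.append({"product": product,
                         "seen_aisle": aisle,
                         "expected_aisle": expected})
    return rows

def to_csv(rows) -> str:
    """Serialize the insight rows to a lightweight CSV payload."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["product", "seen_aisle", "expected_aisle"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = misplaced_items([("Soap", 3), ("Milk", 1)], {"Soap": 5, "Milk": 1})
report = to_csv(rows)
```

The same row/serializer split would let an XML exporter be swapped in without touching the detection logic.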
The system enhances localization accuracy by integrating multiple data sources such as Wi-Fi signals and visual inputs. Machine learning techniques refine these inputs to determine precise user and product locations within the store. Additional methods include user feedback on product size or logos, GPS localization near store exits, and image processing techniques for better product isolation and tracking. These combined approaches ensure accurate product identification and location mapping, improving overall user experience.
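One standard way to combine Wi-Fi and visual position estimates, as the integration above describes, is inverse-variance weighting: noisier sources contribute less to the fused fix. This is a generic fusion sketch, not the application's disclosed method; the variance values below are made up for illustration.

```python
def fuse_positions(estimates):
    """Inverse-variance weighted fusion of 2-D position estimates.

    estimates: iterable of (x, y, variance) tuples, one per source
    (e.g. Wi-Fi signal trilateration, visual localization, GPS near exits).
    Lower variance == more trusted source."""
    wx = wy = wsum = 0.0
    for x, y, var in estimates:
        w = 1.0 / var          # weight is the inverse of the variance
        wx += w * x
        wy += w * y
        wsum += w
    return wx / wsum, wy / wsum

# A coarse Wi-Fi fix (variance 4.0) fused with a sharper visual fix (1.0):
pos = fuse_positions([(10.0, 4.0, 4.0), (12.0, 5.0, 1.0)])
```

Here the visual estimate carries four times the weight of the Wi-Fi one, so the fused position (11.6, 4.8) lands much closer to it, which is the qualitative behavior the summary's refinement step aims for.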