US20240403946
2024-12-05
Physics
G06Q30/0643
The invention introduces a system that uses neural networks to identify items within video content dynamically and unobtrusively. Unlike traditional approaches that depend on manual annotation, the system generates item metadata automatically and responds to user interactions such as voice queries, touchscreen taps, or cursor movements. This approach enhances the user experience by providing information about both visible and non-visible items seamlessly, avoiding interruptions during video playback.
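As a minimal sketch of how such an interaction might be routed, the snippet below maps a touchscreen tap at a given playback position to the nearest identified object in that frame, then hands it to a recognition callback. All names here (`TapEvent`, `frame_index`, `handle_tap`) are hypothetical illustrations, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass
class TapEvent:
    timestamp_s: float  # playback position when the user tapped
    x: float            # normalized tap coordinates in [0, 1]
    y: float

def handle_tap(event, frame_index, recognizer):
    """Map a tap to the object whose center is closest to the tap point.

    frame_index: hypothetical mapping of whole-second timestamps to lists of
    detected objects, each with a normalized "center" coordinate.
    recognizer: callback that resolves an object to its metadata.
    """
    frame_objects = frame_index.get(round(event.timestamp_s), [])
    best, best_dist = None, float("inf")
    for obj in frame_objects:
        cx, cy = obj["center"]
        d = (cx - event.x) ** 2 + (cy - event.y) ** 2
        if d < best_dist:
            best, best_dist = obj, d
    return recognizer(best) if best is not None else None
```

Voice or cursor inputs could feed the same lookup, differing only in how the timestamp and coordinates are captured.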
Existing interactive video interfaces are limited due to their reliance on manual annotations, which are costly and time-consuming. Previous attempts to enhance video interactivity have been disruptive and unintuitive, failing to provide comprehensive information. The new technology aims to overcome these limitations by automating metadata generation and offering an improved user interface that integrates smoothly with the viewing experience.
The technology includes two main user interfaces: the Overlaid Video Player and the Adjacent Layout. Both work with a real-time image recognition engine to identify objects in videos and present relevant information. The Overlaid Video Player lets users interact with products shown in videos without leaving playback, increasing engagement and purchase rates. The Adjacent Layout provides additional visibility for products alongside the player, facilitating purchases without disrupting the video content.
A key challenge addressed by this technology is the frequent updating of the Overlaid Video Player interface. By preloading and caching data for each video segment on the user's device, updates can occur rapidly without relying on continuous network requests. This ensures a smooth user experience while maintaining synchronization with the video content. Compatibility with various video player technologies is achieved through a generic relay system, allowing broad distribution across different platforms.
The system employs neural networks to recognize objects within videos by generating embeddings compared against a database of known objects. This allows for accurate identification and retrieval of metadata, which can include product links or similar items. The system supports various input methods for user requests, such as voice commands or touch interactions, further enhancing accessibility and usability across different devices.
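The embedding comparison described above amounts to a nearest-neighbor lookup: the network's embedding for a detected object is compared against a database of known-object embeddings, and the best match's metadata (e.g., a product link) is returned. Below is a minimal cosine-similarity sketch under that assumption; the function name, threshold value, and metadata format are illustrative, not from the patent.

```python
import numpy as np

def identify(embedding, db_embeddings, db_metadata, threshold=0.8):
    """Look up a detected object's embedding against a known-object database.

    db_embeddings: (N, D) array of reference embeddings.
    db_metadata:   length-N list of metadata records (e.g., product links).
    Returns the best match's metadata, or None if similarity is below threshold.
    """
    e = embedding / np.linalg.norm(embedding)
    db = db_embeddings / np.linalg.norm(db_embeddings, axis=1, keepdims=True)
    sims = db @ e                      # cosine similarity against every known object
    best = int(np.argmax(sims))
    if sims[best] < threshold:
        return None                    # no sufficiently similar known object
    return db_metadata[best]
```

At scale, the brute-force `db @ e` product would typically be replaced by an approximate nearest-neighbor index, but the retrieval logic is the same.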