US20240193667
2024-06-13
Physics
G06Q30/0631
The patent application introduces an AI device designed to provide multi-modal recommendations using behavior-based, review-based, and image-based recommenders. The system determines which recommender to use based on the modality, format, or content of a user's request. By leveraging knowledge graph embeddings, the device generates and outputs tailored recommendation results. This approach aims to enhance recommendation accuracy and user convenience by addressing various types of requests in a unified manner.
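A minimal sketch of what such modality-based dispatch could look like in Python is shown below. The routing rules (an attached image goes to the image-based recommender, longer free text to the review-based one, everything else to the behavior-based fallback) are illustrative assumptions, not the patent's actual selection logic:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    text: Optional[str] = None
    image: Optional[bytes] = None

def route(request: Request) -> str:
    """Pick a recommender from the request's modality and content.

    Hypothetical dispatch rules: an attached image is routed to the
    image-based recommender (IRS); free text longer than a few words
    is treated as a review-style query (RRS); anything else falls
    back to the behavior-based recommender (BRS).
    """
    if request.image is not None:
        return "IRS"
    if request.text and len(request.text.split()) > 3:
        return "RRS"
    return "BRS"

print(route(Request(image=b"...")))                       # IRS
print(route(Request(text="spicy vegan noodle recipes")))  # RRS
print(route(Request(text="recommend")))                   # BRS
```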
Traditional recommender systems have predominantly focused on single modalities, often struggling with diverse user preferences and the cold start problem for new users. While effective for certain categories like movies or books, these systems fall short in areas with complex user interests, such as food recipes. The proposed AI device addresses these challenges by integrating multiple recommendation approaches within a single system, thus providing a more comprehensive solution.
The behavior-based recommender (BRS) analyzes connections in a knowledge graph to predict user preferences, such as which recipes a user is likely to enjoy. It frames recommendation as a link prediction task and uses zero-shot inference to serve users with no prior interaction history, mitigating the cold start problem. The BRS can also produce conditional recommendations restricted to clustered topics or categories that align with a user's interests, as in the sketch below.
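As an illustration of link prediction over knowledge graph embeddings, the sketch below ranks candidate (user, likes, recipe) triples with a TransE-style distance score. TransE is one common KGE scoring function and is assumed here for concreteness; the filing does not name a specific embedding model, and the random vectors stand in for trained embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Toy embedding tables; a real system would train these on the graph.
users   = {"u1": rng.normal(size=dim)}
recipes = {f"r{i}": rng.normal(size=dim) for i in range(5)}
likes   = rng.normal(size=dim)  # embedding of the "likes" relation

def transe_score(head, rel, tail):
    # TransE: plausible triples satisfy head + relation ~ tail,
    # so a smaller distance means a more likely link.
    return -np.linalg.norm(head + rel - tail)

# Link prediction: rank candidate recipes for the query (u1, likes, ?).
scores = {r: transe_score(users["u1"], likes, emb)
          for r, emb in recipes.items()}
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```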
The review-based recommender (RRS) encodes a user's textual query and matches it against reviews attached to items in the knowledge graph. It takes a hybrid approach, combining scores from pre-trained natural language processing (NLP) and knowledge graph embedding (KGE) models to improve relevance and accuracy, so that recommendations align closely with the user's input request.
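One plausible reading of this hybrid approach is a weighted blend of a review-text similarity score and a graph-embedding similarity score. In the sketch below, the toy vectors stand in for real NLP and KGE model outputs, and the mixing weight alpha and the linear blend are assumptions rather than details from the filing:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
dim = 16

# Toy vectors standing in for the two pre-trained models' outputs:
# text_emb - review embeddings from an NLP encoder
# kge_emb  - item embeddings from a knowledge graph embedding model
items = ["r0", "r1", "r2"]
text_emb = {i: rng.normal(size=dim) for i in items}
kge_emb  = {i: rng.normal(size=dim) for i in items}

query_text_vec = rng.normal(size=dim)  # encoded user query
query_kge_vec  = rng.normal(size=dim)  # query projected into KGE space

alpha = 0.6  # assumed mixing weight between the two signals

def hybrid_score(item):
    # Blend review-text similarity with graph-embedding similarity.
    return (alpha * cosine(query_text_vec, text_emb[item])
            + (1 - alpha) * cosine(query_kge_vec, kge_emb[item]))

ranked = sorted(items, key=hybrid_score, reverse=True)
print(ranked)
```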
The image-based recommender (IRS) processes an input image to identify and recommend visually similar items. It extracts key features from the image and uses knowledge graph embeddings to refine the results. During training, a variational autoencoder learns the distribution of item images while its latent space is guided toward the knowledge graph embedding space, so that similar images are grouped closely together and retrieval returns more relevant items.
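A sketch of one way to guide a VAE within an embedding space is given below in PyTorch: alongside the usual reconstruction and KL terms, an assumed alignment term pulls each image's latent mean toward the KG embedding of the item it depicts. The architecture, loss weights, and guidance term are illustrative assumptions, not the patent's specification:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedVAE(nn.Module):
    """Minimal VAE whose latent space is pulled toward the KGE space."""
    def __init__(self, img_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(img_dim, 64)
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, img_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def loss_fn(x, recon, mu, logvar, kge_target, beta=1.0, gamma=1.0):
    recon_loss = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Assumed guidance term: pull each image's latent mean toward the
    # KG embedding of the item it depicts, so visually similar items
    # end up close together in the shared space.
    align = F.mse_loss(mu, kge_target)
    return recon_loss + beta * kl + gamma * align

model = GuidedVAE()
x = torch.randn(8, 784)   # toy batch of flattened images
kge = torch.randn(8, 16)  # KG embeddings of the depicted items
recon, mu, logvar = model(x)
print(loss_fn(x, recon, mu, logvar, kge).item())
```

At inference time, recommendation would amount to encoding the query image and retrieving the items whose KG embeddings are nearest to its latent mean.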