US20250316000
2025-10-09
Physics
G06T11/60
The patent application describes a system for generating media elements using a multimodal scene graph. A scene manager processes visual information, such as videos or images, to create a multimodal scene graph composed of components and metadata. This graph can be used to generate various social media elements, including images, videos, and artificial reality scenes. For instance, a user's video can be transformed into a scene graph to produce images like memes or avatars, which can be shared and modified by other users on social platforms.
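The multimodal scene graph described above can be pictured, in simplified form, as a container of recognized components plus scene-level metadata. The following sketch is purely illustrative; all class and field names are assumptions, not the application's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One recognized object in the scene (names are illustrative)."""
    name: str        # e.g. an avatar detected in a user's video
    structure: dict  # structural information (shape, parts)
    animation: dict  # animation information (poses, keyframes)
    location: tuple  # location of the object within the scene

@dataclass
class SceneGraph:
    """A multimodal scene graph: components plus metadata."""
    components: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # e.g. background, audio

# A toy graph built from one frame of a user's video.
graph = SceneGraph(
    components=[
        Component(
            name="avatar",
            structure={"parts": ["head", "body"]},
            animation={"pose": "wave"},
            location=(0.5, 0.2, 1.0),
        )
    ],
    metadata={"background": "beach", "audio": "surf.ogg"},
)
```

A media element such as a meme or sticker would then be generated by reading components and metadata back out of such a graph.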
The scene manager uses trained machine learning models to recognize objects in visual data, storing structural, animation, and location information as component data; metadata such as background and audio is also captured. The system converts the visual information into serialized data, from which media elements can be generated. This process supports the creation of diverse media forms, such as animated images or stickers, by capturing specific moments or poses from a video.
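The conversion into serialized data might look like the round-trip below, using JSON as one plausible serialization format. The field names and schema are assumptions made for illustration, not the application's actual encoding:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Component:
    """Minimal component record (illustrative fields only)."""
    name: str
    location: tuple

def serialize(components, metadata):
    """Convert scene-graph contents into serialized (JSON) data."""
    return json.dumps({
        "components": [asdict(c) for c in components],
        "metadata": metadata,
    })

def deserialize(blob):
    """Recover component data and metadata to drive media generation."""
    data = json.loads(blob)
    return data["components"], data["metadata"]

blob = serialize([Component("avatar", (0.5, 0.2))], {"background": "park"})
components, metadata = deserialize(blob)
```

Note that a generator consuming the deserialized data only needs the agreed-upon schema, not the original video, which is what makes serialized scene graphs shareable.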
The technology supports artificial reality environments by processing scenes that include avatars and virtual objects. Machine learning models analyze these scenes to store relevant data in the multimodal scene graph. Media elements can then be rendered from different perspectives within the artificial reality environment. This capability allows for dynamic presentation and interaction with content in virtual spaces, enhancing user engagement.
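Rendering a media element from different perspectives amounts to projecting the stored component locations through different camera poses. A minimal sketch using a simple pinhole projection follows; the camera model and all names are assumptions for illustration, not the application's rendering pipeline:

```python
def project(point, camera_pos, focal=1.0):
    """Project a 3D scene-graph location to 2D image coordinates
    relative to a camera at camera_pos (pinhole model, axis-aligned)."""
    x, y, z = (p - c for p, c in zip(point, camera_pos))
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z, focal * y / z)

# The same stored avatar location renders differently per viewpoint.
avatar = (0.0, 1.0, 4.0)
front = project(avatar, camera_pos=(0.0, 0.0, 0.0))  # head-on view
side = project(avatar, camera_pos=(2.0, 0.0, 2.0))   # offset, closer view
```

Because the graph stores locations rather than pixels, any number of such viewpoints can be rendered from one captured scene.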
Users can interact with and modify media elements generated from multimodal scene graphs on social platforms. They can create variations by altering avatars, backgrounds, captions, and more. The system allows users to edit existing media by integrating new component data from additional multimodal scene graphs. This flexibility fosters personalized content creation and sharing, enriching social interactions with customized media experiences.
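Editing an existing media element by integrating component data from an additional multimodal scene graph can be sketched as a merge: new components are appended and selected metadata (such as the background) is replaced. The dict-based representation and function name below are assumptions for illustration:

```python
def merge_graphs(base, other, replace=("background",)):
    """Return a new scene graph combining base with components and
    selected metadata from other; base is left unmodified."""
    merged = {
        "components": list(base["components"]),
        "metadata": dict(base["metadata"]),
    }
    # Add components from `other` that are not already present.
    names = {c["name"] for c in merged["components"]}
    merged["components"] += [
        c for c in other["components"] if c["name"] not in names
    ]
    # Replace only the requested metadata keys (e.g. the background).
    for key in replace:
        if key in other["metadata"]:
            merged["metadata"][key] = other["metadata"][key]
    return merged

base = {"components": [{"name": "avatar"}],
        "metadata": {"background": "beach", "caption": "hi"}}
other = {"components": [{"name": "hat"}],
         "metadata": {"background": "city"}}
variant = merge_graphs(base, other)
```

Leaving the base graph untouched means each user's variation is an independent media element, which matches the sharing-and-remixing flow the application describes.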
The disclosed technology integrates with various artificial reality systems, including virtual reality (VR), augmented reality (AR), and mixed reality (MR). These systems provide immersive experiences by combining generated content with real-world inputs. The technology is adaptable across platforms such as head-mounted displays (HMDs) and mobile devices, facilitating the creation and consumption of artificial reality content through diverse hardware configurations.