US20250316000
2025-10-09
Physics
G06T11/60
The patent application describes a system for generating media elements using a multimodal scene graph. A scene manager processes visual information, such as videos or images, to create a multimodal scene graph composed of components and metadata. This graph can be used to generate various social media elements, including images, videos, and artificial reality scenes. For instance, a user's video can be transformed into a scene graph to produce images like memes or avatars, which can be shared and modified by other users on social platforms.
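The multimodal scene graph described above can be pictured, in simplified form, as a container of recognized components plus scene-level metadata. The following sketch is purely illustrative; all class and field names are assumptions, not the application's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One recognized object in the scene (names are illustrative)."""
    name: str        # e.g. an avatar detected in a user's video
    structure: dict  # structural information (shape, parts)
    animation: dict  # animation information (poses, keyframes)
    location: tuple  # location of the object within the scene

@dataclass
class SceneGraph:
    """A multimodal scene graph: components plus metadata."""
    components: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)  # e.g. background, audio

# A toy graph built from one frame of a user's video.
graph = SceneGraph(
    components=[
        Component(
            name="avatar",
            structure={"parts": ["head", "body"]},
            animation={"pose": "wave"},
            location=(0.5, 0.2, 1.0),
        )
    ],
    metadata={"background": "beach", "audio": "surf.ogg"},
)
```

A media element such as a meme or sticker would then be generated by reading components and metadata back out of such a graph.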
The scene manager uses trained machine learning models to recognize objects in visual data, storing structural, animation, and location information as component data; metadata such as background and audio is also captured. The system converts the visual information into serialized data, from which media elements can be generated. This process supports the creation of diverse media forms, such as animated images or stickers, by capturing specific moments or poses from a video.
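The conversion into serialized data might look like the round-trip below, using JSON as one plausible serialization format. The field names and schema are assumptions made for illustration, not the application's actual encoding:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Component:
    """Minimal component record (illustrative fields only)."""
    name: str
    location: tuple

def serialize(components, metadata):
    """Convert scene-graph contents into serialized (JSON) data."""
    return json.dumps({
        "components": [asdict(c) for c in components],
        "metadata": metadata,
    })

def deserialize(blob):
    """Recover component data and metadata to drive media generation."""
    data = json.loads(blob)
    return data["components"], data["metadata"]

blob = serialize([Component("avatar", (0.5, 0.2))], {"background": "park"})
components, metadata = deserialize(blob)
```

Note that a generator consuming the deserialized data only needs the agreed-upon schema, not the original video, which is what makes serialized scene graphs shareable.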
The technology supports artificial reality environments by processing scenes that include avatars and virtual objects. Machine learning models analyze these scenes to store relevant data in the multimodal scene graph. Media elements can then be rendered from different perspectives within the artificial reality environment. This capability allows for dynamic presentation and interaction with content in virtual spaces, enhancing user engagement.
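Rendering a media element from different perspectives amounts to projecting the stored component locations through different camera poses. A minimal sketch using a simple pinhole projection follows; the camera model and all names are assumptions for illustration, not the application's rendering pipeline:

```python
def project(point, camera_pos, focal=1.0):
    """Project a 3D scene-graph location to 2D image coordinates
    relative to a camera at camera_pos (pinhole model, axis-aligned)."""
    x, y, z = (p - c for p, c in zip(point, camera_pos))
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / z, focal * y / z)

# The same stored avatar location renders differently per viewpoint.
avatar = (0.0, 1.0, 4.0)
front = project(avatar, camera_pos=(0.0, 0.0, 0.0))  # head-on view
side = project(avatar, camera_pos=(2.0, 0.0, 2.0))   # offset, closer view
```

Because the graph stores locations rather than pixels, any number of such viewpoints can be rendered from one captured scene.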
Users can interact with and modify media elements generated from multimodal scene graphs on social platforms. They can create variations by altering avatars, backgrounds, captions, and more. The system allows users to edit existing media by integrating new component data from additional multimodal scene graphs. This flexibility fosters personalized content creation and sharing, enriching social interactions with customized media experiences.
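Editing an existing media element by integrating component data from an additional multimodal scene graph can be sketched as a merge: new components are appended and selected metadata (such as the background) is replaced. The dict-based representation and function name below are assumptions for illustration:

```python
def merge_graphs(base, other, replace=("background",)):
    """Return a new scene graph combining base with components and
    selected metadata from other; base is left unmodified."""
    merged = {
        "components": list(base["components"]),
        "metadata": dict(base["metadata"]),
    }
    # Add components from `other` that are not already present.
    names = {c["name"] for c in merged["components"]}
    merged["components"] += [
        c for c in other["components"] if c["name"] not in names
    ]
    # Replace only the requested metadata keys (e.g. the background).
    for key in replace:
        if key in other["metadata"]:
            merged["metadata"][key] = other["metadata"][key]
    return merged

base = {"components": [{"name": "avatar"}],
        "metadata": {"background": "beach", "caption": "hi"}}
other = {"components": [{"name": "hat"}],
         "metadata": {"background": "city"}}
variant = merge_graphs(base, other)
```

Leaving the base graph untouched means each user's variation is an independent media element, which matches the sharing-and-remixing flow the application describes.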
The disclosed technology integrates with various artificial reality systems, including virtual reality (VR), augmented reality (AR), and mixed reality (MR). These systems provide immersive experiences by combining generated content with real-world inputs. The technology is adaptable across platforms such as head-mounted displays (HMDs) and mobile devices, facilitating the creation and consumption of artificial reality content through diverse hardware configurations.