US20250238991
2025-07-24
Physics
G06T13/40
The patent application describes a method for generating augmented reality (AR) instructional content using generative artificial intelligence. The method lets users create AR instructions from plain text descriptions, without programming skills or motion-capture hardware. The system animates virtual avatars to demonstrate tasks, making the AR experience more realistic and engaging, and it addresses the challenge of context awareness by grounding virtual elements in the user's physical surroundings.
AR instructions offer an immersive learning experience by overlaying digital content on physical environments, helping users visualize and practice complex tasks. Traditional methods for creating animated humanoid avatars in AR require programming skills, posing a barrier for many users. Alternatives such as embodied demonstration simplify avatar creation but still depend on motion-capture hardware. The emergence of generative artificial intelligence (Gen-AI) offers a way forward by enabling content creation from intuitive user inputs, though generated content often lacks the contextual awareness crucial for effective AR instruction.
The disclosed method involves receiving natural language descriptions of tasks from users and generating step-by-step text instructions with a machine learning model. It then captures contextual information about the environment where the task will occur, using the device's sensors to gather spatial data. This information is combined with the text instructions to create animations of virtual avatars performing the task, which are displayed through AR or VR devices, yielding contextually aware instructional content.
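The flow described above can be pictured as a small pipeline. The sketch below is a minimal illustration, not the patent's implementation: the function names (generate_step_instructions, capture_spatial_context, build_animations), the data classes, and all return values are hypothetical stand-ins for components the application leaves unspecified.

```python
from dataclasses import dataclass


@dataclass
class SpatialContext:
    """Environment data gathered by the device's sensors (stubbed here)."""
    object_positions: dict[str, tuple[float, float, float]]
    floor_height: float = 0.0


@dataclass
class AvatarAnimation:
    """One avatar motion clip anchored to a real-world position."""
    step_text: str
    target_position: tuple[float, float, float]


def generate_step_instructions(task_description: str) -> list[str]:
    # Stand-in for the machine learning model that expands a natural
    # language task description into ordered step-by-step instructions.
    return [f"Step {i + 1}: <generated from '{task_description}'>" for i in range(3)]


def capture_spatial_context() -> SpatialContext:
    # Stand-in for sensor capture; a real system would fuse camera and
    # depth data into a map of the user's physical environment.
    return SpatialContext(object_positions={"workbench": (1.2, 0.0, 0.8)})


def build_animations(steps: list[str], context: SpatialContext) -> list[AvatarAnimation]:
    # Anchor each step's avatar motion to a location recovered from the
    # captured environment, which is what makes the content context-aware.
    anchor = next(iter(context.object_positions.values()))
    return [AvatarAnimation(step_text=s, target_position=anchor) for s in steps]


if __name__ == "__main__":
    steps = generate_step_instructions("replace the printer toner cartridge")
    context = capture_spatial_context()
    for animation in build_animations(steps, context):
        print(animation.step_text, "->", animation.target_position)
```

In a real system the final loop would hand each AvatarAnimation to the headset's renderer rather than print it.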
The AR instruction authoring system aims to deliver spatially aware content that aligns with the real-world environment. It supports seamless transitions between interactions and allows demonstrations at varying scales, adapting to the user's preferred viewing perspective. Users can choose between a third-person view and a first-person view, a choice that influences the avatar's scale and interactions. The system also lets users edit and modify the instructional content, giving them control over the generated materials.
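To make the perspective choice concrete, the sketch below shows one plausible way a view mode could drive avatar scale. The ViewMode enum, the avatar_scale function, and the specific scale values are illustrative assumptions, not details from the application.

```python
from enum import Enum


class ViewMode(Enum):
    THIRD_PERSON = "third_person"   # avatar acts in the room at life size
    FIRST_PERSON = "first_person"   # instructions aligned to the user's own view


def avatar_scale(mode: ViewMode, miniature: bool = False) -> float:
    """Return a render scale for the avatar (1.0 = life size).

    The values are illustrative: a life-size avatar for third-person
    demonstrations, an optional tabletop miniature, and a hidden body
    (hands and tools only) in first person.
    """
    if mode is ViewMode.FIRST_PERSON:
        return 0.0                  # hide the full body; show only hands/tools
    return 0.2 if miniature else 1.0


print(avatar_scale(ViewMode.THIRD_PERSON))                  # 1.0
print(avatar_scale(ViewMode.THIRD_PERSON, miniature=True))  # 0.2
print(avatar_scale(ViewMode.FIRST_PERSON))                  # 0.0
```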
The system operates through a head-mounted AR device that presents an interface for user interaction. Users provide natural language inputs to describe tasks and can refine generated step-by-step instructions displayed in the interface. By moving through their environment and capturing images with the device, users supply additional context that informs the creation of virtual avatar animations. This process allows the system to generate contextually informed AR content tailored to specific environments and tasks.
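The authoring session on the headset can be summarized as a short loop: prompt, generate, review, capture, render. All helper functions in the sketch below (prompt_user, generate_steps, review_and_edit, capture_environment_images) are hypothetical stubs standing in for the device interface and model described above.

```python
def prompt_user(question: str) -> str:
    # Stand-in for the headset's text or voice input panel.
    print(question)
    return "replace the printer toner cartridge"


def generate_steps(task: str) -> list[str]:
    # Stand-in for the generative model's step expansion.
    return [f"1. Locate the printer ({task})", "2. Open the front cover",
            "3. Swap the cartridge"]


def review_and_edit(steps: list[str]) -> list[str]:
    # In the real interface the user refines the generated steps in place;
    # this stub simply accepts them unchanged.
    return steps


def capture_environment_images() -> list[bytes]:
    # Frames captured as the user walks through the room with the device,
    # supplying the spatial context for the avatar animations.
    return [b"<frame-0>", b"<frame-1>", b"<frame-2>"]


def author_session() -> None:
    task = prompt_user("Describe the task you want to teach:")
    steps = review_and_edit(generate_steps(task))
    frames = capture_environment_images()
    print(f"Authored {len(steps)} steps using {len(frames)} context frames:")
    for step in steps:
        print(" ", step)


if __name__ == "__main__":
    author_session()
```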