Invention Title:

MULTIMODAL TASK EXECUTION AND TEXT EDITING FOR A WEARABLE SYSTEM

Publication number:

US20260079566

Publication date:

Section:

Physics

Class:

G06F3/011

Inventors:

Applicant:

Smart overview of the Invention

The wearable systems described here combine multiple input modes, such as gestures, head pose, eye gaze, voice, and environmental factors, to execute commands and interact with objects in a 3D environment. These inputs let users compose, select, and edit text in virtual or augmented reality settings. Combining input modes improves accuracy and reduces errors compared with relying on any single mode, which makes interaction more natural.

Technical Field and Background

The invention relates to virtual reality (VR), augmented reality (AR), and mixed reality (MR) systems, and specifically to how users interact with virtual objects or text in a 3D space. Traditional computing systems often rely on a single input method, such as a keyboard or mouse, which can be limiting in dynamic VR/AR/MR environments. The complexity of human visual perception and the need for natural interaction in these environments present significant challenges, which the described systems aim to address.

Multimodal Inputs for Enhanced Interaction

Wearable devices can parse multimodal inputs to determine both which virtual object to operate on and which operation to execute. For instance, combining a voice command with eye gaze and hand gestures allows precise control over object manipulation, such as moving or resizing. Because gaze or gesture can indicate the target, the spoken command does not have to identify it precisely. This reduces the specificity required of each command and minimizes errors, improving the user experience in 3D environments.
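The parsing step described above can be illustrated with a minimal sketch. It is not taken from the publication: the `VirtualObject` type, the `OPERATIONS` vocabulary, and the rule "speech narrows the candidates, gaze picks the closest one" are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualObject:
    name: str
    kind: str
    position: tuple  # (x, y, z) relative to the user's head; hypothetical units


# Hypothetical command vocabulary; a real system would use a speech parser.
OPERATIONS = {"move", "resize", "delete"}


def gaze_angle(gaze_dir, position):
    """Angle in radians between the gaze direction and the object position."""
    dot = sum(g * p for g, p in zip(gaze_dir, position))
    norm = math.hypot(*gaze_dir) * math.hypot(*position)
    if norm == 0:
        return math.pi  # degenerate input: treat as maximally far from gaze
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def parse_multimodal(utterance, gaze_dir, objects):
    """Return (operation, target) from a spoken phrase plus a gaze direction."""
    words = utterance.lower().split()
    # The operation comes from speech.
    operation = next((w for w in words if w in OPERATIONS), None)
    # If speech names an object kind, it narrows the candidates;
    # otherwise every object in the scene remains a candidate.
    named = [o for o in objects if o.kind in words]
    candidates = named or list(objects)
    if not candidates:
        return operation, None
    # Gaze resolves the remaining ambiguity.
    target = min(candidates, key=lambda o: gaze_angle(gaze_dir, o.position))
    return operation, target


scene = [
    VirtualObject("chair_1", "chair", (0.1, 0.0, 2.0)),
    VirtualObject("chair_2", "chair", (1.5, 0.0, 2.0)),
    VirtualObject("table_1", "table", (0.0, 0.0, 2.0)),
]
op, target = parse_multimodal("move the chair", (0.0, 0.0, 1.0), scene)
print(op, target.name)
```

In this sketch an underspecified phrase such as "resize that" still resolves to a single object, because gaze alone selects among all candidates when speech names none.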

Text Interaction and Editing

The wearable system supports text interaction through multimodal inputs, allowing users to compose and edit text efficiently. Users can dictate text by voice and then select or edit it using eye gaze or gestures. By combining the strengths of different input modes, this approach overcomes the limitations of any single mode, such as the slow speed of virtual keyboards or the error-prone nature of voice input alone.
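A minimal sketch of this dictate-then-correct flow follows. It assumes a single line of dictated text laid out with fixed-width characters and a horizontal gaze coordinate in the same units; the helper names (`layout_words`, `select_word_by_gaze`, `replace_word_at_gaze`) are hypothetical and do not come from the publication.

```python
def layout_words(words, char_width=1.0, space=1.0):
    """Assign each word a horizontal extent (start_x, end_x) on one line."""
    boxes = []
    x = 0.0
    for word in words:
        width = len(word) * char_width
        boxes.append((x, x + width))
        x += width + space
    return boxes


def select_word_by_gaze(boxes, gaze_x):
    """Return the index of the word whose extent is closest to gaze_x."""
    def distance(box):
        start, end = box
        if start <= gaze_x <= end:
            return 0.0
        return min(abs(gaze_x - start), abs(gaze_x - end))

    return min(range(len(boxes)), key=lambda i: distance(boxes[i]))


def replace_word_at_gaze(text, gaze_x, spoken_replacement):
    """Replace the gazed-at word of dictated text with a newly spoken word."""
    words = text.split()
    if not words:
        return text
    index = select_word_by_gaze(layout_words(words), gaze_x)
    words[index] = spoken_replacement
    return " ".join(words)


# Voice dictation misrecognized one word; gaze selects it, voice corrects it.
dictated = "meet me at the beech today"
print(replace_word_at_gaze(dictated, 17.0, "beach"))
```

Here gaze does the work of selection, which voice handles poorly, while voice supplies the replacement text, which a virtual keyboard would enter slowly.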

Benefits and Advantages

The described system not only improves interaction precision but can also reduce hardware costs by making effective use of lower-resolution components. For example, combining voice commands with low-resolution eye tracking can achieve accurate task execution without expensive, high-resolution equipment. These multimodal input techniques offer robust, cost-effective ways of interacting with VR/AR/MR devices, improving both user experience and system efficiency.
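One way to see why a coarse eye tracker can suffice is to treat each input mode as a noisy score over candidate objects and combine the scores. The sketch below is an assumption made for illustration, not the method of the publication: it models gaze with a Gaussian whose width `sigma_deg` stands in for tracker resolution, takes per-object voice scores as given, and multiplies the two before normalizing.

```python
import math


def gaze_likelihood(angle_deg, sigma_deg):
    """Gaussian score for an object at angle_deg from the estimated gaze."""
    return math.exp(-0.5 * (angle_deg / sigma_deg) ** 2)


def fuse(gaze_angles, voice_scores, sigma_deg):
    """Multiply gaze and voice scores per object, then normalize to sum to 1."""
    raw = {
        name: gaze_likelihood(angle, sigma_deg) * voice_scores.get(name, 0.0)
        for name, angle in gaze_angles.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: score / total for name, score in raw.items()}


# Hypothetical scene: angular offsets (degrees) from the estimated gaze point.
gaze_angles = {"photo": 2.0, "note": 3.0, "clock": 12.0}
# Hypothetical speech-recognition scores for which object the user named.
voice_scores = {"photo": 0.1, "note": 0.8, "clock": 0.1}

# sigma_deg=5.0 models a coarse tracker that cannot separate photo from note.
fused = fuse(gaze_angles, voice_scores, sigma_deg=5.0)
gaze_only = max(gaze_angles, key=lambda n: gaze_likelihood(gaze_angles[n], 5.0))
print("gaze alone picks:", gaze_only)
print("fused picks:", max(fused, key=fused.get))
```

With these made-up numbers the coarse gaze estimate alone favors the wrong neighbor, while the fused score selects the object the user named, which is the effect the paragraph above describes.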