US20250199829
2025-06-19
Physics
G06F9/453
Large language models (LLMs) and vision-language models (VLMs) are powerful tools that produce effective results when prompted with well-formatted input. However, users often lack the expertise or patience to craft such prompts themselves. The patent application addresses this challenge by introducing a method that automatically generates AI prompts based on an understanding of the user's screen activity.
The method uses an image encoder to process a current screenshot into an image embedding. This embedding is then compared with text embeddings that represent various screenshot activities. By identifying the text embedding that most closely matches the image embedding, the system determines which activity the user is performing on screen.
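A minimal sketch of this matching step, assuming a CLIP-style joint image/text encoder (the application does not name a specific model; "openai/clip-vit-base-patch32" and the activity labels below are illustrative choices only):

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical text descriptions of screenshot activities.
activities = [
    "writing an email",
    "editing a spreadsheet",
    "reading documentation",
    "debugging source code",
]

screenshot = Image.open("screenshot.png")  # the user's current screen capture
inputs = processor(text=activities, images=screenshot,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; the highest-scoring
# text embedding identifies the activity shown on screen.
best = outputs.logits_per_image.softmax(dim=-1).argmax(dim=-1).item()
print(f"Detected activity: {activities[best]}")
```

Encoding both modalities into a shared embedding space is what allows new activity labels to be added by simply writing new text descriptions, with no retraining of the image encoder.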
Once the screen activity is identified, AI prompts, referred to as "pills," are generated in real time. These prompts assist users by offering suggestions or solutions related to their current activity. This real-time assistance aims to make LLMs and VLMs more accessible and easier to navigate.
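A hypothetical sketch of how a detected activity might map to prompt "pills"; the application describes real-time suggestions but not this exact mechanism, so the template table below is purely illustrative:

```python
# Illustrative activity-to-pill mapping (not taken from the application).
PILL_TEMPLATES = {
    "writing an email": [
        "Summarize this email thread",
        "Rewrite my draft in a more formal tone",
    ],
    "editing a spreadsheet": [
        "Explain this formula",
        "Suggest a chart for this data",
    ],
    "debugging source code": [
        "Explain this error message",
        "Suggest a fix for the highlighted function",
    ],
}

def generate_pills(activity: str) -> list[str]:
    """Return prompt suggestions ("pills") for the detected screen activity."""
    return PILL_TEMPLATES.get(activity, ["Ask the assistant about this screen"])

print(generate_pills("writing an email"))
```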
Automatic prompt generation applies wherever screen activity is present, including educational tools, customer support systems, and productivity software. By leveraging screen understanding, this technology has the potential to significantly improve user experience across platforms and applications.