US20250232786
2025-07-17
Physics
G10L25/63
The patent application describes a system that enhances user interfaces by integrating paralinguistic data with linguistic data during real-time interactions. Paralinguistic data, which includes non-verbal cues such as facial expressions and vocal intonations, can be simulated by AI or inferred from users through sensors. This integration aims to improve communication, facilitate understanding, and reduce misunderstandings in conversations involving AI agents or other users.
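The pairing of a verbal channel with non-verbal cues described above can be sketched as a simple data structure. This is a minimal illustration, not the patent's actual design: the class and field names (`ParalinguisticCue`, `AnnotatedUtterance`, the `source` values) are assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ParalinguisticCue:
    kind: str          # e.g. "facial_expression" or "vocal_intonation"
    label: str         # e.g. "smile" or "rising_pitch"
    confidence: float  # 0.0-1.0; inferences may be inaccurate
    source: str        # "sensor_inferred" or "ai_simulated"

@dataclass
class AnnotatedUtterance:
    speaker: str
    text: str                                    # linguistic (verbal) channel
    cues: List[ParalinguisticCue] = field(default_factory=list)

    def add_cue(self, kind: str, label: str, confidence: float, source: str) -> None:
        """Attach a non-verbal cue to this utterance."""
        self.cues.append(ParalinguisticCue(kind, label, confidence, source))

# Pair a verbal message with an inferred non-verbal cue for display in the UI.
msg = AnnotatedUtterance(speaker="user_1", text="That sounds great!")
msg.add_cue("vocal_intonation", "enthusiastic", 0.82, "sensor_inferred")
```

Carrying the cues alongside the text lets a client render both channels together during a real-time exchange.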
Generative AI, particularly large language models (LLMs), can create novel content similar to their training data. These models are capable of generating natural language and are valuable across various domains. The application focuses on using AI to present paralinguistic data alongside verbal data, enhancing the interaction experience by making it more intuitive and human-like.
The system addresses challenges in presenting AI-generated paralinguistic data effectively. It helps conversations with AI agents feel natural by conveying human-like characteristics such as emotion. It also addresses potential inaccuracies in the machine-learning models that infer human paralinguistic cues, which could otherwise lead to misinterpretation. The system provides interfaces that inform users about their own and others' inferred paralinguistic data, fostering better awareness and communication.
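One way to mitigate the inaccuracy concern mentioned above is to gate what is shown to users on model confidence and to always label a cue as inferred rather than factual. This is a hedged sketch of that idea only; the threshold value and function name are assumptions, not taken from the application.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.75  # assumed cutoff, not specified in the application

def render_cue(label: str, confidence: float) -> Optional[str]:
    """Return UI text for an inferred cue, or None if confidence is too low.

    Suppressing low-confidence inferences reduces the risk that a wrong
    classification misleads the other participants in the conversation.
    """
    if confidence < CONFIDENCE_THRESHOLD:
        return None
    # Explicitly mark the cue as inferred so users can calibrate trust.
    return f"(inferred: {label}, {confidence:.0%} confidence)"
```

For example, `render_cue("happy", 0.9)` yields a labeled annotation, while `render_cue("sad", 0.5)` is suppressed entirely.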
An example paralinguistic system includes an interactive application with a client and server, facilitating interaction among users and AI agents. Sensors detect various contextual cues from users, which are processed to generate paralinguistic classifications. These classifications help the AI agent respond more appropriately by considering the user's emotional state and other contextual factors.
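The flow described here, sensor cues classified and then consulted by the agent when responding, can be sketched as follows. The classifier below is a toy rule-based stand-in for the machine-learning model in the application, and every name, feature, and threshold is an illustrative assumption.

```python
def classify_context(sensor_readings: dict) -> str:
    """Toy stand-in for the paralinguistic classifier.

    Real systems would use a trained model over richer features; here two
    assumed scalar features (smile_score, pitch_variance) drive the label.
    """
    pitch = sensor_readings.get("pitch_variance", 0.0)
    smile = sensor_readings.get("smile_score", 0.0)
    if smile > 0.6 and pitch > 0.5:
        return "excited"
    if smile < 0.2 and pitch < 0.2:
        return "flat"
    return "neutral"

def agent_reply(user_text: str, classification: str) -> str:
    """Condition the agent's verbal response on the inferred user state."""
    prefix = {
        "excited": "Glad you're enthusiastic! ",
        "flat": "I sense some hesitation. ",
        "neutral": "",
    }[classification]
    return prefix + f"Regarding '{user_text}': ..."

state = classify_context({"smile_score": 0.8, "pitch_variance": 0.7})
reply = agent_reply("the new plan", state)
```

The point of the sketch is the split of responsibilities: classification happens once per turn from sensor data, and the agent's response generator receives the label as additional context.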
Sensors collect diverse data types such as audio, video, and physiological signals to understand the user's paralinguistic state. The system pre-processes this sensor data to improve the accuracy and relevance of the resulting classifications. Using machine-learning models, it classifies emotions and other paralinguistic cues in real time, augmenting verbal interactions with additional context. This approach enables more empathetic and context-aware responses from AI agents.
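Combining audio, video, and physiological signals into one classification is commonly done by late fusion: each modality produces its own label distribution, and the distributions are merged. The sketch below shows a weighted average as one plausible merging rule; the modality names, weights, and emotion labels are all assumptions for illustration, not details from the application.

```python
def fuse_modalities(predictions: dict, weights: dict) -> str:
    """Late-fuse per-modality label distributions via a weighted average.

    predictions: modality name -> {emotion label: probability}
    weights:     modality name -> reliability weight
    Returns the label with the highest fused score.
    """
    labels = {label for dist in predictions.values() for label in dist}
    total_weight = sum(weights[m] for m in predictions)
    fused = {
        label: sum(
            weights[m] * predictions[m].get(label, 0.0) for m in predictions
        ) / total_weight
        for label in labels
    }
    return max(fused, key=fused.get)

# Audio and video lean "happy"; the physiological channel disagrees but
# carries less weight, so the fused label remains "happy".
result = fuse_modalities(
    {"audio": {"happy": 0.7, "sad": 0.3},
     "video": {"happy": 0.6, "sad": 0.4},
     "physio": {"happy": 0.2, "sad": 0.8}},
    {"audio": 0.5, "video": 0.3, "physio": 0.2},
)
```

Weighting by modality reliability lets the system degrade gracefully when one sensor stream is noisy or missing.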