Invention Title:

Systems and Methods for AI Driven Generation of Content With Language-based Attunement

Publication number:

US20240404161

Publication date:
Section:

Physics

Class:

G06T13/40

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent application describes systems and methods for creating an interactive avatar that is attuned to a user's emotional state. The system collects audio-visual data from user interactions to analyze vocal characteristics, facial features, and speech, and uses speech recognition and natural language understanding models to interpret the user's emotions. Acoustic, speech, and facial emotion metrics are combined into an emotional complex signature, which allows the avatar to mirror the user's emotional state.
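
As a rough illustration of how such a signature might be assembled, the sketch below fuses per-modality emotion scores into a single set of values. The emotion labels, field names, and fusion weights are assumptions made for this example and are not taken from the patent application.

```python
# Illustrative sketch only: emotion labels, field names, and weights are
# assumptions, not details disclosed in the patent application.
from dataclasses import dataclass
from typing import Dict

EMOTIONS = ("joy", "sadness", "anger", "fear", "surprise", "neutral")

@dataclass
class EmotionMetrics:
    """Per-emotion scores (0..1) from one modality at a given moment."""
    scores: Dict[str, float]

def fuse_signature(acoustic: EmotionMetrics,
                   speech: EmotionMetrics,
                   facial: EmotionMetrics,
                   weights=(0.3, 0.3, 0.4)) -> Dict[str, float]:
    """Combine the three modality metrics into a single 'emotional complex
    signature' as a weighted average per emotion label."""
    w_a, w_s, w_f = weights
    return {
        e: w_a * acoustic.scores.get(e, 0.0)
           + w_s * speech.scores.get(e, 0.0)
           + w_f * facial.scores.get(e, 0.0)
        for e in EMOTIONS
    }

if __name__ == "__main__":
    acoustic = EmotionMetrics({"joy": 0.2, "sadness": 0.6, "neutral": 0.2})
    speech = EmotionMetrics({"sadness": 0.7, "neutral": 0.3})
    facial = EmotionMetrics({"sadness": 0.5, "fear": 0.1, "neutral": 0.4})
    print(fuse_signature(acoustic, speech, facial))
```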

Field of Technology

This invention falls within the realm of computer-based platforms designed for real-time generation of interactive avatars. These avatars are capable of responding with artificial speech and facial expressions that are empathetically aligned with user interactions. The technology aims to address social isolation by providing a virtual companion that can engage users in meaningful interactions.

Background

Social isolation is a significant issue impacting various demographics, including the elderly and younger generations who experience limited social interactions. The invention aims to mitigate loneliness by creating a virtual companion that can simulate a trusted relationship. This approach provides users with a sense of connection and emotional support, potentially improving mental well-being and quality of life.

Summary of Functionality

The interactive avatar uses biometric data to perceive emotional cues from body language and vocal qualities, and applies natural language processing to the user's speech. It can mimic these emotions through its appearance and responses, fostering a sense of attachment in the user. This encourages users to share personal details more freely in a non-judgmental environment, which in turn enhances the avatar's ability to tailor its interactions based on shared history and emotional markers.
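
The role of shared history can be pictured with a simple interaction archive like the one sketched below. The record schema, the use of a dominant-emotion label as the "emotional marker", and the recall strategy are assumptions for illustration only, not the patent's disclosed design.

```python
# Minimal sketch of an interaction archive; the storage schema and lookup
# strategy are illustrative assumptions, not the patent's design.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class InteractionRecord:
    timestamp: datetime
    user_utterance: str
    dominant_emotion: str      # emotional marker attached to the exchange

@dataclass
class InteractionArchive:
    records: List[InteractionRecord] = field(default_factory=list)

    def add(self, utterance: str, emotion: str) -> None:
        """Archive one exchange together with its emotional marker."""
        self.records.append(InteractionRecord(datetime.now(), utterance, emotion))

    def recall(self, emotion: str, limit: int = 3) -> List[InteractionRecord]:
        """Retrieve recent exchanges with the same emotional marker so the
        avatar can tailor its response using shared history."""
        matches = [r for r in self.records if r.dominant_emotion == emotion]
        return matches[-limit:]

if __name__ == "__main__":
    archive = InteractionArchive()
    archive.add("I miss my grandchildren.", "sadness")
    archive.add("The garden looked lovely today.", "joy")
    for record in archive.recall("sadness"):
        print(record.user_utterance)
```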

Technical Implementation

The system processes audio-visual input over time to determine vocal characteristics such as pitch and loudness, and analyzes facial expressions using models such as Paul Ekman's Facial Action Coding System. It calculates time-varying emotion metrics from these inputs to create an emotional complex signature. This signature informs the rendering of the avatar's speech and expressions, ensuring they are synchronized with the user's emotional state. Additionally, the system archives past interactions to refine future responses.
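
A minimal sketch of such a pipeline is shown below: raw per-frame features (pitch, loudness, and a FACS-style action-unit intensity) are averaged over fixed windows, mapped to time-varying emotion scores, and the dominant score is used to drive the avatar's mirrored expression. The window size, the toy feature-to-emotion mapping, and all identifiers are assumptions for illustration; a real system would rely on trained acoustic, speech, and facial emotion models as described above.

```python
# Hypothetical sketch: window size, feature names, and the mapping from
# features to emotions are illustrative assumptions, not the patent's
# disclosed method.
from statistics import mean
from typing import Dict, List

Frame = Dict[str, float]  # e.g. {"pitch_hz": 180.0, "loudness_db": 55.0, "au12": 0.7}

def window_metrics(frames: List[Frame]) -> Dict[str, float]:
    """Average the raw per-frame features over one analysis window."""
    keys = frames[0].keys()
    return {k: mean(f[k] for f in frames) for k in keys}

def window_signature(metrics: Dict[str, float]) -> Dict[str, float]:
    """Toy mapping from averaged features to emotion scores; real systems
    would use trained acoustic and facial emotion models."""
    smile = metrics.get("au12", 0.0)                     # FACS AU12: lip-corner puller
    arousal = min(metrics.get("loudness_db", 0.0) / 80.0, 1.0)
    return {"joy": smile * arousal, "sadness": (1 - smile) * (1 - arousal)}

def signature_timeline(frames: List[Frame], window: int = 5) -> List[Dict[str, float]]:
    """Slide a fixed-size window over the frame stream to obtain a
    time-varying emotional complex signature."""
    return [window_signature(window_metrics(frames[i:i + window]))
            for i in range(0, len(frames) - window + 1, window)]

def avatar_expression(signature: Dict[str, float]) -> str:
    """Pick the dominant emotion so the rendered avatar mirrors the user."""
    return max(signature, key=signature.get)

if __name__ == "__main__":
    frames = [{"pitch_hz": 170 + i, "loudness_db": 50 + i % 10, "au12": 0.1 * (i % 8)}
              for i in range(20)]
    for sig in signature_timeline(frames):
        print(sig, "->", avatar_expression(sig))
```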