Invention Title:

MULTI-MODAL DATA-STREAM-BASED ARTIFICIAL INTELLIGENCE INTERVENTIONS IN A VIRTUAL ENVIRONMENT SYSTEM AND METHOD

Publication number:

US20240372967

Section:

Electricity

Class:

H04N7/157

Smart overview of the Invention

The patent application introduces a system and method for enhancing video conferencing through artificial intelligence (AI) and machine learning. It focuses on creating a more engaging virtual environment by detecting and responding to participants' emotional states. By analyzing multi-modal data streams, including audio and visual inputs, the system can identify emotions such as boredom, happiness, or sadness, and trigger interventions intended to improve engagement and productivity.

Technical Field

The technology pertains to improving video conferencing systems by leveraging AI to create interactive and dynamic virtual environments. This approach aims to transcend the limitations of traditional video conferencing formats by fostering better communication and collaboration among participants through enhanced user experiences.

Challenges in Conventional Video Conferencing

Traditional video conferencing often leads to unengaging meetings, resulting in decreased attention and productivity. The lack of nonverbal cues and face-to-face interaction can hinder rapport building and lead to misunderstandings. This has driven the demand for innovative solutions that make virtual meetings more interactive and effective.

System Functionality

The proposed system processes a variety of inputs, such as camera feeds and audio streams, to determine participants' emotional states. It utilizes a server-based architecture to manage these inputs and generate appropriate outputs, such as camera perspective changes or environmental adjustments in the virtual setting. These interventions are tailored based on contextual scenarios, enhancing the overall meeting experience.
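The application does not disclose a concrete implementation, but the pipeline it describes can be sketched roughly as follows. All class names, feature choices, weights, and thresholds below are hypothetical illustrations, not taken from the claims: a server-side component fuses per-participant audio and video features into a single engagement estimate and maps it onto a coarse emotional state.

```python
from dataclasses import dataclass

# Hypothetical per-participant features extracted from the camera and audio streams.
@dataclass
class ModalFeatures:
    speech_energy: float   # 0..1, normalized vocal energy from the audio stream
    face_motion: float     # 0..1, facial-expression activity from the camera feed
    gaze_on_screen: float  # 0..1, fraction of time the participant looks at the meeting

def engagement_score(f: ModalFeatures) -> float:
    """Naive weighted fusion of the modalities into one engagement estimate."""
    return 0.4 * f.speech_energy + 0.3 * f.face_motion + 0.3 * f.gaze_on_screen

def classify_state(score: float) -> str:
    """Map the fused score onto the coarse emotional states the overview names."""
    if score < 0.3:
        return "bored"
    if score > 0.7:
        return "enthusiastic"
    return "neutral"

features = ModalFeatures(speech_energy=0.1, face_motion=0.2, gaze_on_screen=0.3)
print(classify_state(engagement_score(features)))  # low scores -> "bored"
```

In practice the application contemplates learned models rather than fixed weights; the linear fusion here only illustrates how separate modal streams could collapse into one state used to select an intervention.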

Emotional Cues and Interventions

The system detects emotional cues from participants through body language, facial expressions, tone of voice, and word choice. For instance, if enthusiasm is detected, the system might adjust lighting or sound effects to match this mood. Conversely, if boredom is sensed, it could introduce elements to re-engage participants. This dynamic adjustment aims to maintain high levels of energy and participation throughout virtual meetings.
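The cue-to-intervention logic described above could be sketched as a simple lookup from detected state to environment adjustments. The intervention names and the table itself are illustrative assumptions; the application describes the behavior (lighting, sound, re-engagement elements) without fixing a data structure:

```python
# Hypothetical mapping from a detected emotional state to virtual-environment
# adjustments of the kinds the overview mentions (lighting, sound, re-engagement).
INTERVENTIONS = {
    "enthusiastic": {"lighting": "brighten", "sound": "upbeat_ambient"},
    "bored":        {"lighting": "warm", "prompt": "interactive_poll"},
    "sad":          {"lighting": "soften", "sound": "calm_ambient"},
}

def select_intervention(state: str) -> dict:
    """Return the environment adjustments for a detected state (no-op if unknown)."""
    return INTERVENTIONS.get(state, {})

print(select_intervention("bored"))  # e.g. a warm-lighting change plus a poll prompt
```

Returning an empty dictionary for unrecognized states keeps the system conservative: if no emotional cue is confidently detected, the virtual environment is left unchanged.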