US20240414437
2024-12-12
Electricity
H04N23/67
The autonomous video conferencing system combines subsymbolic and symbolic artificial intelligence to enhance remote collaboration. It features a main smart camera, multiple peripheral smart cameras, and optional smart sensors. Together, these components detect objects, gestures, and postures and apply television studio production principles through a virtual director. The main camera updates the focus video stream in real time and streams it to a user's computer, creating an engaging conferencing experience.
Traditional video conferencing systems often fail to capture dynamic interactions and spatial cues, leaving remote participants with a less engaging experience. Existing solutions typically rely on a single camera, which limits their ability to capture facial expressions and subtle gestures. Despite advances in user controls and preference settings, such systems still lack comprehensive real-time engagement. This system addresses these limitations by creating a cohesive production that mirrors television studio quality.
The system comprises multiple smart cameras whose image sensors produce overview and focus video streams. Each camera runs a machine-learning vision pipeline that detects objects and postures, while a virtual director applies predetermined rules to frame objects of interest. A stream selector transitions the focus stream between cameras so that the main camera always outputs an up-to-date stream to the user. Smart sensors can also be integrated to provide additional non-image data inputs.
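To make the per-camera flow concrete, the following sketch shows how detections from a vision pipeline might feed a virtual director that picks a framing for the focus stream. All names here (Detection, Framing, VirtualDirector) and the two-rule logic are illustrative assumptions; the patent does not publish an API.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str                        # "person", "gesture", "posture", ...
    confidence: float
    bbox: tuple[int, int, int, int]   # (x, y, w, h) in overview-frame pixels

@dataclass
class Framing:
    shot_type: str                    # e.g. "total" or "close"
    crop: tuple[int, int, int, int]   # region applied to the focus stream

class VirtualDirector:
    """Applies a predetermined rule set to detections to choose a framing."""

    FRAME_W, FRAME_H = 1920, 1080     # assumed overview resolution

    def frame(self, detections: list[Detection]) -> Framing:
        people = [d for d in detections if d.label == "person"]
        if len(people) != 1:
            # No single clear subject: fall back to a total shot of the room.
            return Framing("total", (0, 0, self.FRAME_W, self.FRAME_H))
        # One subject: crop the focus stream to that person.
        return Framing("close", people[0].bbox)

# In the real system, detections come from per-camera ML inference; stub them here.
detections = [Detection("person", 0.94, (600, 200, 400, 700))]
print(VirtualDirector().frame(detections))
# Framing(shot_type='close', crop=(600, 200, 400, 700))
```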
In one configuration, one camera acts as the main camera while the others serve as peripherals. Peripheral cameras transmit their updated focus streams to the main camera, which selects the best stream for output. Smart sensors such as microphones or touchpads supply additional inputs that improve framing decisions. These sensors connect through application programming interfaces, contributing further context for the virtual director's rule set.
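The selection step at the main camera might look like the sketch below. The scoring heuristic, in which a smart microphone's audio cue boosts a camera's score, is an assumption made for illustration, not the patented rule set.

```python
from dataclasses import dataclass

@dataclass
class FocusStream:
    camera_id: str
    framing_quality: float    # per-camera score from its own virtual director

def select_output(streams: list[FocusStream],
                  active_audio_camera: str | None) -> FocusStream:
    """Pick the focus stream the main camera forwards to the user's computer."""
    def score(s: FocusStream) -> float:
        # A sensor cue (speaker located near this camera) adds a fixed bonus.
        bonus = 0.25 if s.camera_id == active_audio_camera else 0.0
        return s.framing_quality + bonus
    return max(streams, key=score)

streams = [FocusStream("main", 0.70),
           FocusStream("peripheral-1", 0.65),
           FocusStream("peripheral-2", 0.80)]
# A smart microphone, connected via an API, reports the speaker near peripheral-1.
print(select_output(streams, active_audio_camera="peripheral-1").camera_id)
# peripheral-1  (the audio bonus lifts its score of 0.90 above peripheral-2's 0.80)
```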
The virtual director uses a rule set grounded in television production principles to determine optimal framing. Parameters such as speaking status, gaze direction, and visibility inform these decisions. Based on scene cues, the system adapts the shot type, choosing total shots for context or close shots for focused presentation. This approach ensures dynamic transitions and comprehensive coverage of the conferencing space, enhancing remote engagement.
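A rule set in this spirit could be expressed as a small decision function. The cue names, the medium shot, and the branching below are assumptions used to illustrate the idea; the actual rules are defined by the patent's predetermined rule set.

```python
from dataclasses import dataclass

@dataclass
class SceneCues:
    speakers: int           # participants currently speaking
    gaze_on_camera: bool    # subject looking toward the lens
    visible_people: int     # people detected in the overview stream

def choose_shot(cues: SceneCues) -> str:
    """Map scene cues to a television-style shot type."""
    if cues.speakers == 0:
        return "total"      # no active speaker: show the whole space for context
    if cues.speakers == 1 and cues.gaze_on_camera:
        return "close"      # one engaged speaker: tight, focused presentation
    if cues.speakers == 1:
        return "medium"     # a speaker who is not addressing the camera
    return "total"          # multiple speakers: cover the conversation

print(choose_shot(SceneCues(speakers=1, gaze_on_camera=True, visible_people=3)))
# close
```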