Invention Title:

AUTONOMOUS VIDEO CONFERENCING SYSTEM WITH VIRTUAL DIRECTOR ASSISTANCE

Publication number:

US20240414437

Publication date:

2024-12-12

Section:

Electricity

Class:

H04N23/67

Inventors:

Jan Tore Korneliussen Oslo, Norway

Stein Ove Eriksen Oslo, Norway

Knut Helge Teppan Asker, Norway

Kai Alexander WIG Oslo, Norway

Mona Kleven LAURITZEN Oslo, Norway

Jon Tore Hafstad Oslo, Norway

Aida C. Lopez Oslo, Norway

Elena You Oslo, Norway

Lars Erling Stensen Oslo, Norway

Stian Selbek Koppang, Norway

Tamás Becsei Oslo, Norway

Niklas Schmidt Oslo, Norway

Therese Byhring Oslo, Norway

Vebjørn Boge Nilssen Oslo, Norway

Patrik Kvarme Hansen Oslo, Norway

Vegard HAMMER Tårnåsen, Norway

Bendik Kvamstad Oslo, Norway

Håvard Pederson Alstad Oslo, Norway

Oleg Jakobsen Vinterbro, Norway

Applicant:

Huddly AS Oslo, Norway

Smart overview of the Invention

The autonomous video conferencing system uses subsymbolic and symbolic artificial intelligence to enhance remote collaboration. It comprises a main smart camera, multiple peripheral smart cameras, and optional smart sensors. These components work together to detect objects, gestures, and postures, and a virtual director applies television studio production principles to the result. The main camera updates the focus video stream in real time and streams it to a user's computer.

Background

Traditional video conferencing systems often fail to capture dynamic interactions and spatial cues effectively, which makes the experience less engaging for remote participants. Existing solutions typically use a single camera, which limits their ability to capture facial expressions and subtle gestures. Despite advances in user controls and preferences, real-time engagement remains a challenge. This system addresses these limitations by combining multiple camera streams into a cohesive production modeled on television studio practice.

System Components

The system comprises multiple smart cameras with image sensors that produce overview and focus video streams. Each camera has a vision pipeline powered by machine learning to detect objects and postures, while a virtual director applies predetermined rules for framing objects of interest. A stream selector transitions focus streams between cameras, ensuring the main camera outputs an updated stream to the user. Smart sensors can also be integrated to provide additional non-image data inputs.
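The relationship between the overview stream, the detections, and the focus stream can be illustrated with a minimal sketch. It assumes the focus stream is a rectangular crop of the overview frame and that the vision pipeline reports detections as pixel bounding boxes. The names `Box` and `focus_crop` and the `margin` parameter are hypothetical and do not come from the publication.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in overview-frame pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def focus_crop(detections: List[Box], frame_w: int, frame_h: int,
               margin: float = 0.15) -> Box:
    """Return a crop of the overview frame that contains every detection.

    Assumption: with no detections, the focus falls back to the full overview frame.
    """
    if not detections:
        return Box(0, 0, frame_w, frame_h)

    # Smallest rectangle that encloses all objects of interest.
    left = min(b.left for b in detections)
    top = min(b.top for b in detections)
    right = max(b.right for b in detections)
    bottom = max(b.bottom for b in detections)

    # Add breathing room around the subjects, proportional to the group size.
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)

    # Clamp so the crop never leaves the overview frame.
    return Box(
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(frame_w, right + pad_x),
        min(frame_h, bottom + pad_y),
    )
```

In this sketch the virtual director's role is reduced to a single padding rule. A real rule set would presumably also account for aspect ratio and shot type.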

Camera and Sensor Integration

In one configuration, one camera acts as the main camera while the others serve as peripherals. Peripheral cameras transmit their updated focus streams to the main camera, which selects the best stream for output. Smart sensors such as microphones or touchpads can supply additional data to improve framing decisions. These sensors connect through application programming interfaces and contribute additional context to the virtual director's rule set.
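The selection step on the main camera might look like the sketch below. It assumes each camera reports a numeric framing score alongside its focus stream and that sensor data arrives as a per-camera score bonus. The publication does not specify either mechanism. The `Candidate` type, the `select_stream` function, and the `switch_margin` threshold (added here to avoid rapid back-and-forth switching) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    """A focus stream offered by one camera (hypothetical type)."""
    camera_id: str
    score: float  # assumed framing-quality score, higher is better


def select_stream(candidates: List[Candidate],
                  current: Optional[str] = None,
                  sensor_bonus: Optional[Dict[str, float]] = None,
                  switch_margin: float = 0.1) -> Optional[str]:
    """Return the id of the camera whose focus stream should be output.

    The current camera is kept unless another one beats it by at least
    `switch_margin`, which is an assumed way to keep transitions calm.
    """
    if not candidates:
        return current

    # Combine the vision score with any non-image hint, e.g. from a microphone.
    bonus = sensor_bonus or {}
    scored = {c.camera_id: c.score + bonus.get(c.camera_id, 0.0)
              for c in candidates}

    best_id = max(scored, key=scored.get)

    # Stay on the current stream when the improvement is too small to justify a cut.
    if current in scored and scored[best_id] - scored[current] < switch_margin:
        return current
    return best_id
```

For example, a microphone that localizes the active speaker near one peripheral camera could be expressed as `sensor_bonus={"left": 0.2}`, which tips the selection toward that camera.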

Framing Rules

The virtual director uses a rule set based on television production principles to determine the optimal framing. Parameters such as speaking status, gaze direction, and visibility inform these decisions. The system adapts the shot type to scene cues, using total shots for context and close shots for focused presentation. This approach produces dynamic transitions and comprehensive coverage of the conferencing space, which improves engagement for remote participants.
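A rule set of this kind can be sketched as a small decision function over the three parameters named above. The specific rules below are illustrative assumptions, not the rules claimed in the publication: a single visible speaker gets a close shot, a shared gaze target gets a close shot when nobody is speaking, and everything else falls back to a total shot. The names `Shot`, `Person`, and `choose_shot` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Shot(Enum):
    TOTAL = "total"  # wide shot that establishes context
    CLOSE = "close"  # tight shot on one person


@dataclass(frozen=True)
class Person:
    """Per-person cues assumed to come from the vision pipeline and sensors."""
    name: str
    speaking: bool
    visible: bool
    gaze_target: Optional[str] = None  # name of the person being looked at, if known


def choose_shot(people: List[Person]) -> Tuple[Shot, Optional[str]]:
    """Return the shot type and, for a close shot, the person to frame."""
    visible = [p for p in people if p.visible]
    speakers = [p for p in visible if p.speaking]

    # Rule 1 (assumed): exactly one visible speaker gets a close shot.
    if len(speakers) == 1:
        return Shot.CLOSE, speakers[0].name

    # Rule 2 (assumed): with nobody speaking, follow the room's gaze if a
    # strict majority of visible people look at the same visible person.
    if not speakers and visible:
        names = {p.name for p in visible}
        gaze = Counter(p.gaze_target for p in visible if p.gaze_target in names)
        if gaze:
            target, votes = gaze.most_common(1)[0]
            if votes * 2 > len(visible):
                return Shot.CLOSE, target

    # Rule 3 (assumed): otherwise show the whole scene for context.
    return Shot.TOTAL, None
```

Keeping the rules as ordered, explicit checks makes it easy to see which cue triggered a given framing, which suits a rule set modeled on production conventions.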