US20260059180
2026-02-26
Electricity
H04N21/8456
The patent application describes a system and method for AI-powered generation and delivery of video clips. It involves a processor that classifies video content using narrative classifiers derived from both subtitle text and video frames. The system utilizes natural language processing (NLP) to analyze dialogue in subtitles, identifying narrative elements and associating them with timestamps. Simultaneously, an image recognition model processes video frames to identify additional narrative elements, such as objects or people, also associating them with timestamps. These outputs are combined to generate a set of timestamps linked to narrative classifiers.
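The described combination of the two pipelines' outputs can be illustrated with a minimal sketch. The function name, classifier labels, and `(timestamp, classifier)` tuple shape below are illustrative assumptions, not taken from the application's claims:

```python
from collections import defaultdict

def merge_classifier_outputs(nlp_results, image_results):
    """Combine timestamped narrative classifiers from the subtitle (NLP)
    pipeline and the frame (image recognition) pipeline into a single
    mapping of timestamp -> set of narrative classifiers."""
    merged = defaultdict(set)
    for timestamp, classifier in nlp_results:
        merged[timestamp].add(classifier)
    for timestamp, classifier in image_results:
        merged[timestamp].add(classifier)
    return dict(merged)

# Illustrative outputs from the two pipelines
nlp_out = [(12.0, "protagonist-dialogue"), (45.5, "conflict")]
img_out = [(12.0, "protagonist-on-screen"), (90.0, "chase-scene")]
timeline = merge_classifier_outputs(nlp_out, img_out)
# timeline[12.0] == {"protagonist-dialogue", "protagonist-on-screen"}
```

A shared timestamp key lets classifiers from both modalities reinforce one another, matching the application's description of a combined set of timestamps linked to narrative classifiers.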
The processor defines segments within the video content, each consisting of a start timestamp, an end timestamp, and associated narrative classifiers. Video clips are generated by selecting segments according to prioritization rules while keeping the total duration below a specified maximum. The prioritization rules assign higher importance to narrative elements central to the broader storyline or to specific scenes. This allows users to view condensed versions of videos that maintain narrative coherence, catering to those with limited viewing time or specific content preferences.
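One way to realize this selection step is a greedy pass over segments ranked by classifier importance. This is a minimal sketch under assumed data shapes (segments as `(start, end, classifiers)` tuples, a priority map from classifier to score); the application does not specify a particular selection algorithm:

```python
def select_segments(segments, priority, max_duration):
    """Greedily pick the highest-priority segments until the clip's
    duration budget is spent, then return them in chronological order."""
    def score(seg):
        _, _, classifiers = seg
        return max(priority.get(c, 0) for c in classifiers)

    chosen, total = [], 0.0
    for start, end, classifiers in sorted(segments, key=score, reverse=True):
        length = end - start
        if total + length <= max_duration:
            chosen.append((start, end, classifiers))
            total += length
    return sorted(chosen)  # chronological order preserves narrative flow

segments = [
    (0.0, 20.0, {"opening-credits"}),
    (20.0, 50.0, {"inciting-incident"}),
    (50.0, 65.0, {"subplot"}),
    (65.0, 95.0, {"climax"}),
]
priority = {"climax": 3, "inciting-incident": 2, "subplot": 1}
clip_plan = select_segments(segments, priority, max_duration=60.0)
# keeps the inciting incident and the climax; drops lower-priority segments
```

Returning the selection in chronological order reflects the stated goal of keeping the condensed clip narratively coherent.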
Narrative classifiers encompass both broader narrative aspects and specific scene features. The NLP model performs dialogue analysis, thematic analysis, and event detection, identifying elements such as character interactions. Architectures such as RNN, LSTM, CNN, BERT, or GPT may be used. The image recognition model performs object and facial detection, scene classification, and event detection, identifying plot points through character recognition and emotion recognition. It may use CNN or other supervised learning models. Outputs from the NLP and image recognition models are combined through techniques such as sentiment analysis or thematic clustering.
The system processes user requests for clips, considering maximum duration and narrative preferences, which can be manually entered or derived from user profiles. Narrative classifiers are prioritized according to these preferences. Content analysis from NLP and image recognition outputs is stored for quick clip generation upon user request. The system can recommend video content to users, generating clips in advance based on user profiles or inputs, and delivering them on demand. User profiles are informed by inputs, genre preferences, or historical viewing data and are used to match users with content.
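The preference-driven prioritization described above can be sketched as a simple re-ranking of narrative classifiers against a user profile. The profile key `narrative_preferences` and the classifier labels are illustrative assumptions:

```python
def rank_classifiers(classifiers, user_profile):
    """Rank narrative classifiers so that those matching the user's
    stated or inferred preferences come first; ties are broken by name
    (reverse-alphabetical here, purely for determinism)."""
    preferred = set(user_profile.get("narrative_preferences", []))
    return sorted(classifiers,
                  key=lambda c: (c in preferred, c),
                  reverse=True)

profile = {"narrative_preferences": ["action", "chase-scene"]}
ranked = rank_classifiers(["romance", "chase-scene", "dialogue"], profile)
# "chase-scene" ranks first because it matches the profile
```

In the described system, such a ranking would feed the prioritization rules used during segment selection, and the stored content-analysis outputs would let this step run quickly at request time.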
The system can pre-generate multiple clips of varying lengths for each video content item, available for user selection. Narrative classifiers can be specific to the genre, scene, or moment. The processor loads video and subtitle files, executes an ML model for processing dialogue and video sequences, and associates narrative classifiers with scenes. Video clips are delivered based on user profiles or inputs, enhancing user satisfaction by aligning with their viewing preferences and constraints.
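Pre-generating clips of varying lengths can be sketched as computing one clip plan per target duration ahead of time. The greedy helper, the target durations, and the data shapes below are illustrative assumptions, not details from the application:

```python
def build_clip(segments, priority, max_duration):
    """Greedy helper: take the highest-priority segments that fit the
    duration budget, returned in chronological order."""
    chosen, total = [], 0.0
    ranked = sorted(segments,
                    key=lambda s: max(priority.get(c, 0) for c in s[2]),
                    reverse=True)
    for start, end, classifiers in ranked:
        if total + (end - start) <= max_duration:
            chosen.append((start, end, classifiers))
            total += end - start
    return sorted(chosen)

def pregenerate_clips(segments, priority, target_durations=(30.0, 60.0, 120.0)):
    """Pre-compute one clip plan per target length, keyed by duration,
    so a clip matching a user's request can be served on demand."""
    return {d: build_clip(segments, priority, d) for d in target_durations}

segments = [
    (0.0, 20.0, {"opening-credits"}),
    (20.0, 50.0, {"inciting-incident"}),
    (50.0, 65.0, {"subplot"}),
    (65.0, 95.0, {"climax"}),
]
priority = {"climax": 3, "inciting-incident": 2, "subplot": 1}
plans = pregenerate_clips(segments, priority)
# the shortest plan keeps only the climax; longer budgets admit more segments
```

Keying the pre-generated plans by duration mirrors the described flow: a user (or a recommendation driven by the user's profile) selects a length, and the matching clip is delivered without re-running the analysis.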