US20260059180
2026-02-26
Electricity
H04N21/8456
The patent application describes a system and method for AI-powered generation and delivery of video clips. It involves a processor that classifies video content using narrative classifiers derived from both subtitle text and video frames. The system utilizes natural language processing (NLP) to analyze dialogue in subtitles, identifying narrative elements and associating them with timestamps. Simultaneously, an image recognition model processes video frames to identify additional narrative elements, such as objects or people, also associating them with timestamps. These outputs are combined to generate a set of timestamps linked to narrative classifiers.
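The described combination of the two pipelines' outputs can be illustrated with a minimal sketch. The function name, classifier labels, and `(timestamp, classifier)` tuple shape below are illustrative assumptions, not taken from the application's claims:

```python
from collections import defaultdict

def merge_classifier_outputs(nlp_results, image_results):
    """Combine timestamped narrative classifiers from the subtitle (NLP)
    pipeline and the frame (image recognition) pipeline into a single
    mapping of timestamp -> set of narrative classifiers."""
    merged = defaultdict(set)
    for timestamp, classifier in nlp_results:
        merged[timestamp].add(classifier)
    for timestamp, classifier in image_results:
        merged[timestamp].add(classifier)
    return dict(merged)

# Illustrative outputs from the two pipelines
nlp_out = [(12.0, "protagonist-dialogue"), (45.5, "conflict")]
img_out = [(12.0, "protagonist-on-screen"), (90.0, "chase-scene")]
timeline = merge_classifier_outputs(nlp_out, img_out)
# timeline[12.0] == {"protagonist-dialogue", "protagonist-on-screen"}
```

A shared timestamp key lets classifiers from both modalities reinforce one another, matching the application's description of a combined set of timestamps linked to narrative classifiers.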
The processor defines segments within the video content, each consisting of a start timestamp, an end timestamp, and associated narrative classifiers. Video clips are generated by selecting segments according to prioritization rules while keeping the total duration below a specified maximum. The prioritization rules assign higher importance to narrative elements central to the broader storyline or to specific scenes. This allows users to view condensed versions of videos that maintain narrative coherence, catering to those with limited viewing time or specific content preferences.
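One way to realize this selection step is a greedy pass over segments ranked by classifier importance. This is a minimal sketch under assumed data shapes (segments as `(start, end, classifiers)` tuples, a priority map from classifier to score); the application does not specify a particular selection algorithm:

```python
def select_segments(segments, priority, max_duration):
    """Greedily pick the highest-priority segments until the clip's
    duration budget is spent, then return them in chronological order."""
    def score(seg):
        _, _, classifiers = seg
        return max(priority.get(c, 0) for c in classifiers)

    chosen, total = [], 0.0
    for start, end, classifiers in sorted(segments, key=score, reverse=True):
        length = end - start
        if total + length <= max_duration:
            chosen.append((start, end, classifiers))
            total += length
    return sorted(chosen)  # chronological order preserves narrative flow

segments = [
    (0.0, 20.0, {"opening-credits"}),
    (20.0, 50.0, {"inciting-incident"}),
    (50.0, 65.0, {"subplot"}),
    (65.0, 95.0, {"climax"}),
]
priority = {"climax": 3, "inciting-incident": 2, "subplot": 1}
clip_plan = select_segments(segments, priority, max_duration=60.0)
# keeps the inciting incident and the climax; drops lower-priority segments
```

Returning the selection in chronological order reflects the stated goal of keeping the condensed clip narratively coherent.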
Narrative classifiers encompass both broader narrative aspects and specific scene features. The NLP model performs dialogue analysis, thematic analysis, and event detection, identifying elements such as character interactions. Architectures such as RNN, LSTM, CNN, BERT, or GPT may be used. The image recognition model performs object and facial detection, scene classification, and event detection, identifying plot points through character recognition and emotion recognition. It may use CNN or other supervised learning models. Outputs from the NLP and image recognition models are combined through techniques such as sentiment analysis or thematic clustering.
The system processes user requests for clips, considering maximum duration and narrative preferences, which can be manually entered or derived from user profiles. Narrative classifiers are prioritized according to these preferences. Content analysis from NLP and image recognition outputs is stored for quick clip generation upon user request. The system can recommend video content to users, generating clips in advance based on user profiles or inputs, and delivering them on demand. User profiles are informed by inputs, genre preferences, or historical viewing data and are used to match users with content.
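The preference-driven prioritization described above can be sketched as a simple re-ranking of narrative classifiers against a user profile. The profile key `narrative_preferences` and the classifier labels are illustrative assumptions:

```python
def rank_classifiers(classifiers, user_profile):
    """Rank narrative classifiers so that those matching the user's
    stated or inferred preferences come first; ties are broken by name
    (reverse-alphabetical here, purely for determinism)."""
    preferred = set(user_profile.get("narrative_preferences", []))
    return sorted(classifiers,
                  key=lambda c: (c in preferred, c),
                  reverse=True)

profile = {"narrative_preferences": ["action", "chase-scene"]}
ranked = rank_classifiers(["romance", "chase-scene", "dialogue"], profile)
# "chase-scene" ranks first because it matches the profile
```

In the described system, such a ranking would feed the prioritization rules used during segment selection, and the stored content-analysis outputs would let this step run quickly at request time.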
The system can pre-generate multiple clips of varying lengths for each video content item, available for user selection. Narrative classifiers can be specific to the genre, scene, or moment. The processor loads video and subtitle files, executes an ML model for processing dialogue and video sequences, and associates narrative classifiers with scenes. Video clips are delivered based on user profiles or inputs, enhancing user satisfaction by aligning with their viewing preferences and constraints.
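Pre-generating clips of varying lengths can be sketched as computing one clip plan per target duration ahead of time. The greedy helper, the target durations, and the data shapes below are illustrative assumptions, not details from the application:

```python
def build_clip(segments, priority, max_duration):
    """Greedy helper: take the highest-priority segments that fit the
    duration budget, returned in chronological order."""
    chosen, total = [], 0.0
    ranked = sorted(segments,
                    key=lambda s: max(priority.get(c, 0) for c in s[2]),
                    reverse=True)
    for start, end, classifiers in ranked:
        if total + (end - start) <= max_duration:
            chosen.append((start, end, classifiers))
            total += end - start
    return sorted(chosen)

def pregenerate_clips(segments, priority, target_durations=(30.0, 60.0, 120.0)):
    """Pre-compute one clip plan per target length, keyed by duration,
    so a clip matching a user's request can be served on demand."""
    return {d: build_clip(segments, priority, d) for d in target_durations}

segments = [
    (0.0, 20.0, {"opening-credits"}),
    (20.0, 50.0, {"inciting-incident"}),
    (50.0, 65.0, {"subplot"}),
    (65.0, 95.0, {"climax"}),
]
priority = {"climax": 3, "inciting-incident": 2, "subplot": 1}
plans = pregenerate_clips(segments, priority)
# the shortest plan keeps only the climax; longer budgets admit more segments
```

Keying the pre-generated plans by duration mirrors the described flow: a user (or a recommendation driven by the user's profile) selects a length, and the matching clip is delivered without re-running the analysis.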