US20250166666
2025-05-22
Physics
G11B27/036
The disclosed technology involves a method, apparatus, device, and medium for generating videos using a machine learning model. It determines two reference images from among the multiple images of a reference video and receives a reference text that describes the video. A generation model is then acquired based on these reference images and the reference text, enabling the creation of a target video from new images and text. This approach helps the model follow story development, improving the dynamism and realism of the generated videos.
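The pipeline described in the abstract can be summarized in a few steps: pick two reference images, pair them with a reference text, acquire a generation model from those inputs, and then generate a target video from new inputs. The Python sketch below is a minimal illustration of that flow, not the patented implementation; the first/last-frame selection heuristic, the `GenerationModel` class, and all function names are assumptions introduced for clarity.

```python
import numpy as np

def select_reference_images(frames: list) -> tuple:
    """Pick two reference images from a reference video.

    Assumed heuristic: the first frame anchors where the scene starts and
    the last frame anchors where the story ends up; the patent text does
    not specify how the two images are chosen.
    """
    return frames[0], frames[-1]

class GenerationModel:
    """Placeholder for the generation model acquired in the method.

    A real system would fine-tune a diffusion or autoregressive video
    model; here we only record the conditioning signals.
    """
    def __init__(self):
        self.conditioning = None

    def fit(self, ref_start, ref_end, ref_text):
        # "Acquire" the model from the two reference images and the text.
        self.conditioning = {"start": ref_start, "end": ref_end, "text": ref_text}
        return self

    def generate(self, new_image, new_text, num_frames=16):
        # Stand-in generation: blend the new image toward the stored end
        # reference so the output frames at least exhibit motion.
        assert self.conditioning is not None, "call fit() first"
        end = self.conditioning["end"]
        return [
            ((1 - t) * new_image + t * end)
            for t in np.linspace(0.0, 1.0, num_frames)
        ]

# Toy usage: eight random arrays stand in for the reference video frames.
reference_video = [np.random.rand(64, 64, 3) for _ in range(8)]
ref_a, ref_b = select_reference_images(reference_video)
model = GenerationModel().fit(ref_a, ref_b, "a boat drifts into a storm")
target_video = model.generate(np.random.rand(64, 64, 3), "a boat drifts into a sunset")
print(len(target_video), target_video[0].shape)
```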
This innovation pertains to computer vision, specifically the use of machine learning models to automate video generation. Traditional methods often struggle to produce videos with realistic motion because the generated object movements lack dynamism. The disclosed method aims to address these limitations by providing a more effective way to generate videos that meet the desired content requirements.
The process involves selecting two key reference images from a reference video and receiving descriptive text. These elements are used to develop a generation model capable of producing a target video. The model leverages the second reference image as guidance for narrative progression, ensuring diverse and realistic image transitions within the video.
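One way to read "guidance for narrative progression" is that the second reference image influences later frames more strongly than earlier ones, so the video moves toward a destination rather than looping in place. The sketch below illustrates this with a time-dependent guidance weight; the logistic weighting schedule and the function names are assumptions for illustration, not the mechanism claimed in the patent.

```python
import numpy as np

def narrative_guidance_weights(num_frames: int, sharpness: float = 4.0) -> np.ndarray:
    """Per-frame weights for the second reference image.

    Assumption: early frames stay close to the input image while later
    frames are pulled toward the second reference image, giving the
    video a sense of story progression.
    """
    t = np.linspace(0.0, 1.0, num_frames)
    # Logistic ramp from a low weight to a high weight across the clip.
    return 1.0 / (1.0 + np.exp(-sharpness * (t - 0.5)))

def guided_frames(start_img, end_ref, num_frames=16):
    """Blend toward the second reference according to the weight schedule."""
    weights = narrative_guidance_weights(num_frames)
    return [(1 - w) * start_img + w * end_ref for w in weights]

frames = guided_frames(np.zeros((64, 64, 3)), np.ones((64, 64, 3)))
print([round(float(f.mean()), 2) for f in frames[::5]])  # mean brightness ramps upward
```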
An apparatus is provided to implement this method, comprising modules for image determination, text reception, and model acquisition. An electronic device with one or more processing units and a memory storing instructions executes those instructions to carry out the video generation method. Additionally, a computer-readable storage medium stores a program that, when executed by a processor, performs the described method.
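The apparatus claim maps naturally onto a modular software design, with one component per recited module. The sketch below mirrors that structure in Python; the class names, method names, and wiring are invented for illustration and are not taken from the patent text.

```python
class ImageDeterminationModule:
    """Selects the two reference images from the reference video."""
    def determine(self, frames):
        return frames[0], frames[-1]  # placeholder selection heuristic

class TextReceptionModule:
    """Receives and normalizes the reference text."""
    def receive(self, text):
        return text.strip()

class ModelAcquisitionModule:
    """Acquires a generation model from the reference images and text."""
    def acquire(self, ref_images, ref_text):
        # On a real device this would load or fine-tune a video model;
        # here we simply bundle the conditioning inputs together.
        return {"images": ref_images, "text": ref_text}

class VideoGenerationApparatus:
    """Wires the three modules together, as the apparatus description suggests."""
    def __init__(self):
        self.images = ImageDeterminationModule()
        self.text = TextReceptionModule()
        self.model = ModelAcquisitionModule()

    def build_model(self, frames, text):
        refs = self.images.determine(frames)
        return self.model.acquire(refs, self.text.receive(text))

apparatus = VideoGenerationApparatus()
print(apparatus.build_model(["frame0", "frame1", "frame2"], "  a short story  "))
```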
Current video generation technologies struggle to create highly dynamic action sequences and complex visual effects because of heavy resource demands and model limitations. This invention seeks to overcome these hurdles by improving the model's ability to interpret complex descriptions and to produce cohesive, dynamic videos with better visual motion effects.