US20240404163
2024-12-05
Physics
G06T13/80
The patent application describes a system for efficiently generating engaging video content. Input data in the form of story-description text is processed through a series of models to create a complete video: the system generates narrative text, identifies key passages of that narrative, and creates corresponding images and animations. This method aims to reduce the complexity, time, and cost associated with traditional video production.
The method begins by obtaining input data that includes a story description. This data is fed into a narration model to generate narrative text. A subset of this text is then identified and provided to an image generation model, resulting in generated images. These images are used by an animation model to create video segments. Simultaneously, the narrative text is converted into speech using a text-to-speech model. Finally, the video segments and narrative speech are combined to produce the final video content.
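The staged pipeline above can be sketched in code. This is an illustrative sketch only: the model interfaces (`narration_model`, `image_model`, `animation_model`, `tts_model`) and the passage-selection step are hypothetical placeholders, not the patent's actual APIs.

```python
def generate_video(story_description,
                   narration_model, image_model,
                   animation_model, tts_model,
                   select_key_passages):
    """Run the story-to-video pipeline described in the patent.

    All callables are caller-supplied stand-ins for the models
    named in the application.
    """
    # 1. Story description -> narrative text
    narrative_text = narration_model(story_description)
    # 2. Identify a subset of the narrative text (key passages)
    key_passages = select_key_passages(narrative_text)
    # 3. Key passages -> generated images
    images = [image_model(passage) for passage in key_passages]
    # 4. Images -> animated video segments
    segments = [animation_model(img) for img in images]
    # 5. Narrative text -> speech (runs independently of steps 3-4)
    speech = tts_model(narrative_text)
    # 6. Combine video segments and narrative speech into final content
    return {"segments": segments, "audio": speech}
```

In practice each callable would wrap a trained model; here they are pure functions so the control flow of the claimed method is visible on its own.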
The patent also covers a non-transitory computer-readable medium that stores instructions for executing the described method on a computing system. This system performs acts such as obtaining input data, generating narrative text, creating images and animations, and combining these elements into video content. The computing system is designed to automate these processes, facilitating quick and efficient video generation.
The content system architecture includes components such as a content generator, user-profile database, content-distribution system, and content-presentation device. These components are interconnected through various connection mechanisms that allow for efficient communication and operation. The architecture supports both live-action and synthetically generated video content, accommodating different formats and protocols for organizing video data.
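A minimal sketch of how the four named components might be wired together, assuming simple method interfaces (`get`, `generate`, `deliver`, `present`) invented here for illustration; the patent does not specify these signatures.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfileDatabase:
    """Stores per-user profile data used to tailor generated content."""
    profiles: dict = field(default_factory=dict)

    def get(self, user_id):
        return self.profiles.get(user_id, {})


@dataclass
class ContentGenerator:
    """Stand-in for the narration/image/animation/TTS pipeline."""

    def generate(self, story, profile):
        return {"story": story, "tailored_to": profile.get("name")}


@dataclass
class ContentPresentationDevice:
    """End device that receives and presents video content."""
    shown: list = field(default_factory=list)

    def present(self, content):
        self.shown.append(content)


@dataclass
class ContentDistributionSystem:
    """Routes generated content to presentation devices."""

    def deliver(self, content, device):
        device.present(content)
```

The connection mechanisms between components are modeled here as direct method calls; in the described architecture they could equally be network links carrying various video formats and protocols.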
Video content generated by this system can include audio components synchronized with the video. Metadata within the video data helps align audio with corresponding video frames. The system can handle various types of video content like movies or television shows, which may consist of multiple segments representing different scenes or acts. This flexibility allows for diverse applications of the generated content.
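One way such metadata-driven alignment can work, sketched under the assumption that the metadata carries a presentation timestamp per video frame (a common scheme, though the patent does not fix a specific layout): given an audio playback time, look up the frame that should be on screen.

```python
def frame_for_audio_time(t, frame_timestamps):
    """Return the index of the video frame on screen at audio time t.

    frame_timestamps is an ascending list of per-frame presentation
    timestamps in seconds, as might be carried in the video metadata.
    """
    idx = 0
    for i, ts in enumerate(frame_timestamps):
        if ts <= t:
            idx = i  # latest frame whose timestamp has passed
        else:
            break
    return idx


# Example: 24 fps video -> one timestamp every 1/24 s for 2 seconds
timestamps = [i / 24 for i in range(48)]
```

A linear scan is used for clarity; a real player would binary-search the timestamp table (`bisect`) on each audio clock tick.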