Invention Title:

MOTION CUSTOMIZATION FOR DIGITAL VIDEOS

Publication number:

US20250392795

Publication date:

Section:

Electricity

Class:

H04N21/816

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

A novel approach for customizing motion in digital videos is introduced, utilizing machine learning to replicate movements from one video to another. The method involves training a machine-learning model on a reference video and an accompanying caption that describes a specific object and its movement. The trained model is then used to generate a new video in which a target object, specified by a target text prompt, replicates the reference movement.
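
At a high level, the workflow takes a reference video and its caption as training input, and a target text prompt at generation time. The sketch below is only an illustration of that interface in Python; the names (`MotionCustomizationRequest`, `customize_motion`) and the tensor layout are assumptions for illustration, not identifiers from the publication.

```python
from dataclasses import dataclass

import torch


@dataclass
class MotionCustomizationRequest:
    """Inputs named in the overview: a reference clip, its caption, and a
    target prompt describing the new subject and scene."""
    reference_video: torch.Tensor  # (frames, channels, height, width)
    reference_caption: str         # e.g. "a <object> doing <movement>"
    target_prompt: str             # e.g. "a <different object> doing <movement>"


def customize_motion(request: MotionCustomizationRequest) -> torch.Tensor:
    """Hypothetical entry point: fine-tune a generative video model on the
    reference pair, then sample a new video whose subject follows the target
    prompt while repeating the reference movement."""
    raise NotImplementedError("Illustrative interface only; see the sketches below.")
```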

Background

Traditional methods of motion replication in videos, such as those used in the movie industry, are often labor-intensive and costly. These methods require significant manual effort and computational resources, limiting their efficiency and flexibility. The need for an automated and adaptable approach has led to the development of techniques that utilize machine learning models to generate videos based on textual descriptions and reference movements.

Methodology

The process begins with receiving a reference digital video and a caption that describes the object and its movement within the video. A machine-learning model is trained in two stages: first, to learn the object's appearance, and second, to capture the movement dynamics. The trained model is then used to generate a new video in which a target object, described by a target text prompt, performs the same movement.
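
A minimal sketch of that two-stage schedule, assuming a PyTorch-style setup: the toy model, the appearance/motion parameter split, and the reconstruction loss below are simplified stand-ins for the text-to-video diffusion training described in the next section, and every name is illustrative rather than taken from the publication.

```python
import torch
from torch import nn


class ToyVideoModel(nn.Module):
    """Simplified stand-in for a video generator: one branch nominally tied to
    per-frame appearance and one to cross-frame motion dynamics."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.appearance = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.motion = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, frames, height, width)
        b, c, f, h, w = video.shape
        frames = video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        frames = self.appearance(frames)  # per-frame appearance pass
        frames = frames.reshape(b, f, c, h, w).permute(0, 2, 1, 3, 4)
        return self.motion(frames)        # temporal (motion) pass


def train_two_stage(model: ToyVideoModel, reference_video: torch.Tensor, steps: int = 50) -> None:
    """Stage 1 fits the appearance branch; stage 2 freezes it and fits the motion branch."""
    loss_fn = nn.MSELoss()  # placeholder reconstruction objective

    # Stage 1: learn the object's appearance.
    optimizer = torch.optim.Adam(model.appearance.parameters(), lr=1e-3)
    for _ in range(steps):
        model.zero_grad()
        loss_fn(model(reference_video), reference_video).backward()
        optimizer.step()

    # Stage 2: freeze appearance, then learn the movement dynamics.
    for p in model.appearance.parameters():
        p.requires_grad_(False)
    optimizer = torch.optim.Adam(model.motion.parameters(), lr=1e-3)
    for _ in range(steps):
        model.zero_grad()
        loss_fn(model(reference_video), reference_video).backward()
        optimizer.step()


reference_video = torch.rand(1, 3, 8, 32, 32)  # (batch, channels, frames, height, width)
train_two_stage(ToyVideoModel(), reference_video, steps=5)
```

The only point the sketch is meant to convey is the ordering: the appearance-related parameters are fit first, then frozen while the motion-related parameters are optimized.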

Technical Implementation

This approach applies low-rank adaptation (LoRA) to a pre-trained text-to-video (T2V) diffusion model. The model is trained to separate out and retain only the motion information from the reference video, so that the learned motion can be applied to different subjects and scenes. Appearance absorbers help isolate the motion dynamics from appearance details, enabling the generation of dynamic and visually appealing videos.
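
As a rough illustration of the low-rank adaptation idea, the following sketch wraps a frozen linear layer with a trainable low-rank residual, the way LoRA is commonly injected into the projection layers of a pre-trained diffusion model. The class and parameter names are generic assumptions, and the appearance-absorber adapters mentioned above are not shown.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank residual:
    y = W x + (alpha / r) * B(A(x)), where only A and B are updated."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep the pre-trained weights fixed
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)   # A
        self.up = nn.Linear(rank, base.out_features, bias=False)    # B
        nn.init.normal_(self.down.weight, std=1.0 / rank)
        nn.init.zeros_(self.up.weight)  # zero-init so the adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


# Example: adapt one projection of a pre-trained block; only the low-rank
# factors are trainable, so the motion-specific update stays small and swappable.
frozen_proj = nn.Linear(64, 64)
adapted = LoRALinear(frozen_proj, rank=4)
trainable = [name for name, p in adapted.named_parameters() if p.requires_grad]
print(trainable)  # ['down.weight', 'up.weight']
```

Because only the small `down`/`up` factors are trained, the motion-specific update remains compact and can be attached to or removed from the pre-trained T2V model without altering its original weights.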

Application and Benefits

The described system creates videos in which target objects mimic complex movements from reference videos, offering the flexibility to change the subject and the surrounding environment. The method also supports variation in motion intensity, position, and camera angle, resulting in more engaging and natural-looking videos. By automating motion transfer, this technology enhances efficiency and creativity in video production.