US20250391081
2025-12-25
Physics
G06T13/40
Objective: The patent application describes a method for generating hyperreal synthetic faces using latent space manipulation and neural animation. This involves training machine learning models to create synthetic faces that appear in altered video content, based on video data of an actor making mouth-generated sounds or a 3D model of a subject's face animated according to these sounds. The aim is to produce synthetic content so realistic that it is indistinguishable from real-life recordings.
Background: Hyperreal synthetic content is essential for the metaverse's development, where AI tools create highly realistic synthetic faces. Current technologies often fail to achieve natural-looking expressions, especially in mouth movements, making it difficult to produce hyperreal synthetic content at scale. The described techniques address these limitations by improving the realism and scalability of synthetic face generation.
Technical Process: The process begins by receiving unaltered video content featuring a subject, along with audio data of a mouth-generated sound that differs from the one in the original recording. A 3D model of the subject's face is animated to match this audio, and the animated model is aligned with the 2D video frames. The aligned model-frame pairs serve as training instances for a machine learning model that generates a synthetic face. Latent space manipulation and neural animation then enhance the facial expressions, making them appear more natural and realistic.
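The alignment step described above can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: it assumes a simple orthographic camera model, and all names (`project_mesh_to_frame`, `scale`, `offset`) are illustrative placeholders.

```python
import numpy as np

# Hypothetical sketch: project each audio-driven 3D face-mesh vertex
# into the 2D image plane so it can be paired with the corresponding
# video frame. The orthographic camera is an assumption for
# illustration; the patent does not specify a projection model.

def project_mesh_to_frame(vertices: np.ndarray, scale: float, offset: np.ndarray) -> np.ndarray:
    """Orthographic projection: drop the depth coordinate, then scale
    and translate into the video frame's pixel coordinates."""
    return vertices[:, :2] * scale + offset

# One mesh of 4 vertices (x, y, z), posed to match a sound in the audio
mesh = np.array([[0.0, 0.0, 1.0],
                 [1.0, 0.0, 1.2],
                 [0.0, 1.0, 0.9],
                 [1.0, 1.0, 1.1]])

pts_2d = project_mesh_to_frame(mesh, scale=100.0, offset=np.array([320.0, 240.0]))
# Each (mesh, frame) pair with these aligned 2D points would form one
# training instance for the synthetic-face model.
```

Under this sketch, each aligned pair of (posed 3D mesh, 2D video frame) becomes one supervised training instance for the face-generation model.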
Latent Space and Neural Animation: The latent space is a compressed representation of the training data learned by the machine learning model. Neural animation guides the manipulation of this space to improve the synthetic face's expressiveness: by applying a neural animation vector to a point in the latent space, the model generates a synthetic face with enhanced, natural-looking expressions. This approach is more efficient and scalable than retraining the model or altering the expressions in the original footage.
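The latent-space manipulation described above can be sketched as a vector shift. This is a minimal illustration under common conventions for latent editing, not the patent's implementation; the function name, the 512-dimensional latent size, and the `strength` parameter are all assumptions.

```python
import numpy as np

# Hypothetical sketch: apply a "neural animation vector" v to a latent
# point z by shifting z along v. A decoder (not shown) would map the
# shifted point back to a synthetic face image with the enhanced
# expression. All names and sizes here are illustrative.

def apply_neural_animation(z: np.ndarray, v: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Shift latent point z along the animation direction v."""
    return z + strength * v

rng = np.random.default_rng(0)
z = rng.normal(size=512)       # latent point for the synthetic face
v = rng.normal(size=512)       # learned expression direction (e.g., a smile)
v /= np.linalg.norm(v)         # normalize to a unit-length direction

z_animated = apply_neural_animation(z, v, strength=2.5)
# Decoding z_animated (instead of z) would yield the same face with a
# more expressive, natural-looking mouth region.
```

The appeal of this design is that one learned direction `v` can be reused across many latent points, which is why the patent frames it as cheaper than retraining the model for each new expression.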
Applications and Advantages: The techniques can be applied to lip-syncing and language translation, enabling altered video content to reflect different spoken languages or phrases. This method improves consumer experiences in synthetic content applications, such as the metaverse, by producing more realistic synthetic faces. It offers a technological advancement over existing methods, which are often time-consuming and lack the ability to reproduce high-quality synthetic content consistently.