Invention Title:

AUDIOVISUAL DEEPFAKE DETECTION

Publication number:

US20250037506

Section:

Physics

Class:

G06V40/40

Smart overview of the Invention

Technical Field: The patent application relates to systems and methods for managing, training, and deploying machine learning architectures for audiovisual deepfake detection and biometric-based identity recognition. The system integrates both audio and visual data analysis to enhance the accuracy and efficiency of detecting manipulated content.

Background

Deepfake technology has advanced significantly, allowing the creation of highly convincing fake audio and video content. This poses risks to individuals and systems that rely on audiovisual data for authentication and identity verification. The proliferation of deepfakes can harm reputations and spread misinformation, highlighting the need for improved detection systems that can handle both audio and visual data effectively.

Summary of the Invention

The disclosed invention presents a consolidated system that uses machine-learning architectures to evaluate audiovisual data for deepfake detection and identity verification. Unlike conventional systems that analyze audio and visual data separately, this integrated approach aims to improve overall accuracy by combining multiple scoring components, including sub-architectures for speaker deepfake detection, facial recognition, and lip-sync estimation.
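
One way to picture how several scoring components might be combined is a weighted fusion of per-component scores. The sketch below is only an illustration under stated assumptions, not the method described in the application: the `ComponentScores` fields, the weights, the bias, and the logistic squashing are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class ComponentScores:
    """Hypothetical per-component scores in [0, 1]; higher means more likely genuine."""
    speaker: float   # speaker (audio) deepfake sub-architecture
    face: float      # facial sub-architecture
    lip_sync: float  # lip-sync estimation sub-architecture


# Assumed fusion parameters; a real system would learn these from labeled data.
WEIGHTS = {"speaker": 2.0, "face": 2.0, "lip_sync": 1.5}
BIAS = -2.75


def fuse(scores: ComponentScores) -> float:
    """Combine component scores into a single genuineness score in (0, 1)."""
    z = BIAS + sum(w * getattr(scores, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing of the weighted sum


if __name__ == "__main__":
    consistent = ComponentScores(speaker=0.9, face=0.9, lip_sync=0.9)
    bad_lip_sync = ComponentScores(speaker=0.9, face=0.9, lip_sync=0.1)
    print(f"consistent sample:   {fuse(consistent):.3f}")
    print(f"poor lip-sync score: {fuse(bad_lip_sync):.3f}")
```

In this sketch a sample whose audio and face look genuine but whose lip-sync score is low receives a lower fused score than a sample that is consistent across all three components, which is the intuition behind evaluating the modalities jointly.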

Embodiments

  • A computer-implemented method involves obtaining an audiovisual data sample, applying a machine-learning architecture to generate similarity and deepfake scores, and producing a final output score indicating the genuineness of the data.
  • A system embodiment includes a processor configured to perform similar operations, with emphasis on extracting biometric embeddings and spoofprints from the audiovisual data.
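
The flow in these embodiments (compare embeddings to get a similarity score, obtain a deepfake score, then produce one output score) can be sketched roughly as follows. This is a minimal illustration under assumptions: cosine similarity as the comparison, the rescaling to [0, 1], and the multiplicative combination in `final_score` are hypothetical choices, and the embeddings are taken as given rather than extracted by a model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def final_score(enrolled: Sequence[float],
                inbound: Sequence[float],
                deepfake_prob: float) -> float:
    """Hypothetical output score in [0, 1]; higher means more likely genuine.

    enrolled:      embedding stored for the claimed identity (assumed given)
    inbound:       embedding extracted from the new sample (assumed given)
    deepfake_prob: assumed probability in [0, 1] that the sample is synthetic
    """
    similarity = cosine_similarity(enrolled, inbound)
    identity_match = (similarity + 1.0) / 2.0  # rescale [-1, 1] to [0, 1]
    return identity_match * (1.0 - deepfake_prob)


if __name__ == "__main__":
    enrolled = [0.2, 0.7, 0.1]
    inbound = [0.25, 0.65, 0.05]
    print(f"low deepfake probability:  {final_score(enrolled, inbound, 0.05):.3f}")
    print(f"high deepfake probability: {final_score(enrolled, inbound, 0.95):.3f}")
```

With this combination, a sample scores high only when the identity matches and the deepfake probability is low; either a poor match or a likely deepfake pulls the output toward zero.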

System Components

The system comprises an analytics system with servers, databases, and admin devices, connected to end-user devices over one or more public or private networks. These components work together to process audiovisual data received from the end-user devices and return scores indicating whether the input is genuine or contains deepfake content. Communication between components uses protocols such as TCP/IP and UDP.