Invention Title:

AUTOMATIC REPLACEMENT OF TARGETED OBJECTS WITHIN ARBITRARY MEDIA

Publication number:

US20250095646

Publication date:

2025-03-20

Section:

Physics

Class:

G10L15/22

Inventors:

Killian Levacher Dublin, Ireland

Stefano Braghin Dublin, Ireland

Hessel TUINHOF Dublin, Ireland

Marco Simioni Pisa, Italy

Applicant:

INTERNATIONAL BUSINESS MACHINES CORPORATION Armonk, NY, United States

Smart overview of the Invention

The patent application describes a method for protecting sensitive data in audio files. The method extracts the voice from an audio file, transcribes it into text, and identifies sensitive information using natural language processing. A synthetic voice similar to the original then speaks contextually appropriate synthetic data in place of the sensitive data, producing a new audio file in which the sensitive content is seamlessly replaced.

Background

Generative AI models, particularly Generative Adversarial Networks (GANs), have been instrumental in creating realistic synthetic media. These models can generate new content that statistically resembles their training data. In recent years, the technology has been used to create deepfakes, which manipulate media to convincingly replace one person's likeness with another's. This invention applies the same advances to privacy concerns in audio data.

Technical Summary

The method uses several components. A signal separator extracts the voice from an audio file, and a speech-to-text component transcribes it into text. Natural language processing parses the text to identify sensitive information. A voice locator finds a matching synthetic voice, and a voice replacer substitutes the sensitive data with meaningful synthetic content that preserves both semantic and contextual integrity. In the resulting audio file, the sensitive information is replaced without disrupting the flow of conversation.
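The flow from transcript to replacement plan can be illustrated with a minimal sketch. It is not the applicant's implementation. It assumes the speech-to-text step yields words with timestamps, and it substitutes a simple regular expression for the natural language processing step. The names `Word`, `Replacement`, `find_sensitive`, `synthetic_for`, and `plan_replacements` are hypothetical, and the signal separation and voice synthesis steps are omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its time span in the audio (seconds)."""
    text: str
    start: float
    end: float


@dataclass
class Replacement:
    """A span of audio to re-synthesize, and the text to speak in its place."""
    index: int
    start: float
    end: float
    original: str
    synthetic: str


# Assumption: a regex stands in for the NLP step. It flags only
# phone-like tokens such as "867-5309".
PHONE = re.compile(r"^\d{3}-\d{4}$")


def find_sensitive(words):
    """Return the indices of words flagged as sensitive."""
    return [i for i, w in enumerate(words) if PHONE.match(w.text)]


def synthetic_for(original):
    """Return a same-format placeholder so the sentence still makes sense."""
    return "555-0100"


def plan_replacements(words):
    """Build the list of spans a voice replacer would need to re-synthesize."""
    return [
        Replacement(i, words[i].start, words[i].end,
                    words[i].text, synthetic_for(words[i].text))
        for i in find_sensitive(words)
    ]


def redacted_transcript(words, plan):
    """Apply the plan at the text level to preview the result."""
    swaps = {r.index: r.synthetic for r in plan}
    return " ".join(swaps.get(i, w.text) for i, w in enumerate(words))


if __name__ == "__main__":
    words = [Word("call", 0.0, 0.3), Word("me", 0.3, 0.5),
             Word("at", 0.5, 0.6), Word("867-5309", 0.6, 1.4)]
    plan = plan_replacements(words)
    print(redacted_transcript(words, plan))
```

Keeping the timestamps on each planned replacement is what would let a later stage locate the exact audio span to overwrite with synthetic speech.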

Detailed Description

The invention addresses the challenge of maintaining conversational flow while protecting sensitive data in recorded interactions. Traditional methods often mask sensitive segments with tones or beeps, which disrupts the audio and hinders comprehension. By using a synthetic voice to replace personal information seamlessly, the invention preserves the conversation's integrity. This approach is particularly useful for handling personal information, confidential data, and other sensitive content that requires protection.
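The difference between masking and replacement can be sketched at the sample level. This is an illustration under stated assumptions, not the method claimed in the application: audio is modeled as a plain list of float samples, the sample rate and tone frequency are arbitrary defaults, and the function names `beep_mask` and `splice_replacement` are hypothetical. Generating the synthetic speech itself is out of scope here.

```python
import math


def beep_mask(samples, start, end, rate=16000, freq=1000.0):
    """Traditional approach: overwrite samples[start:end] with a sine tone.

    The output has the same length as the input, but the masked words are lost.
    """
    tone = [0.5 * math.sin(2 * math.pi * freq * n / rate)
            for n in range(end - start)]
    return samples[:start] + tone + samples[end:]


def splice_replacement(samples, start, end, replacement):
    """Replacement approach: swap samples[start:end] for synthetic speech.

    The replacement may be shorter or longer than the original span, so the
    audio after the splice shifts in time accordingly.
    """
    return samples[:start] + list(replacement) + samples[end:]


if __name__ == "__main__":
    original = [0.0] * 10
    synthetic_speech = [1.0, 1.0]  # placeholder for synthesized samples
    print(len(beep_mask(original, 2, 5)))
    print(len(splice_replacement(original, 2, 5, synthetic_speech)))
```

Because the spliced segment can change the overall duration, any timestamps computed from the original transcript would need to be adjusted for later replacements in the same file.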

Implementation

Embodiments of the invention can be implemented on computer systems that include components such as processors, memory storage, and network modules. The system may run on different devices, such as desktops, laptops, or cloud-based servers. The application describes the operations through narrative text and flowcharts, and those operations may vary depending on technological requirements. The system is intended to replace sensitive audio data effectively while retaining conversational authenticity.