Invention Title:

CONSISTENT SCALABLE REPLICATION OVER DISPARATE DATA MODELS AND STORAGE SYSTEMS

Publication number:

US20260119464

Publication date:

Section:

Physics

Class:

G06F16/211

Inventor:

Assignee:

Applicant:

Smart overview of the Invention

Techniques are introduced for consistent, scalable data replication across heterogeneous data environments. A data write received from a source data store is executed on a replica data store, and a transaction is determined from the resulting data operations. A router processes this transaction and identifies the materializers that apply to it. These materializers create semantic objects based on the transaction, and the semantic objects are sent to target data stores. Each semantic object is assigned a watermark derived from timestamps of the replica data store, which is used to manage race conditions and ensure data consistency.

Field and Background

The disclosed techniques relate to data systems, and in particular to data replication in distributed storage systems. As data stores have become more diverse, applications have gained flexibility in how they query data, which allows scaling and workload management to be optimized. Recent advancements, particularly in natural language processing, have improved the ability of data storage systems to handle semantic data concepts. These improvements are important for maintaining data across disparate data models and stores.

Replication Methodology

The replication process is a computer-implemented method in which a data storage system receives a source data write, which may be a direct write or a duplicated write. The write is executed on a replica data store, and a transaction is determined from the resulting data operations. A router then uses a schema mapping to identify the materializers for that transaction, and those materializers create semantic objects. The semantic objects are sent to target data stores with different schemas. Each semantic object is formatted for, and written to, the appropriate type of data store.
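The routing step can be illustrated with a minimal Python sketch. It assumes that a transaction is a list of per-table operations and that the schema mapping is a dictionary from replica table names to materializer names. The names `Operation`, `SCHEMA_MAPPING`, and `route`, as well as the example tables, are hypothetical and do not come from the publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One data operation observed on the replica data store (hypothetical shape)."""
    table: str
    primary_key: int
    kind: str  # "insert", "update", or "delete"


# Hypothetical schema mapping: replica table name -> materializers that depend on it.
SCHEMA_MAPPING = {
    "users": ["user_profile"],
    "addresses": ["user_profile"],
    "orders": ["order_summary"],
}


def route(transaction):
    """Return the sorted names of materializers whose tables the transaction touched."""
    names = set()
    for op in transaction:
        # Tables with no mapping entry contribute no materializers.
        names.update(SCHEMA_MAPPING.get(op.table, []))
    return sorted(names)


txn = [Operation("users", 1, "update"), Operation("addresses", 7, "insert")]
print(route(txn))  # ['user_profile']
```

Both operations in the example touch tables mapped to the same materializer, so the router selects it only once for the transaction.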

Materializer and Semantic Objects

Materializers generate semantic objects from transactions. Using the schema mappings, they identify the tables that a transaction updated and the concepts associated with those tables. Generating a semantic object includes identifying primary keys and retrieving the relevant values, after which the object is transmitted to the target data stores. For each semantic object, a watermark is generated based on read timestamps from the replica data store.
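As a rough illustration of how a watermark might be attached to a semantic object, the Python sketch below uses an in-memory stand-in for the replica data store. It assumes that the store returns a timestamp with every read, and it models that timestamp as a logical counter. `ReplicaStore`, `SemanticObject`, and `materialize_user_profile` are hypothetical names. The publication does not say how several read timestamps would be combined, so the sketch performs a single read per object.

```python
import itertools
from dataclasses import dataclass


class ReplicaStore:
    """In-memory stand-in for the replica data store, with a logical clock (assumption)."""

    def __init__(self):
        self._clock = itertools.count(1)
        self._tables = {}

    def write(self, table, pk, row):
        """Store a copy of the row and return the timestamp of the write."""
        self._tables.setdefault(table, {})[pk] = dict(row)
        return next(self._clock)

    def read(self, table, pk):
        """Return (row, read_timestamp); row is None if the key is absent."""
        timestamp = next(self._clock)
        return self._tables.get(table, {}).get(pk), timestamp


@dataclass
class SemanticObject:
    concept: str
    key: int
    fields: dict
    watermark: int  # read timestamp taken from the replica data store


def materialize_user_profile(store, user_id):
    """Build a 'user_profile' semantic object for one primary key."""
    row, read_ts = store.read("users", user_id)
    if row is None:
        return None
    fields = {"name": row["name"], "city": row["city"]}
    return SemanticObject("user_profile", user_id, fields, watermark=read_ts)


store = ReplicaStore()
store.write("users", 1, {"name": "Ada", "city": "London"})
first = materialize_user_profile(store, 1)
store.write("users", 1, {"name": "Ada", "city": "Paris"})
second = materialize_user_profile(store, 1)
```

Because the second object is built from a later read, it carries the larger watermark, which lets a target data store tell which of the two reflects the newer state.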

Handling Race Conditions

The system uses watermarks to resolve the conflicts that arise from race conditions. When multiple writes occur, the system compares the watermarks of the corresponding semantic objects to determine the correct sequence of operations. If the watermark of one semantic object is greater than that of a competing one, the object with the greater watermark is written to the target data store. This approach maintains data integrity across disparate data stores, even when data writes are concurrent.
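The watermark comparison can be sketched as a conditional write at the target. This Python example assumes that the target data store keeps the watermark of the last applied object next to each key, and that an object with an equal watermark is treated as stale. Both choices are assumptions made for illustration, and `TargetStore` is a hypothetical name.

```python
class TargetStore:
    """In-memory stand-in for a target data store that keeps one watermark per key."""

    def __init__(self):
        self._objects = {}  # key -> (watermark, value)

    def apply(self, key, value, watermark):
        """Write the value only if its watermark is greater than the stored one.

        Returns True if the write was applied and False if it was rejected as stale.
        """
        current = self._objects.get(key)
        if current is not None and watermark <= current[0]:
            return False
        self._objects[key] = (watermark, value)
        return True

    def get(self, key):
        """Return the stored value for the key, or None if nothing was written."""
        entry = self._objects.get(key)
        return None if entry is None else entry[1]


target = TargetStore()
newer_applied = target.apply("user:1", {"city": "Paris"}, watermark=4)
# The older object arrives late, as in a race between two concurrent writers.
older_applied = target.apply("user:1", {"city": "London"}, watermark=2)
```

In this run the late-arriving object with watermark 2 is rejected, so the target keeps the state produced by the newer read regardless of arrival order.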