Invention Title:

USING ERASURE CODING ON STORAGE NODES FEWER THAN DATA PLUS PARITY FRAGMENTS IN A DISTRIBUTED DATA STORAGE SYSTEM

Publication number:

US20260104818

Publication date:

Section:

Physics

Class:

G06F3/0655

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The disclosed technology is a distributed data storage system that uses erasure coding (EC) to improve storage efficiency and resiliency even when the number of storage nodes is smaller than the number of EC data-plus-parity fragments. An exemplary configuration is a three-node system with an EC 4+2 scheme that temporarily replicates incoming data when a node outage or disk failure prevents an EC write. Read and write operations continue throughout, and failed EC write attempts are healed automatically, transparently to users and applications. Once all nodes are operational again, the system converts the temporarily replicated data to EC storage and reclaims the space the replicas occupied.
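The conversion and space-reclamation step can be sketched in a few lines. This is an illustrative model only, not the patented implementation: the `heal` function, its arguments, and the bookkeeping in block-size units are all assumptions made for the example, and re-encoding is reduced to moving a block ID from one collection to another.

```python
EC_DATA, EC_PARITY = 4, 2  # EC 4+2, as in the exemplary configuration


def heal(rf_blocks, ec_blocks, nodes_up, total_nodes=3):
    """Convert temporarily replicated blocks to EC once every node is up.

    rf_blocks: dict of block_id -> number of replicas actually written (assumed model)
    ec_blocks: set of block_ids already stored as EC fragments
    Returns raw capacity reclaimed, in units of one block's size.
    """
    if nodes_up < total_nodes:
        return 0.0  # healing waits until all nodes are operational

    ec_cost = (EC_DATA + EC_PARITY) / EC_DATA  # 1.5 block-sizes of raw space
    reclaimed = 0.0
    for block_id in list(rf_blocks):
        copies = rf_blocks.pop(block_id)   # drop the replicas
        ec_blocks.add(block_id)            # stand-in for re-encoding as EC 4+2
        reclaimed += copies - ec_cost
    return reclaimed


# Two blocks were written in replication mode during an outage.
replicated = {"blk-1": 3, "blk-2": 2}
erasure_coded = set()
skipped = heal(replicated, erasure_coded, nodes_up=2)    # node still down: no-op
reclaimed = heal(replicated, erasure_coded, nodes_up=3)  # all nodes back
```

Under these assumptions, a block held as three full copies gives back 1.5 block-sizes of raw capacity when converted, and a block held as two copies gives back 0.5.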

System Configuration and Features

The system is a storage cluster of three nodes configured for EC 4+2: four data fragments and two parity fragments, for an EC-count of six. Although the cluster has fewer nodes than the EC-count, it maintains high resiliency through failure handling, resource migration, data healing, and space reclamation. These features preserve the storage efficiency of EC without giving up fault tolerance. In some configurations, EC is applied at the virtual disk level rather than across the entire system.
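The arithmetic behind an EC 4+2 layout on three nodes can be worked through with a short sketch. The round-robin placement, the node names, and the assumption that any four of the six fragments suffice to rebuild a block (true of maximum-distance-separable codes such as Reed-Solomon) are illustrative choices, not details stated in the source.

```python
from collections import Counter

DATA_FRAGMENTS = 4
PARITY_FRAGMENTS = 2
NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names


def place_fragments(nodes, ec_count):
    """Map fragment index -> node, round-robin (an assumed placement policy)."""
    return {i: nodes[i % len(nodes)] for i in range(ec_count)}


ec_count = DATA_FRAGMENTS + PARITY_FRAGMENTS       # EC-count of six
placement = place_fragments(NODES, ec_count)
fragments_per_node = Counter(placement.values())   # two fragments on each node

# Raw bytes stored per byte of user data.
ec_overhead = ec_count / DATA_FRAGMENTS            # 1.5 for EC 4+2
rf3_overhead = 3.0                                 # three full copies

# Losing one node removes at most this many fragments of a block; the block
# stays readable if that does not exceed the parity count (assumes an MDS code).
worst_case_loss = max(fragments_per_node.values())
survives_one_node_loss = worst_case_loss <= PARITY_FRAGMENTS
```

With an even spread, each node holds two fragments, so a single node outage costs exactly as many fragments as the scheme has parity, while raw storage use is half that of three-way replication.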

Role of the Storage Proxy

A storage proxy acts as an intermediary between client applications and the data storage appliance. Depending on the configuration, it runs as a controller virtual machine, a software container, or a program on bare metal. The proxy presents itself to clients as an iSCSI target, NFS server, or cloud storage resource, and it manages read and write operations during EC failure conditions. It creates an "EC virtual disk" within the storage system. This virtual disk is not visible to client applications, which remain unaware of the underlying storage configuration and fault-handling mechanisms.

Handling Write Operations and Failures

The storage proxy intercepts data blocks written by client applications and attempts to store them resiliently. On receiving a write request, the proxy sends a storage node a request that specifies EC properties. The node applies EC to the data block and distributes the six resulting fragments across different physical disks. If all six fragments are written successfully, the proxy acknowledges the write to the client. If a node is down, the system switches from EC mode to replication mode so that it can keep accepting write requests, using an "RF3 virtual disk" with a replication factor of three.
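The write path described above, an EC attempt followed by a fallback to replication, can be sketched as follows. The names `Mode`, `try_ec_write`, and `proxy_write` are hypothetical, the node map is a stand-in for real cluster state, and fragments are assumed to be spread round-robin over the nodes.

```python
from enum import Enum


class Mode(Enum):
    EC = "ec-4+2"
    RF3 = "rf3"


def try_ec_write(nodes_up, ec_count=6):
    """Return how many of the ec_count fragments land on reachable nodes.

    nodes_up: dict of node name -> bool (True if the node is reachable).
    Fragments are assigned round-robin, an assumed placement policy.
    """
    names = list(nodes_up)
    return sum(1 for i in range(ec_count) if nodes_up[names[i % len(names)]])


def proxy_write(nodes_up, ec_count=6):
    """Pick the storage mode for one block, as the proxy would in this model."""
    if try_ec_write(nodes_up, ec_count) == ec_count:
        return Mode.EC   # all six fragments written: acknowledge to the client
    return Mode.RF3      # EC write failed: reissue against the RF3 virtual disk


healthy = proxy_write({"n1": True, "n2": True, "n3": True})
degraded = proxy_write({"n1": True, "n2": True, "n3": False})
```

In this model a single unreachable node leaves only four of six fragments written, so the EC attempt is treated as failed and the block is redirected to replication mode rather than being rejected.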

Replication Mode and Data Retention

In replication mode, the storage proxy issues a new write request that specifies RF3 properties, with the aim of storing the unfragmented data block on all three nodes. If a node is down, the block is written to the remaining nodes, which satisfies the quorum value of two for replicated writes. The data block is therefore stored on at least two nodes and can be read as long as one of those nodes is accessible. By switching between EC and replication modes, the system preserves data integrity and availability despite node failures.
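The quorum rule for replicated writes and the resulting read condition can be expressed as a minimal sketch. The function names and the dictionary-based node state are assumptions for illustration; only the replication factor of three and the quorum of two come from the text.

```python
def replicated_write(nodes_up, rf=3, quorum=2):
    """Write a whole block to every reachable node, up to rf copies.

    nodes_up: dict of node name -> bool (True if the node is reachable).
    Returns (success, holders); success requires at least `quorum` copies.
    """
    holders = [name for name, up in nodes_up.items() if up][:rf]
    return len(holders) >= quorum, holders


def can_read(holders, nodes_up):
    """A replicated block is readable if any node holding a copy is reachable."""
    return any(nodes_up[name] for name in holders)


# One node down at write time: two copies land, which meets the quorum.
ok, holders = replicated_write({"n1": True, "n2": True, "n3": False})

# Later, a second node fails; one surviving copy is still enough to read.
readable = can_read(holders, {"n1": False, "n2": True, "n3": False})

# With two nodes down at write time, the quorum cannot be met.
ok_two_down, _ = replicated_write({"n1": True, "n2": False, "n3": False})
```

The asymmetry is deliberate in this model: a write needs two reachable nodes, but a read needs only one of the nodes that received a copy.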