US20260104818
2026-04-16
Physics
G06F3/0655
The disclosed technology is a distributed data storage system that uses erasure coding (EC) to improve storage efficiency and resiliency, even when the number of storage nodes is smaller than the total number of EC data-plus-parity fragments. An exemplary configuration is a three-node system with an EC 4+2 scheme that temporarily replicates incoming data during node outages or disk failures. Read and write operations continue through such failures, and failed EC write attempts are healed automatically and transparently to users and applications. Once all nodes are operational again, the system converts the temporarily replicated data to EC storage and reclaims the space the replicas occupied.
The system is structured as a storage cluster with three nodes and applies an EC 4+2 configuration: each data block is encoded into four data fragments and two parity fragments, for an EC-count of six. Although it has fewer nodes than the EC-count, the system maintains high resiliency through failure handling, resource migration, data healing, and space reclamation. These features let the system keep using EC wherever possible while preserving fault tolerance. In some setups, EC is applied at the virtual disk level rather than across the entire system.
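The arithmetic behind the 4+2 configuration can be sketched as follows. This is an illustration only: the node names, the round-robin placement policy, and the recovery rule (any four of the six fragments suffice, as with a Reed-Solomon-style code) are assumptions, not details taken from the text above.

```python
DATA_FRAGMENTS = 4                             # the "4" in EC 4+2
PARITY_FRAGMENTS = 2                           # the "2" in EC 4+2
EC_COUNT = DATA_FRAGMENTS + PARITY_FRAGMENTS   # six fragments per data block
NODES = ["node-1", "node-2", "node-3"]         # hypothetical node names


def place_fragments(nodes, ec_count):
    """Assign fragment indices to nodes round-robin (an assumed policy)."""
    placement = {node: [] for node in nodes}
    for fragment in range(ec_count):
        placement[nodes[fragment % len(nodes)]].append(fragment)
    return placement


def recoverable(placement, failed_nodes, data_fragments):
    """Assumed rule: a block is recoverable if enough fragments survive."""
    surviving = sum(
        len(fragments)
        for node, fragments in placement.items()
        if node not in failed_nodes
    )
    return surviving >= data_fragments


placement = place_fragments(NODES, EC_COUNT)

# Raw bytes stored per byte of user data.
ec_overhead = EC_COUNT / DATA_FRAGMENTS   # EC 4+2
rf3_overhead = 3.0                        # three full copies

one_node_down = recoverable(placement, {"node-3"}, DATA_FRAGMENTS)
two_nodes_down = recoverable(placement, {"node-2", "node-3"}, DATA_FRAGMENTS)
```

Under these assumptions each node holds two fragments, so losing one node leaves four fragments, which is still enough to rebuild the block. EC 4+2 also stores half as many raw bytes as three-way replication.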
A storage proxy acts as an intermediary between client applications and the data storage appliance. It operates as a controller virtual machine, software container, or program on bare metal, depending on the configuration. The proxy masquerades as an iSCSI target, NFS server, or cloud storage resource, managing read and write operations during EC failure conditions. It creates an "EC virtual disk" within the system, invisible to client applications, which remain unaware of the underlying storage configurations and fault-handling mechanisms.
The storage proxy intercepts data blocks written by client applications and attempts to store them resiliently. When it receives a write request, the proxy sends a request indicating EC properties to a storage node. The node applies EC to the data block and distributes the six EC fragments across different physical disks. If all fragments are written successfully, the proxy acknowledges the write to the client. If a node is down, the system switches from EC mode to replication mode so that it can keep accepting write requests, using an "RF3 virtual disk" with a replication factor of three.
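The write path described above, an EC attempt followed by a fallback to replication, might look roughly like the sketch below. The `Cluster` class, the `ECWriteError` exception, and the function names are hypothetical. The model also simplifies the failure condition by treating any unreachable node as a failed EC write.

```python
from dataclasses import dataclass, field


class ECWriteError(Exception):
    """Raised when not every EC fragment could be written (hypothetical)."""


@dataclass
class Cluster:
    """Toy stand-in for the three-node storage cluster."""
    nodes_up: dict                                  # node name -> reachable?
    ec_vdisk: dict = field(default_factory=dict)    # blocks stored as EC fragments
    rf3_vdisk: dict = field(default_factory=dict)   # blocks stored as full replicas

    def write_ec(self, block_id, data):
        # Simplification: the EC write succeeds only if every node is up,
        # so that all six fragments can land.
        if not all(self.nodes_up.values()):
            raise ECWriteError(f"EC write failed for {block_id}")
        self.ec_vdisk[block_id] = data

    def write_rf3(self, block_id, data):
        # Replicated write of the unfragmented block.
        self.rf3_vdisk[block_id] = data


def proxy_write(cluster, block_id, data):
    """Try EC first; on failure, retry the same block in replication mode."""
    try:
        cluster.write_ec(block_id, data)
        return "EC"
    except ECWriteError:
        cluster.write_rf3(block_id, data)
        return "RF3"


cluster = Cluster({"node-1": True, "node-2": True, "node-3": True})
first = proxy_write(cluster, "blk-1", b"a")    # all nodes up: stored via EC
cluster.nodes_up["node-3"] = False
second = proxy_write(cluster, "blk-2", b"b")   # one node down: falls back to RF3
```

In this sketch the caller sees a successful write in both cases, which mirrors the point that the mode switch is invisible to client applications.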
In replication mode, the storage proxy sends a new write request specifying RF3 properties, aiming to store the unfragmented data block on three nodes. If a node is down, the block is written to the remaining nodes, which satisfies the quorum value of two for replicated writes. The data block is therefore stored on at least two nodes, so a read succeeds as long as at least one node holding a copy is accessible. By switching between EC and replication modes, the system preserves data integrity and availability through node failures.
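A minimal sketch of the quorum rule for replicated writes, and of reading from any surviving copy, is given below. The function names, the per-node dictionaries, and the choice to write nothing when quorum cannot be met are assumptions made for illustration.

```python
REPLICATION_FACTOR = 3   # RF3: target number of full copies
WRITE_QUORUM = 2         # minimum copies for a replicated write to succeed


def replicated_write(block_id, data, nodes_up, stores):
    """Store the whole block on every reachable node if quorum can be met."""
    targets = [node for node, up in nodes_up.items() if up][:REPLICATION_FACTOR]
    if len(targets) < WRITE_QUORUM:
        return False   # too few reachable nodes; nothing is written
    for node in targets:
        stores[node][block_id] = data
    return True


def replicated_read(block_id, nodes_up, stores):
    """Return the block from any reachable node that holds a copy."""
    for node, up in nodes_up.items():
        if up and block_id in stores[node]:
            return stores[node][block_id]
    return None


stores = {"node-1": {}, "node-2": {}, "node-3": {}}
nodes_up = {"node-1": True, "node-2": True, "node-3": False}
ok = replicated_write("blk-7", b"payload", nodes_up, stores)

# Later, only node-2 is reachable; it holds a copy, so the read succeeds.
only_node_2 = {"node-1": False, "node-2": True, "node-3": False}
data = replicated_read("blk-7", only_node_2, stores)

# If only node-3 is reachable, no copy is available, because node-3 was
# down when the block was written.
only_node_3 = {"node-1": False, "node-2": False, "node-3": True}
missing = replicated_read("blk-7", only_node_3, stores)
```

The last case shows why the read guarantee depends on reaching a node that actually received the block, not just any node.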