Converging Primary and Secondary Storage Whitepaper



Introduction

Datrium DVX is an open converged infrastructure system that delivers high-performance primary storage and cost-optimized, space-efficient secondary storage for private clouds, with built-in VM-based data protection and efficient RPO and RTO. The integrated data protection eliminates the need for 3rd party backup and replication software and hardware. Unified data management reduces the manpower needed for private cloud operation. DVX can deliver millions of IOPS at sub-millisecond latencies, rivaling the performance of the fastest all-flash storage arrays. DVX data ingest bandwidth scales linearly with the addition of new nodes, competing with the largest specialized secondary storage appliances.

Legacy Storage Architectures

Legacy enterprise storage architectures are based on client/server-era concepts and rely on multiple storage tiers composed of several types of specialized storage appliances and traditional Backup/Recovery (B/R) software. Primary and secondary storage have different requirements, which leads to high specialization: primary storage mainly addresses latency, IOPS, and data availability, while secondary storage focuses on data protection, ingest bandwidth, data reduction, and cost effectiveness.

In legacy architectures, primary storage needs are serviced by SANs, while secondary storage is provided by custom-built secondary storage appliances such as Data Domain. Both are complex systems with sophisticated management tools.

Data protection is managed by backup and disaster recovery software. Enterprise B/R software leverages multiple types of servers for overall orchestration and execution of specific backup tasks. Traditional B/R software runs once a day and delivers 24-hour RPO and RTO. Because this is often insufficient, complex storage mirrors based on array LUN replication are also frequently deployed. Dedicated DR sites are maintained to address entire primary site failures. However, these DR sites do not eliminate traditional B/R software because LUN replicas alone don’t provide full backup capabilities to satisfy regulatory and other requirements.

Tape is still used for extended and offsite data archiving. WAN accelerators compress and deduplicate backup traffic over slower WANs. Much of this storage infrastructure is also duplicated at the DR sites. Managing data protection has traditionally been one of the least appreciated but most labor-intensive jobs in IT.

The overwhelming adoption of virtualization triggered the convergence of compute and storage into HCI and the evolution of backup software. These newer systems offer data management at VM granularity. However, primary and secondary storage remain disparate, with much of the secondary storage infrastructure unchanged. Enterprise storage remains complex and brittle, and meeting SLAs is still a challenge.

 

 

Effortless Infrastructure

The adoption of virtualization was followed by the emergence of private clouds. The invention and evolution of HCI and VM-centric backup demonstrated that some of the complexity of legacy architectures could be addressed by purpose-built systems. However, even with HCI and VM-aware backups, enterprise IT still subsidizes the development of multiple distributed file systems and appliance tiers for HCI, backup software, and backup storage. Backup activities generally do not contribute to a company's top-line performance, but they are necessary to keep businesses running. Optimizing backup costs and manpower is therefore an important TCO consideration.

An idealized converged infrastructure architecture for private clouds should provide:

  • High-performance primary storage with performance scaling and isolation, converged with commodity compute resources.
  • Cost-optimized, integrated secondary storage with rich built-in data protection, data management, and data reduction capabilities, making 3rd party enterprise backup and disaster recovery software and hardware unnecessary.
  • A unified VM-aware data management framework for all compute and storage administration, across ESXi, Linux KVM and containers, and MxN replication.
  • RPO and RTO measured in minutes or seconds instead of days or hours.
  • An ability to tap into Public Clouds for extended data archiving.

This idealized architecture would eliminate the duplicate efforts and costs of building multiple distributed file systems, management frameworks, and hardware appliances to lower CAPEX. A unified architecture lowers OPEX by reducing the manpower needed to manage multiple systems via different management abstractions. A single IT administrator should be able to handle all private cloud compute and storage management using a unified management console.

A study by Dell EMC revealed that businesses using three or more vendors to supply data protection solutions lost three times as much data as those that unified their data protection strategy around a single vendor.[1]

[1] https://www.emc.com/about/news/press/2014/20141202-01.htm

 

DVX: Converging Primary and Secondary Storage

Datrium DVX was designed from the ground up to handle all storage and compute needs of private and hybrid clouds with a strong focus on VM-based data protection.

 

Unified Architecture

All storage and compute are serviced by just two kinds of hardware appliances: Compute Nodes and Data Nodes. Any HCL-compliant commodity x86 server can serve as a Compute Node with the help of DVX software. Alternatively, Datrium Compute Nodes are offered as a turnkey solution. Compute Nodes can run VMware vSphere, Red Hat Enterprise Linux/KVM, or CentOS/KVM. Linux versions of the Compute Node also support Docker Containers and the Datrium Persistent Volume plug-in.

In addition to executing VMs, Compute Nodes allocate a portion of their CPU, memory and flash resources to I/O processing. Compute Nodes support a wide range of read- and write-optimized SSDs and NVMe flash. Data Nodes provide durable storage. All management tasks are performed via a unified UI available as a vCenter plugin.

Software-Defined Storage Controller

The key part of the SDS Controller is a distributed log-structured file system that executes on all Compute and Data Nodes, manages diverse storage media, and presents uniform storage abstractions to an administrator. The scale-out log-structured design is critical for high-performance primary I/O and for high-bandwidth data ingest and data reduction for secondary storage. Log-structured file systems are also a proven way to harness the cost effectiveness of read-optimized SSDs and avoid more expensive enterprise-grade SSDs where they are unnecessary. Host performance is isolated: hosts do not interact for normal I/O, so each host can be configured as required. This eliminates the need for HCI-like cluster restrictions or settings.

Converging Compute and Primary Storage

DVX can deliver millions of IOPS at sub-millisecond latencies, rivaling the performance of the fastest all-flash storage arrays. Storage performance scales automatically with the addition of new Compute Nodes. I/O processing is accelerated by tapping the flash, CPU, and memory resources of Compute Nodes. A running VM gets direct access to local SSDs without being encumbered by network hops, which keeps latency low and IOPS high. On-flash data is always compressed and deduplicated inline.

 

Extending Convergence to Secondary Storage

Data Nodes are dense, cost-optimized, fully redundant x86-based storage appliances for data persistence. Data is always erasure-coded, compressed, and globally deduplicated (using host CPU resources). There are no knobs and no dials – data reduction is always enabled with no performance trade-offs. Unlike in most HCI systems, erasure coding, compression, and deduplication take place during normal operation with no expensive post-processing. The scale-out log-structured file system is designed to accommodate data reduction: variable-size compressed data is appended to a write log; full stripes are erasure-coded and written out utilizing the aggregate bandwidth of all disks in the cluster. Adding more Data Nodes scales the data ingest bandwidth linearly, rivaling the biggest secondary storage appliances.
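The write path described above can be illustrated with a minimal Python sketch. It is not DVX code: the class name LogStore, the stripe width, SHA-256 fingerprints, zlib compression, and single XOR parity (a stand-in for real erasure coding, which tolerates more failures) are all assumptions chosen to keep the example short.

```python
import hashlib
import zlib
from functools import reduce


class LogStore:
    """Toy append-only log with inline dedup, compression, and XOR parity.

    Hypothetical illustration only; not the DVX on-disk format.
    """

    def __init__(self, stripe_width=4):
        self.stripe_width = stripe_width  # data segments per stripe (assumed)
        self.index = {}                   # fingerprint -> (stripe number, slot)
        self.open_stripe = []             # compressed segments awaiting a full stripe
        self.stripes = []                 # sealed stripes: (segments, parity)

    def write(self, block: bytes) -> str:
        fp = hashlib.sha256(block).hexdigest()
        if fp in self.index:              # duplicate block: nothing new is stored
            return fp
        self.index[fp] = (len(self.stripes), len(self.open_stripe))
        self.open_stripe.append(zlib.compress(block))  # variable-size segment
        if len(self.open_stripe) == self.stripe_width:
            self._seal()
        return fp

    def _seal(self):
        # Pad segments to equal length, then XOR them column by column.
        size = max(len(s) for s in self.open_stripe)
        padded = [s.ljust(size, b"\0") for s in self.open_stripe]
        parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*padded))
        self.stripes.append((self.open_stripe, parity))
        self.open_stripe = []

    def read(self, fp: str) -> bytes:
        stripe_no, slot = self.index[fp]
        sealed = stripe_no < len(self.stripes)
        segments = self.stripes[stripe_no][0] if sealed else self.open_stripe
        return zlib.decompress(segments[slot])

    def recover(self, stripe_no: int, lost_slot: int) -> bytes:
        """Rebuild one lost segment of a sealed stripe from parity and survivors."""
        segments, parity = self.stripes[stripe_no]
        survivors = [s.ljust(len(parity), b"\0")
                     for i, s in enumerate(segments) if i != lost_slot]
        rebuilt = bytes(reduce(lambda a, b: a ^ b, col)
                        for col in zip(parity, *survivors))
        # decompressobj leaves the trailing zero padding in unused_data
        return zlib.decompressobj().decompress(rebuilt)
```

Because deduplication and compression happen before a segment enters the log, and parity is computed only when a stripe is full, nothing in this sketch needs a later post-processing pass.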

 

Data Protection and Management

DVX offers consistent data protection and unified data management across the entire private cloud environment. Fully integrated VM-centric data protection eliminates the need for 3rd party B/R software. Built-in cluster-level replication makes 3rd party replication unnecessary. DVX data protection and replication technology is significantly different from LUN level snapshots and replication of storage arrays.

Blanket Encryption

Datrium Blanket Encryption provides comprehensive encryption over the lifecycle of VM data: from its genesis on a Compute Node, through in-use storage on host flash, across networks, and at rest on Data Nodes. It offers the flexibility of a software-based solution with virtually no impact on performance. It preserves all forms of data reduction: inline compression and deduplication for Compute Node flash, compression and global deduplication for Data Node durable storage, and compression and deduplication of remote replication traffic.

Snapshots and Clones

DVX data protection is based on VM-level snapshots. Flexible backup policies are created by dynamically binding sets of VMs based on supplied grouping criteria expressed via pattern matching rules to enable data protection at scale. Different policies are applied to different sets of VMs providing a mechanism to tailor RPO to specific application needs. Snapshots preserve crash or, when configured, application level consistency across arbitrary sets of VMs executing on different Compute Nodes.
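As a rough illustration of dynamic policy binding, the Python sketch below matches VM names against glob patterns each time it is evaluated. The Policy fields, the glob syntax, and the VM names are hypothetical; the actual DVX rule syntax is not described here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Policy:
    name: str
    pattern: str           # glob matched against VM names (assumed rule syntax)
    interval_minutes: int  # snapshot interval, which bounds the achievable RPO


def bind(policies, vm_names):
    """Return {policy name: sorted matching VMs}, recomputed on every call."""
    return {
        p.name: sorted(vm for vm in vm_names if fnmatchcase(vm, p.pattern))
        for p in policies
    }


policies = [Policy("sql-gold", "sql-*", 10), Policy("web-bronze", "web-*", 60)]
vms = ["sql-01", "web-01", "sql-02", "scratch"]
print(bind(policies, vms))
```

Because membership is recomputed from the rules rather than stored as a static list, a VM created later with a matching name is protected without any policy edit.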

Each snapshot is a logical and self-sufficient copy of a set of VMs (and other artifacts) and serves as a full backup. Taking a snapshot is a fast metadata manipulation operation that involves no physical data copy. The file system maintains no complex snapshot delta chains, which would need to be replayed, would degrade performance, and could lead to large data losses if corrupted. The system accommodates millions of snapshots without any performance impact, with RPO measured in minutes and RTO in seconds, reducing backup windows. Low RPO combined with large numbers of snapshots makes DVX very effective for virus and ransomware protection. A built-in Search Catalog facilitates backup restore via a simple UI and scalable automation. Snapshots may be deleted in any order. The system also offers zero-copy VM and VMDK clones based on Redirect-On-Write (ROW) technology.
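The properties claimed for snapshots (metadata-only creation, no delta chains, deletion in any order) can be shown with a small redirect-on-write model in Python. This is a sketch under stated assumptions, not the DVX file system: the SnapshotStore class and its flat block map are hypothetical, and a production system would use a more scalable metadata structure than a copied dictionary.

```python
class SnapshotStore:
    """Toy redirect-on-write volume with reference-counted blocks (hypothetical)."""

    def __init__(self):
        self.blocks = {}     # block id -> data
        self.refs = {}       # block id -> number of maps pointing at it
        self.live = {}       # current volume map: offset -> block id
        self.snapshots = {}  # snapshot name -> frozen copy of a volume map
        self._next_id = 0

    def _unref(self, bid):
        self.refs[bid] -= 1
        if self.refs[bid] == 0:  # no map references the block: reclaim it
            del self.refs[bid]
            del self.blocks[bid]

    def write(self, offset, data):
        # Redirect-on-write: new data always lands in a new block.
        old = self.live.get(offset)
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        self.live[offset] = bid
        if old is not None:
            self._unref(old)

    def snapshot(self, name):
        # Metadata only: copy the map and bump reference counts, copy no data.
        self.snapshots[name] = dict(self.live)
        for bid in self.live.values():
            self.refs[bid] += 1

    def read_snapshot(self, name, offset):
        # Each snapshot resolves reads on its own; no chain is replayed.
        return self.blocks[self.snapshots[name][offset]]

    def delete_snapshot(self, name):
        for bid in self.snapshots.pop(name).values():
            self._unref(bid)
```

Since every snapshot holds a complete map, deleting an older snapshot never affects a newer one; only blocks that no remaining map references are reclaimed.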

The native built-in data protection capabilities eliminate the CPU and RAM tax imposed by 3rd party B/R software, which competes with executing VMs for resources, as well as its separate training, maintenance, and license fees.

Elastic Replication

Local backups are replicated to remote DR sites via an integrated feature called Elastic Replication. Elastic Replication is forever incremental – only incremental changes between snapshots are transferred. It shrinks data on-wire by preserving the native data format (data is transferred in a compressed file system format). It deduplicates the data against the destination and transfers only what’s not already present there. This eliminates the need for dedicated WAN Optimization appliances since all relevant features are already built-in. Because of the always-on on-wire data reduction, DVX is an excellent choice for replicating over slower or noisy WANs.
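A simplified Python model of this transfer logic follows. It is a hedged sketch rather than the Elastic Replication protocol: the Site class, fixed chunk lists, SHA-256 fingerprints, and zlib compression are assumptions, and the incremental step assumes the base snapshot was replicated earlier.

```python
import hashlib
import zlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class Site:
    """Toy cluster holding compressed chunks keyed by fingerprint (hypothetical)."""

    def __init__(self):
        self.chunks = {}     # fingerprint -> compressed chunk
        self.snapshots = {}  # snapshot name -> ordered list of fingerprints

    def take_snapshot(self, name, data_chunks):
        fps = []
        for chunk in data_chunks:
            fp = fingerprint(chunk)
            self.chunks.setdefault(fp, zlib.compress(chunk))
            fps.append(fp)
        self.snapshots[name] = fps

    def missing(self, fps):
        return {fp for fp in fps if fp not in self.chunks}

    def restore(self, name):
        return [zlib.decompress(self.chunks[fp]) for fp in self.snapshots[name]]


def replicate(src, dst, name, base=None):
    """Copy snapshot `name` from src to dst; return the bytes sent on the wire."""
    fps = src.snapshots[name]
    # Forever incremental: consider only chunks absent from the base snapshot.
    changed = set(fps) - set(src.snapshots.get(base, []))
    # Deduplicate against the destination: skip what it already stores.
    needed = dst.missing(changed)
    for fp in needed:
        dst.chunks[fp] = src.chunks[fp]  # sent in compressed form, as stored
    dst.snapshots[name] = list(fps)
    return sum(len(src.chunks[fp]) for fp in needed)
```

In this model the wire cost of a replication depends only on chunks that are both new since the base snapshot and unknown to the destination, which is why repeated or overlapping transfers cost little.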

Elastic Replication uses adaptive software to discover source and destination cluster topologies and automatically configure all Compute and Data Nodes for replication and network monitoring. It supports varied WAN topologies, from point-to-point to one-to-many and from bidirectional to true mesh. Elastic Replication manages multiple concurrent replication streams that transfer data directly from the local flash of source Compute Nodes to the destination Compute Nodes, eliminating the centralized bottlenecks of legacy storage arrays. WAN throttling schedules can be configured to lower network bills. Replication traffic is secured by enabling encryption tunnels.

Elastic Replication is used to both copy snapshots to DR sites for data protection and to recover from remote snapshots stored by DR sites following a DR event.

DR Workflows

DVX provides support for different DR workflows: Planned and Unplanned Site Failover, Site Failover Test, etc. These workflows are managed via a unified management UI seamlessly integrated with vCenter.

Cloud Backup for Extended Archiving

In addition to replication between DVXs, snapshots will be replicable to AWS for extended archiving. This feature is also based on forever-incremental native replication, preserving all on-disk and on-wire data reduction and all replication capabilities available for on-premises DVX backups. In fact, DVX itself is extended to run as an AWS instance and use cost-effective S3 for durable storage.

 

End-to-End Integrity

DVX leverages its Compute Node software to checksum the written data immediately (before any network hops). This checksum is forever associated with the written data to guarantee its self-consistency. In addition to checksum checks, independent referential integrity checks are performed for each read from any persistent media. An integrated data scrubber continuously verifies data integrity and proactively fixes any detected problems. This ensures data availability of long-retention backups.
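The checksum-on-write, verify-on-read, and scrub-and-repair cycle can be sketched in a few lines of Python. The Record type, the use of SHA-256, and repair from a redundant full copy are assumptions made for illustration; the checksum algorithm DVX uses is not stated here, and a real system would rebuild from erasure-coded stripes rather than whole copies.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Record:
    data: bytes
    checksum: str  # computed once at write time and stored with the data


def seal(data: bytes) -> Record:
    """Checksum the data at the point of origin, before it crosses any network."""
    return Record(data, hashlib.sha256(data).hexdigest())


def is_intact(rec: Record) -> bool:
    return hashlib.sha256(rec.data).hexdigest() == rec.checksum


def read(rec: Record) -> bytes:
    """Verify on every read; never return data that fails its checksum."""
    if not is_intact(rec):
        raise IOError("checksum mismatch")
    return rec.data


def scrub(copies) -> int:
    """Repair corrupted copies from an intact one; return how many were fixed."""
    good = next((c for c in copies if is_intact(c)), None)
    if good is None:
        raise IOError("no intact copy available")
    repaired = 0
    for c in copies:
        if not is_intact(c):
            c.data = good.data
            repaired += 1
    return repaired
```

Running a scrubber like this in the background catches silent corruption while a healthy copy still exists, instead of discovering it months later during a restore.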

 

Summary

Datrium DVX delivers all primary and secondary storage for private clouds with built-in data protection and efficient RPO and RTO. This native data protection eliminates the need for any 3rd party backup and replication software and hardware. The unified data management reduces the manpower necessary for private cloud operation.

 

 

