What’s Wrong with Cloud Backup?

Posted by: Brian Biles

 

You’d think the simplest offsite data protection service for on-prem computing would be the Cloud, right? Sadly, you’d be wrong. Hybrid-cloud data migration for backup and DR can be expensive and a hassle.

Why? VMs and containers are mobile, but generally their data is not. Specialty vendors are often required, and the kinks are still being worked out. At a minimum it’s complex, and it can be a lot worse than that.

On the other hand, Cloud DVX, Datrium’s cloud-native services platform on AWS, uses DVX’s core architecture for efficient data protection to create an effortless, low cost and fast platform for data protection.  With this platform as a foundation, Cloud has every reason to become a first-class choice for native protection for on-prem data.

 

Background

Data gravity makes data migration expensive and slow.  While some data is too big to move easily, sometimes it needs to be moved anyway, e.g. for on-prem DR.

Datrium makes the simplest converged infrastructure for Tier 1 performance, but unlike other products in that market, it’s integrated with state-of-the-art data protection granularity, scale and efficiency. Joining these extremes together is unique in the market. We want to automate efficient self-protection, so you don’t have to buy and operate a separate backup infrastructure.

As a result, we get a lot of requests to use these capabilities to embrace clouds as peer datacenters for data protection. In the Datrium spirit, we saw this as an opportunity to raise the bar on simplicity versus any other approach: just add your credentials and everything else could be automatic.

DVX software is already architected like IaaS services anyway, so porting was natural.  DVX Compute Node software fits simply in EC2, and Data Node software layers simply on S3.

 

Why there’s room for innovation

Datrium’s core differentiation is scalable data services to move instance data efficiently between hosts, local persistence pools, other sites or clouds by policy.  We are not building a hypervisor.  We’re not building a custom cloud.  We’re about simplifying big-ass pods of Tier 1 compute+storage on-prem, with scalable policies for data protection, while fighting data gravity to pre-position data efficiently across multiple sites. 


Hyperscale and Hypervisor companies are also working on hybrid cloud functions, but they are focusing on instance mobility and programming environments, not Tier 1 app IO, on-prem consolidation, or efficient data services with any cloud but their own.

Tells: This market sees on-prem data sharing consolidation boundaries as correlating to their hypervisor cluster boundaries.  They don’t see a problem offering RAID1 style drive protection – something no Tier 1 array vendor would have done for the last 10 years.

Microsoft is the biggest and most sophisticated player here, though even smaller vendors such as Nutanix are aiming for a niche.  Microsoft does many things right, and they have offered many kinds of data service approaches historically, but they never competed effectively for Tier 1 storage consolidation or backup.


Primary storage generally lacks the key elements for hybrid cloud functions, so customers face the complexity of adding separate backup or cloud data management software.

  • Snapshot methods should be instance-specific for a VM or container’s data, so that the instance can be the unit of migration or protection. Arrays weren’t built for the scale of snaps and level of granularity required.
  • Because always-on global dedupe and compression impact performance in many primary storage architectures, they are generally a layered option that hurts latency. Cloud data management can’t count on them being turned on.
  • Encryption can’t protect data in flight between hosts and arrays without disrupting the data reduction that arrays choose to offer. Many arrays don’t have encryption in software; they use self-encrypting drives, which don’t help with cloud or WANs.
  • Some of the more advanced primary storage vendors are all-flash; for data protection, flash is still a lot more expensive per TB than disk.

As a result, customers are forced to add the complexity of separate data management vendors and disk-based mirrors. Backup vendors have early approaches, but they are not connected well to primary storage: RPOs are typically multi-hour, and RTOs require image recovery or storage migration/vMotion.

 

Next Steps

Datrium has three offerings planned within 2018 that will set an exciting foundation for future work.


Cloud DVX provides the world’s simplest cloud recovery service for a converged infrastructure. With a Datrium system on-prem, just enter your AWS credentials. Snaps will replicate for chosen Protection Groups to your Cloud DVX for storage on S3, and recovery is a simple request. Everything else about it is automatic.

If several sites replicate to your Cloud DVX, data will dedupe across them into S3 to minimize storage fees.  Transfers will also dedupe on the wire: if a data segment exists on the destination, the data will not need to be resent, helping minimize egress fees.  Data will also be compressed and encrypted across these paths.  
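To make the dedupe-on-the-wire idea concrete, here is a minimal sketch in Python. It is not Datrium's implementation: the fixed 4 KB segment size, the SHA-256 fingerprints, and the names `Destination` and `replicate` are all assumptions chosen for illustration, and compression and encryption are left out. It shows only the basic exchange, in which the sender asks which fingerprints the destination lacks and ships just those segments.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed segment size; real systems may chunk differently


def fingerprint(segment: bytes) -> str:
    """Content fingerprint used to decide whether a segment already exists."""
    return hashlib.sha256(segment).hexdigest()


def split_segments(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Destination:
    """Stand-in for the cloud-side store: a map from fingerprint to segment."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Return the fingerprints the destination does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, segment: bytes) -> None:
        self.segments[fingerprint(segment)] = segment


def replicate(data: bytes, dest: Destination) -> tuple[list[str], int]:
    """Send only segments the destination lacks.

    Returns the ordered fingerprint list (enough to rebuild the data later)
    and the number of payload bytes that actually crossed the wire.
    """
    segs = split_segments(data)
    recipe = [fingerprint(s) for s in segs]
    needed = dest.missing(recipe)
    sent = 0
    for fp, seg in zip(recipe, segs):
        if fp in needed:
            dest.store(seg)
            sent += len(seg)
            needed.discard(fp)  # a segment repeated within this transfer is sent once
    return recipe, sent


# Two sites share most of their data; the second transfer sends only the difference.
site_a = b"A" * 8192 + b"B" * 4096
site_b = b"A" * 8192 + b"C" * 4096
dest = Destination()
recipe_a, sent_a = replicate(site_a, dest)
recipe_b, sent_b = replicate(site_b, dest)
restored_b = b"".join(dest.segments[fp] for fp in recipe_b)
```

In this toy run, the second site's 12 KB costs only 4 KB on the wire, and the destination holds three unique segments instead of six. The same lookup works in the restore direction, which is why shared segments help with both storage and egress fees.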

Nothing could be simpler, and no vendor has better technology to reduce S3 and egress TCO for this kind of service.  In addition, because restores can be at vDisk granularity, and with dedupe on the wire, restore times (RTO) are much shorter than restore of a block storage volume.  There is also no time spent restoring to a backup store for later migration to a full performance store – recovery is direct to host.

We will keep extending VM backup granularity to include, for example, cataloging and recovery of individual guest files along with the vDisks, VMs, and Docker Persistent Volumes we currently support, allowing further granularity of recovery and minimization of egress costs.


Hybrid DR: failover/failback orchestration for Protection Groups across sites, between on-prem or cloud DVXs. This will eliminate the need for VMware SRM for many cases, leveraging DVX native simplicity and economic snap depth, and enabling a push-button DR test.

Prem-to-prem DR is simple if it involves the same hypervisor (the most common case). The approach is not a stretch cluster; it is snap-based, so there are no synchronous replication write-latency issues. RPO can be as small as 10 minutes.

We expect to offer this DR framework first for prem-to-prem DR in the second half of 2018, using SaaS-based orchestration.  It will support hybrid cloud failover/failback shortly thereafter.


Multi-cloud DVX global administration as SaaS. Cross-site analytics correlation and health software for a network of DVX sites is currently in use by Datrium’s Support organization. This is being migrated to a SaaS application for use by customers directly.

So, while the work is not yet complete, the underpinnings in place with Cloud DVX offer a better starting point to enable successful cloud-inclusive data protection strategies for Enterprises.  When we started Data Domain and said that tape sucked, so it was time for backup to disk, it was controversial, but now it seems obvious.  We hope that the cloud transition will unfold in a more progressive path.
