Effortless Infrastructure for Private Cloud Takes A Big Step Beyond Hyperconvergence

Today, effortless infrastructure for private cloud takes a big step forward with Datrium’s debut of its DVX Rackscale Systems. These first-of-their-kind turnkey clusters showcase why Open Convergence is overtaking hyperconvergence in a big way.

There are two standard reactions to private cloud infrastructure modernization as public cloud grows. Either you can freeze your SAN the way people have done with mainframes, with the occasional component upgrade but without improving OpEx, or you can focus on OpEx improvement. The latter will drive you to convergence and commodity systems.

This pretty much means you’re likely to think about Open Convergence as represented by the Datrium DVX, or Hyperconvergence (HCI). With the introduction of DVX Rackscale Systems, both Open Convergence and HCI offer converged clusters of compute and storage using commodity hardware. I thought it might be worth a quick review of what DVX brings that remains very different from HCI.

There are other vendor differences up and down the stack; for example, see also our new Data Cloud software discussions. DVX has become something like a host-based scalable flash system integrated directly with one of the world’s leading cloud data management platforms for persistence. But let’s start with the core.

Separate is not equal

Open Convergence means clear definitions of node types of commodity systems to enable performance isolation.  The node types are configured differently to meet different goals.

The DVX has two: Compute Nodes (CN) and Data Nodes (DN). The Compute Node is where performance happens; it writes to Data Nodes for persistence. Compute Node performance is isolated from other Compute Nodes, so when one is down for maintenance, it doesn’t affect the others. Erasure coding for the Data Node, dedupe, compression, encryption, flash reads, and so on all happen on the Compute Node, so performance scales with workload.

Data Nodes just persist data and coordinate, so they have redundant hardware, batteries for RAM power protection, and the like; commodity, but configured for a specific role. The minimum DVX configuration is 1 Data Node and 1 Compute Node.

In HCI, all hosts can write to all hosts in the cluster. In DVX, it’s just two points: a Compute Node writes to a Data Node.

Figure 1: HCI Write IOs between nodes

Figure 2: DVX Write IOs
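The difference in write fan-out can be sketched in a few lines of illustrative Python. The function names and the replica count are assumptions made for the sketch, not Datrium APIs: an HCI write fans out to several peer hosts, while a DVX write goes from one Compute Node to one Data Node.

```python
# Hypothetical sketch contrasting write fan-out. All names and the
# replica count of 3 are illustrative assumptions, not real APIs.

def hci_write_targets(writer, hosts, replicas=3):
    """In HCI, a write from any host fans out to `replicas` hosts
    in the cluster (typically including the writer itself)."""
    peers = [h for h in hosts if h != writer]
    return [writer] + peers[:replicas - 1]

def dvx_write_targets(compute_node, data_nodes):
    """In DVX, a Compute Node writes to a Data Node for persistence:
    two points, regardless of how many Compute Nodes exist."""
    return [data_nodes[0]]  # one durable endpoint per write path

hosts = [f"host{i}" for i in range(30)]
assert len(hci_write_targets("host0", hosts)) == 3        # writer + 2 peers
assert len(dvx_write_targets("cn0", ["dn0", "dn1"])) == 1  # one Data Node
```

The point of the sketch: at 30 nodes, the HCI write path still touches several peers per write, while the DVX path stays at two endpoints no matter how many Compute Nodes are added.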

This architectural separation, putting a network between the data services’ CPU and durable shared data, is an industry first for VM infrastructure and still unique. The DVX implementation also encourages giant local flash caches by (1) allowing them, (2) not charging software fees for them, and (3) doing inline dedupe and compression on them without write buffers or replicas from other hosts taking space away. This has made DVX the modern convergence platform of choice for demanding SQL workloads. It’s now available pre-configured in Rackscale for maximum simplicity across the system lifecycle.
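To make the idea of inline dedupe and compression on a local cache concrete, here is a minimal, hypothetical sketch. The class and method names are invented for illustration and are not the DVX implementation: blocks are fingerprinted so that duplicates consume no extra space, and only unique blocks are compressed and stored.

```python
# Hypothetical sketch of an inline-deduped, compressed block cache.
# Names are illustrative; this is not the DVX implementation.
import hashlib
import zlib

class DedupedCache:
    def __init__(self):
        self.store = {}  # fingerprint -> compressed block
        self.refs = {}   # fingerprint -> reference count

    def put(self, block: bytes) -> str:
        fp = hashlib.sha256(block).hexdigest()
        if fp not in self.store:                   # dedupe: store once
            self.store[fp] = zlib.compress(block)  # compress inline
            self.refs[fp] = 0
        self.refs[fp] += 1
        return fp

    def get(self, fp: str) -> bytes:
        return zlib.decompress(self.store[fp])

cache = DedupedCache()
a = cache.put(b"x" * 4096)
b = cache.put(b"x" * 4096)  # duplicate block: no new space consumed
assert a == b and len(cache.store) == 1
```

Because dedupe and compression run inline on the host, a larger logical working set fits in the same physical flash, which is the property the paragraph above is describing.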

No silos

The real ‘get out of jail free’ card of Open Convergence is the continued full embrace of 3rd party servers via DVX Software. Have some blade or 4-socket servers you want to integrate into the cluster? No problem.

We took a further step in our DVX Software 2.0 release to smooth the road to white-box network switching. With Adaptive Pathing, we eliminate the need to configure LACP to get multi-link aggregation for bandwidth and failover management. The Compute Node and Data Node software coordinate at both ends of the link, so the switch doesn’t have to do as much.
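As a rough illustration of the general idea of endpoint-managed multi-link traffic, here is a hypothetical sketch. The names and the flow-hashing scheme are invented for illustration and are not the Adaptive Pathing implementation: the sending host spreads flows across its links and re-routes around a failed link on its own, so the switch never needs LACP configured.

```python
# Hypothetical sketch of host-side multi-link spreading and failover,
# done entirely at the endpoints. Illustrative only; not Adaptive Pathing.

class MultiLinkSender:
    def __init__(self, links):
        self.links = {l: True for l in links}  # link -> healthy?

    def pick_link(self, flow_id: int) -> str:
        """Hash each flow onto a healthy link; re-hash when a link fails."""
        healthy = sorted(l for l, up in self.links.items() if up)
        if not healthy:
            raise RuntimeError("no healthy links")
        return healthy[flow_id % len(healthy)]

    def mark_down(self, link: str):
        self.links[link] = False  # failover decided by the endpoint

sender = MultiLinkSender(["eth0", "eth1"])
before = sender.pick_link(7)
sender.mark_down(before)
after = sender.pick_link(7)
assert after != before  # the flow moves to the surviving link
```

Since both ends run the same selection logic, aggregation and failover fall out of the host software, and a plain white-box switch with no link-aggregation configuration suffices.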

Hyperconverged is over-converged

By contrast, in HCI, you have to pay a lot more attention and set up a rackscale config very deliberately. At rackscale, HCI administration stops being invisible. Simple at 3 nodes is different at 30.

If the goal is to be as effortless as possible, clear node roles (as hyperscale clouds now do, and DVX does) are just simpler.

  • If everything is pooled, as in HCI, you need additional settings to unpool for distinct use models.
  • When you troubleshoot or tune HCI storage issues, everything is everywhere, so nothing is quite anywhere. You have to look at N physical spots instead of 2. Too often, people give up and just have to buy more nodes or, worse, redesign their network for all the new east/west traffic.
  • Some HCI vendors advertise that it’s simple to have one node type. In practice they may have 10,000 configurations available, they often say all members of a cluster must be identical, and this gets weird over time as components change. Change management and sizing become complicated compared to a DVX.

And…it’s a silo, in any current implementation.