As the DVX goes to general availability, we thought you might be interested in speed tests and comparisons for the first version of the product. The following sums it up from a few different perspectives.
- DVX Aggregate Speed: faster than many AFAs
- IOPS / Core: the key metric for scaling storage performance
- Storage Latencies: end-to-end, measured at VM vs storage component
- $/GB: often half of comparable, new-generation Hybrid arrays
DVX Aggregate Speed
- Speed per DVX host from local flash is up to 30K IOPS with random reads, as shown above. A 32-host cluster can do about 1M IOPS in aggregate. In the chart above “# cores” means the number of cores per host allocated to DVX processing.
- NetShelf bandwidth scales to use most of the 10 GbE link, up to 800 MB/s. Writes go here after host acceleration.
The DVX IOPS maximum varies with the number and type of CPU cores and SSDs, but the aggregate IOPS maximum grows linearly with similarly configured hosts because each host’s IO is isolated from other hosts. Maximum IOPS increases with server refresh cycles, in sync with advances in rapidly evolving processor and flash technologies. The above tests were run on typical 2-socket Intel E5-V3 (Haswell) processor systems with at least 2 local flash drives.
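As a rough illustration of the linear-scaling claim, the sketch below multiplies the quoted per-host ceiling by the host count and compares the quoted NetShelf bandwidth with the raw rate of a 10 GbE link. The function names are hypothetical, the inputs are only the figures quoted in this post, and the link calculation assumes decimal units and ignores protocol overhead.

```python
# Figures quoted in the post; the function names are illustrative only.
PER_HOST_IOPS = 30_000      # "up to 30K IOPS" per host, random reads from local flash
NETSHELF_MB_PER_S = 800     # quoted NetShelf bandwidth ceiling
LINK_GBIT_PER_S = 10        # 10 GbE link


def aggregate_iops(hosts: int, per_host_iops: int = PER_HOST_IOPS) -> int:
    """Linear scaling: assumes similarly configured hosts with isolated IO."""
    return hosts * per_host_iops


def link_utilization(mb_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of the raw link rate used (decimal units, overhead ignored)."""
    link_mb_per_s = link_gbit_per_s * 1000 / 8
    return mb_per_s / link_mb_per_s


print(aggregate_iops(32))                                              # 960000
print(round(link_utilization(NETSHELF_MB_PER_S, LINK_GBIT_PER_S), 2))  # 0.64
```

With these inputs, 32 hosts come to 960K IOPS, which is where the "about 1M" aggregate figure comes from.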
It’s fair to assume all production reads are server-local because flash in the DVX model is cheap and abundant. Reads do not consume any physical network bandwidth, and they are not subject to SAN queuing delays.
IOPS / Core
The chart above shows the aggregate speed envelope from host flash based on the number of cores per host allocated to the DVX Hyperdriver (up to a maximum of 10 cores in DiESL 1.0). You can leverage faster cores or more cores – or both. The flexibility now lies in the host and is not confined to the array.
IOPS / core can be similar to an All Flash Array (AFA). In a DVX, CPU-intensive AFA functions are performed by host compute, so they scale naturally. These include end-to-end dedupe, compression, checksums, space reclamation and even RAID for the NetShelf.
The server-powered storage overhead is similar to the requirements of an AFA, and the DVX’s IOPS/core is actually similar to that of some AFAs. But with a DVX, you can adjust resources dynamically with each host, avoiding the storage-price burden of big silos of compute, RAM and flash hidden in the array.
Storage Latencies
DVX IO latency depends on the host flash selected. Per this discussion, server flash avoids network queuing delays, which affect true VM latency statistics but are ignored by arrays.
Datrium recommends that all VM benchmarking be run at the VM level. Otherwise, the latency advertised by storage is a device-level statistic, and often only a small part of what your VM will see.
Comparison with array latencies*:
- For DVX in-host IO, the fastest PCIe flash device latencies for servers are below 30 microseconds, while standard SATA SSDs are below 70 microseconds. Because these devices are local, their latencies, unlike array latencies, won’t be affected by network processing.
- DVX NetShelf controller NVRAM write latencies average sub-millisecond response. Unlike arrays, DVX NVRAM writes are post data reduction.
Hyperconverged servers typically haven’t provided benchmarks to set expectations of performance or latency. This may be due to the performance variability in their pooling approach, in which every write creates neighbor noise and a higher bandwidth tax than SANs; it’s clearly not because they’re shy.
$/GB
The DVX, with our first NetShelf D12x4, is $125k (list) for 29 TB usable. Across a normal 2x – 6x range of enterprise data reduction (dedupe and compression), that means roughly 60 – 180 TB effective, or about 116 TB at the 4x midpoint.
- Durable capacity is around $1 / GB effective at list price, similar to nearline drive shelves in hybrid arrays – not the even higher cost of the array controllers.
- Flash purchased from commodity sources is currently less than 30¢ / GB to address all in-use data (with reduction), an order of magnitude lower than what you’d typically pay for raw array flash. For example, as seen on one major server vendor’s website just last week, a 960 GB read-optimized TLC drive to add to a server was $1132 (list); actual street prices may be lower. Note that the DVX Hyperdriver manages a custom log-structured file system on flash with inline dedupe and compression, so these drives can last up to 10x longer than the TBW spec.
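The arithmetic behind these two bullets can be sketched as follows. This is a back-of-the-envelope check, not a pricing tool: the helper names are hypothetical, the inputs are the list prices and capacities quoted in this post, 1 TB is taken as 1000 GB, and using 4x as the midpoint of the 2x – 6x reduction range is an assumption.

```python
# List prices and capacities quoted in the post; helper names are illustrative.
NETSHELF_LIST_USD = 125_000
NETSHELF_USABLE_TB = 29
DRIVE_LIST_USD = 1132
DRIVE_RAW_GB = 960


def effective_tb(usable_tb: float, reduction: float) -> float:
    """Logical capacity after dedupe and compression."""
    return usable_tb * reduction


def usd_per_effective_gb(price_usd: float, raw_gb: float, reduction: float) -> float:
    """Price divided by the reduction-expanded capacity."""
    return price_usd / (raw_gb * reduction)


for reduction in (2, 4, 6):  # low end, assumed midpoint, high end
    tb = effective_tb(NETSHELF_USABLE_TB, reduction)
    durable = usd_per_effective_gb(
        NETSHELF_LIST_USD, NETSHELF_USABLE_TB * 1000, reduction
    )
    flash = usd_per_effective_gb(DRIVE_LIST_USD, DRIVE_RAW_GB, reduction)
    print(f"{reduction}x: {tb:.0f} TB effective, "
          f"durable ${durable:.2f}/GB, host flash ${flash:.2f}/GB")
```

At 4x reduction this gives about $1.08 / GB for durable capacity and just under 30¢ / GB for host flash, in line with the figures above. At 2x both roughly double, so the comparison is sensitive to the reduction ratio a workload actually achieves.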
In a DVX, flash for massive caches is lower priced than durable disk storage.
- The NetShelf includes unlimited host licenses for Hyperdrivers, so DVX host flash is a simple add-on, purchased independently by our customers at commodity prices.
- On host flash, the $/GB denominator (capacity) is expanded by the DVX through inline compression and deduplication. So in total, flash has a very low $/GB.
Collectively, this means a DVX with massive local host flash caches is often half the $/GB of newer, flash-optimized hybrid arrays with only a 10% flash/disk ratio, because the most expensive parts, CPU and flash, are purchased more economically at the server side – one of the many advantages of a server-powered storage solution.
* Thanks go out to our friends at Intel for letting us test these flash devices in our POC lab: http://ark.intel.com/products/88731/Intel-SSD-DC-P3608-Series-1_6TB-12-Height-PCIe-3_0-x8-20nm-MLC. The performance of these devices helped us achieve some of the results discussed in this blog.