Why is Rack Scale Infrastructure So Hard?

Posted by: Jeff Zabarsky

Virtualization has taken IT by storm, to the point that it is now the way we all deploy our applications. Converged infrastructure is widely considered the best way to deploy virtualization, and industry figures back this up: converged sales show robust 31%1 compound annual growth, while traditional infrastructure sales have flatlined or worse. Convergence is ascendant because it promises simplicity in hardware acquisition and scaling. The goal is to buy and manage infrastructure in pod units consisting of compute, networking, and storage; data centers can then scale to meet demand by simply replicating these pods.

The promise of simplicity is only partially fulfilled because the storage side of infrastructure hasn’t kept pace. In large part, this is because storage architectures have been around for decades and have solidified around certain baked-in concepts. Some of those concepts are desirable and provide great benefits, while others have outlived their usefulness. Newer storage designs have arisen in recent years that bring real improvements, but they toss out some of the hard-won advances and create new problems of their own.


A brief and incomplete history of storage arrays 

Storage arrays as we know them today were born when storage moved out onto the network, either a SAN (Storage Area Network) or NAS (Network Attached Storage) over Ethernet. In a nutshell, the difference between the two is that NAS exports a filesystem containing files, while SAN exports raw disk space as an object called a LUN. The advantage of putting storage on the network was the ability to share it among multiple servers. Storage space could be pooled and protected with RAID techniques without having to statically allocate capacity. With RAID6-style erasure coding using two parities, a drive could fail and data would still be protected against sector read errors on the remaining drives. Servers could be brought down for maintenance without impacting the other servers sharing the storage, and ownership of a server’s storage could even be shifted to another server to avoid availability disruptions.
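To make that dual-parity point concrete, here is a minimal sketch in Python. It is purely illustrative and not any array’s actual implementation: one byte stands in for each drive’s contribution to a stripe, and the example shows that any two missing data blocks (say, a failed drive plus a bad sector on another drive) can be rebuilt from the survivors plus the P and Q parities.

```python
# Minimal RAID6-style dual-parity sketch over GF(2^8), polynomial 0x11d.
# Illustrative only: one byte per "drive", no striping, no real I/O.

# Exp/log tables for GF(256) arithmetic.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_pow(a, n):
    return EXP[(LOG[a] * n) % 255]

def gf_inv(a):
    return EXP[255 - LOG[a]]

def encode(data):
    """Compute P (plain XOR) and Q (Reed-Solomon style) parity bytes."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data, p, q, x, y):
    """Rebuild the two missing data bytes at indices x < y from the survivors plus P and Q."""
    pxy = qxy = 0
    for i, d in enumerate(data):
        if i not in (x, y):          # values at x and y are treated as lost
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    gyx = gf_pow(2, y - x)
    k = gf_inv(gyx ^ 1)              # (g^(y-x) + 1)^-1
    a = gf_mul(gyx, k)
    b = gf_mul(gf_inv(gf_pow(2, x)), k)
    dx = gf_mul(a, p ^ pxy) ^ gf_mul(b, q ^ qxy)
    dy = (p ^ pxy) ^ dx
    return dx, dy

data = [0x11, 0x22, 0x33, 0x44]            # four data drives, one byte each
p, q = encode(data)
print(recover_two(data, p, q, 1, 3))       # -> (34, 68): both lost bytes (0x22, 0x44) rebuilt
```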

The advancements of networked storage came with one well-known limitation: the resources on the storage array are fixed and must be split across all the servers that share it. The more servers communicating with an array, and the more intense their communication, the fewer resources are available to each individual server. If you add more disk shelves to an array, increase the number of servers, or simply increase the amount of IO traffic your servers generate, you will eventually run out of resources and need to upgrade the storage array controller to a more powerful model. That’s an expensive and inconvenient proposition.
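As a rough illustration (the numbers below are hypothetical, not taken from any particular array), here is how a fixed controller budget gets carved up as more servers share it:

```python
# Hypothetical model of a fixed array controller budget shared by N servers.
ARRAY_IOPS = 500_000            # assumed controller ceiling, not a vendor spec

for servers in (4, 8, 16, 32, 64):
    per_server = ARRAY_IOPS / servers
    print(f"{servers:3d} servers -> ~{per_server:>9,.0f} IOPS available per server")
```

The controller budget never grows with the server count, so per-server headroom only shrinks until the workloads no longer fit.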


Chewing gum, baling wire, and duct tape 

If you were to design a storage architecture from scratch and wanted to maximize performance, putting flash drives on the other end of a network connection wouldn’t rank as a first choice. Little’s Law dictates that the more IO requests you keep in flight on a network, the higher the latency experienced because of queueing delays. Under load, network delays can amount to multiple milliseconds, while flash drives are capable of latencies under 100 microseconds. That’s a dramatic difference in latency and a tremendous waste of potential performance. This problem is only getting worse with technologies like 3D XPoint that promise latencies in the neighborhood of 10 microseconds.
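To see how quickly queueing eats the latency budget, here is a back-of-the-envelope sketch. It models the shared path as a simple M/M/1 queue, which is an assumption made purely for illustration; real storage networks are more complicated, but the shape of the curve is the point.

```python
# Back-of-the-envelope queueing sketch (M/M/1 assumed purely for illustration).
SERVICE_US = 100.0                  # assumed per-IO service time (microseconds)
mu = 1e6 / SERVICE_US               # service rate, IOs per second

for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
    lam = utilization * mu                    # offered IO arrival rate
    latency_us = 1e6 / (mu - lam)             # M/M/1 response time: W = 1 / (mu - lambda)
    in_flight = lam * (latency_us / 1e6)      # Little's Law: L = lambda * W
    print(f"load {utilization:.0%}: latency ~{latency_us:7,.0f} us, ~{in_flight:5.1f} IOs in flight")
```

Even with an idealized 100-microsecond device at the far end, pushing the shared path toward saturation drives end-to-end latency into the milliseconds.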

Storage arrays, by their very nature, live on the network, so there is no real choice but to pay the network performance costs. This leads to a search for solutions like NVMe over Fabrics, where an entirely new network infrastructure is required to provide low-latency, high-bandwidth connectivity. These are attempts to approximate the performance of attaching flash directly to the servers, but they come with very significant cost and complexity.


Past its sell-by date

We now live in a world where the fundamental objects we manage are VMs and virtual disks. Storage arrays weren’t built with these objects in mind, which creates an impedance mismatch between storage arrays and virtualization: two completely different domains with different management interfaces. We are forced to manually map multiple virtual disks to a single LUN or treat them as files within a NAS filesystem. Either way, when there are performance conflicts, there is no clear way to diagnose how the IO traffic from various VMs is interacting and competing for resources within the array. Space accounting can be difficult to manage, and in the case of LUNs you may have to guess and move a VM’s storage to a different LUN to try to break up contention.

The mismatch also carries over into snapshots and cloning. Storage arrays typically perform these functions at whole-LUN or NAS-filesystem granularity, when what you really want is the ability to snapshot a single VM or a defined group of VMs. You can forget about more advanced features like dynamic policy bindings and search capabilities.
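The snapshot granularity problem is easy to see in a toy model. The classes below are hypothetical, not any array’s or hypervisor’s real API; they just show that when the array’s unit of snapshot is the LUN, a “snapshot VM A” request necessarily drags along every other VM packed into the same LUN.

```python
# Toy model of the LUN/VM granularity mismatch (hypothetical classes, no real API).
import copy
from dataclasses import dataclass, field

@dataclass
class LUN:
    name: str
    vdisks: dict = field(default_factory=dict)     # vdisk name -> contents; many VMs share one LUN
    snapshots: list = field(default_factory=list)  # each snapshot captures the whole LUN

    def snapshot(self):
        # The array only knows how to capture the entire LUN, not one VM's disks.
        self.snapshots.append(copy.deepcopy(self.vdisks))

datastore = LUN("datastore1")
datastore.vdisks["vm-a.vmdk"] = b"VM A blocks"
datastore.vdisks["vm-b.vmdk"] = b"VM B blocks"

# We only wanted to protect VM A, but the snapshot captures VM B's disk too,
# and there is no per-VM handle to restore or expire independently.
datastore.snapshot()
print(sorted(datastore.snapshots[0]))   # ['vm-a.vmdk', 'vm-b.vmdk']
```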

Traditional storage arrays are a poor match for today’s virtualized IT infrastructure. They become resource constrained as servers and IO load increase, and they are difficult to scale. They throw away flash memory’s low-latency performance by bottlenecking it behind a network connection. And they have a hard time taking advantage of newer flash technologies like NVMe and 3D XPoint.

It should be clear that storage is ripe for a serious rethink. In the next blog we’ll look at a popular alternative and see how it fixes some of the problems we’ve discussed, but creates whole new headaches.


1 http://www.prnewswire.com/news-releases/converged-infrastructure-market-growing-at-31-cagr-to-2019-503059431.html
