The Serious Business of VM Analytics

We here at Datrium have made it our task to constantly add new metrics that better measure how field-deployed systems are performing, and then to find ways to represent that data visually and simplify performance management for our customers. That means the hard data behind VM analytics is serious business here. In fact, we have taken the opportunity to address the end-to-end VM performance management complexity problem for our customers. And what do we know? Read on.

Simplicity and manageability are the keys

To build a scalable infrastructure that is simple to manage, you must be able to easily measure and monitor how the system behaves at every level. Admins and developers care about the applications and VMs supporting the business, not the details of the infrastructure they run on.

We’ve learned that performance data collection and analytics must be designed in from the start with a VM/app focus. You also need robust analytics capabilities from the complete system view (vCenter) down to the individual VM (vdisk) elements.

There are some key considerations when developing an end-to-end VM analytics capability. You need an architecture that gives you greater depth and visibility into VM data path operations on both Compute Nodes and Data Nodes, as well as visibility into the network connections in between.

Effective focus of data collection and presentation

With visibility into application VM storage performance at the VM vdisk, vSphere host, and vCenter collection levels, the administrator gets the right level of information to take meaningful action. Having this information readily available alongside other VM management capabilities helps reduce the specialized, complex, or proprietary storage expertise required to understand the system or make decisions.

Some examples of typical VM performance-related administration tasks that are easily addressed are:

  • At the VM level – vMotion a VM either to move it to a host with appropriate resource headroom or to isolate a particular workload
  • At the host level – choose to switch a host from standard Fast Mode CPU utilization to Insane Mode to double IO processing of VMs on that host
  • At the rack/cluster level – add more servers to a cluster to support greater IO workloads to scale with infrastructure needs

You need to be able to monitor the data access component from a VM-centric perspective, with end-to-end coverage of the data path as seen by the VM, the host, and the vCenter.
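To make the VM, host, and vCenter views concrete, here is a minimal sketch of rolling per-vdisk samples up to each level. The `VdiskSample` fields and the `roll_up` helper are hypothetical names chosen for illustration, not a Datrium API, and the choice to sum IOPS and weight latency by IOPS is an assumption about how such a roll-up might reasonably work.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VdiskSample:
    """One hypothetical measurement for a single vdisk."""
    vcenter: str
    host: str
    vm: str
    vdisk: str
    iops: float
    latency_ms: float


# Which identifying fields define a group at each level (illustrative only).
LEVEL_KEYS = {
    "vm": ("vcenter", "host", "vm"),
    "host": ("vcenter", "host"),
    "vcenter": ("vcenter",),
}


def roll_up(samples, level):
    """Aggregate vdisk samples to the 'vm', 'host', or 'vcenter' level.

    IOPS are summed; latency is averaged weighted by IOPS so that busy
    vdisks count for more than idle ones.
    """
    fields = LEVEL_KEYS[level]
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [iops, iops * latency]
    for s in samples:
        key = tuple(getattr(s, f) for f in fields)
        totals[key][0] += s.iops
        totals[key][1] += s.iops * s.latency_ms
    return {
        key: {"iops": iops, "latency_ms": weighted / iops if iops else 0.0}
        for key, (iops, weighted) in totals.items()
    }


samples = [
    VdiskSample("vc1", "h1", "vm1", "d1", iops=100.0, latency_ms=1.0),
    VdiskSample("vc1", "h1", "vm1", "d2", iops=300.0, latency_ms=2.0),
    VdiskSample("vc1", "h2", "vm2", "d1", iops=600.0, latency_ms=0.5),
]
per_vm = roll_up(samples, "vm")
per_host = roll_up(samples, "host")
per_vcenter = roll_up(samples, "vcenter")
```

The same samples feed all three views, so what the admin sees for a VM stays consistent with what appears at the host and vCenter levels.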

Time frame and timeliness of data analytics availability

Having both real-time data (what is happening right now) and historical data (the past hours, days, or longer) makes troubleshooting and trend analysis easier. That requires storing sufficient data at the right level of detail and over the right time periods. Coverage of activity with time slices from 5 minutes to 1 year, depending on the duration covered, provides sufficient historical data without excessive accumulation of information on the system. Readily available historical views help promote trend analysis and operational adaptation.
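One common way to keep history useful without accumulating too much data is to store coarser averages for longer time windows. The sketch below illustrates that idea only; the tier boundaries and bucket widths in `TIERS` are assumptions made up for this example, not the values the DVX system uses.

```python
from datetime import timedelta

# Hypothetical tiers: (largest window served, width of each averaged bucket).
TIERS = [
    (timedelta(hours=1), timedelta(seconds=10)),
    (timedelta(days=1), timedelta(minutes=5)),
    (timedelta(days=30), timedelta(hours=1)),
    (timedelta(days=365), timedelta(days=1)),
]


def bucket_width(window):
    """Pick the bucket width for a requested time window."""
    for max_window, width in TIERS:
        if window <= max_window:
            return width
    return TIERS[-1][1]  # anything longer falls back to the coarsest tier


def downsample(samples, width_s):
    """Average (epoch_seconds, value) samples into fixed-width buckets.

    Returns a list of (bucket_start, mean_value), sorted by time.
    """
    sums = {}
    for ts, value in samples:
        start = ts - ts % width_s  # align to the start of the bucket
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]


width = bucket_width(timedelta(minutes=5))
points = downsample([(0, 1.0), (5, 3.0), (10, 5.0)], int(width.total_seconds()))
```

A 5-minute view keeps fine-grained points, while a 1-year view is served from daily averages, so both stay quick to query.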

Being able to retrieve the data with good query performance means it actually can be used. Nothing is worse than trying to troubleshoot a problem or analyze a situation and waiting – and waiting – for the data to be presented. Having the data collected and rendered within the same system further reduces management complexity as there is no need for another analytics platform.

Getting the right granularity and breakdown of KPIs

Getting the right level and amount of data for analytics is as important as getting the right granularity and breakdown of the key performance indicators (KPIs). Metrics like latency, IOPS, throughput, queue depth, and cache hits are essential to understanding IO behavior and servicing. The current user interface is rich with data points to help understand and manage both the performance and capacity aspects of the underlying data sets. Add in the ability to filter and rank the data and you get a solution that provides faster operational analysis and problem identification.
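As a rough illustration of why filtering and ranking speed up problem identification, the sketch below picks the worst offenders for one KPI out of a set of per-VM rows. The `top_n` helper, the field names, and the sample values are all hypothetical and only stand in for whatever the real interface exposes.

```python
def top_n(rows, metric, n=5, where=None):
    """Return the n rows with the highest value of `metric`.

    `where` is an optional predicate used to filter rows before ranking.
    """
    candidates = [r for r in rows if where is None or where(r)]
    return sorted(candidates, key=lambda r: r[metric], reverse=True)[:n]


# Made-up per-VM KPI rows for illustration.
rows = [
    {"vm": "web-01", "latency_ms": 0.8, "iops": 1200},
    {"vm": "db-01", "latency_ms": 4.2, "iops": 9500},
    {"vm": "batch-01", "latency_ms": 6.5, "iops": 40},
    {"vm": "db-02", "latency_ms": 2.1, "iops": 7000},
]

# Highest-latency VMs, ignoring nearly idle ones (fewer than 100 IOPS).
slowest_busy = top_n(rows, "latency_ms", n=2, where=lambda r: r["iops"] >= 100)
slowest_names = [r["vm"] for r in slowest_busy]
```

Filtering out near-idle VMs first keeps a single slow but unimportant workload from crowding out the VMs that actually matter.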

So there it is: what we know about solving the end-to-end VM performance management complexity problem. And we’re doing it with our DVX Rackscale System, which includes living and breathing, hard data.