4 Dec 2013

Optimize Virtual Environment

Although memory is often referred to as the most important hardware resource in a virtual data center, it is typically storage that has the biggest impact on virtual machine performance. Microsoft Hyper-V is extremely flexible with regard to the types of storage it can use, but administrators must be aware of a number of feature-related limitations and requirements for support. This article is intended to familiarize you with various Hyper-V storage best practices.

Minimizing virtual machine sprawl

One issue that virtualization administrators must routinely deal with is virtual machine (VM) sprawl. Microsoft's licensing policy for Windows Server 2012 Datacenter Edition, and tools such as System Center Virtual Machine Manager, have made it too easy to create VMs; if left unchecked, VMs can proliferate at a staggering rate.

The problem of VM sprawl is most often dealt with by placing limits on VM creation or setting policies to automatically expire aging virtual machines. However, it is also important to consider the impact VM sprawl can have on your storage infrastructure.

As more and more VMs are created, storage consumption can become an issue. More often, however, resource contention is the bigger problem. Virtual hard disks often reside on a common volume or a common storage pool, which means they must compete for IOPS.

Although there isn't a universally applicable, cheap and easy solution to the problem of storage resource contention, there are a number of different mechanisms Hyper-V administrators can use to get a handle on the problem.

Fighting resource contention with dedupe

One of the best tools for reducing storage IOPS is file system deduplication. However, there are some important limitations that must be considered.

Microsoft introduced native file system deduplication in Windows Server 2012. Although this feature at first seemed promising, it had two major limitations: Native deduplication was not compatible with the new ReFS file system; and native deduplication was not supported for volumes containing virtual hard disks attached to a running virtual machine.

Microsoft did some more work on the deduplication feature in Windows Server 2012 R2 and now you can deduplicate a volume containing virtual hard disks that are being actively used. But there is one major caveat: This type of deduplication is only supported for virtual desktops, not virtual servers.
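
For example, as a minimal sketch (assuming the Data Deduplication role service is installed and the virtual desktop VHDX files live on a volume such as D:, an illustrative drive letter), the supported VDI-optimized mode can be enabled with PowerShell:

# Enable deduplication on the VDI volume using the Hyper-V (VDI) usage type
Enable-DedupVolume -Volume "D:" -UsageType HyperV

# Check space savings once the optimization jobs have run
Get-DedupStatus -Volume "D:"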

Deduplication can reduce IOPS and improve performance for Hyper-V virtual servers, but the only way to realize these benefits in a supported manner is to make use of hardware-level deduplication that is completely transparent to the Hyper-V host and any guest operating systems.

Managing QoS for effective storage I/O

Another tool for reducing the problem of storage I/O contention is a new Windows Server 2012 R2 feature called Quality of Service Management (also known as Storage QoS). This feature allows you to reserve storage IOPS for a virtual hard disk by specifying a minimum number of IOPS; Hyper-V counts IOPS in 8 KB increments. Similarly, you can cap a virtual hard disk's I/O by specifying a maximum number of allowed IOPS.

The Quality of Service Management feature is set on a per-virtual-hard-disk basis rather than a per-VM basis. This allows you to granularly apply Quality of Service Management policies in a way that gets the best possible performance from your available IOPS.
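
As a minimal sketch, assuming a VM named "SQLVM01" with a data disk at SCSI controller 0, location 1 (the name and the controller locations are illustrative), minimum and maximum IOPS can be set with PowerShell:

# Reserve 300 IOPS and cap the disk at 1,000 IOPS (counted in 8 KB increments)
Set-VMHardDiskDrive -VMName "SQLVM01" -ControllerType SCSI -ControllerNumber 0 `
    -ControllerLocation 1 -MinimumIOPS 300 -MaximumIOPS 1000
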
Considerations for Windows Storage Spaces

Microsoft introduced Windows Storage Spaces in Windows Server 2012 as a way of abstracting physical storage into a pool of storage resources. You can create virtual disks on top of a storage pool without having to worry about physical storage allocations.

Microsoft expanded the Windows Storage Spaces feature in Windows Server 2012 R2 by introducing new capabilities such as three-way mirroring and storage tiering. You can implement the tiered storage feature on a per-virtual-disk basis and allow "hot blocks" to be dynamically moved to a solid-state drive (SSD)-based storage tier so they can be read with the best possible efficiency.

The tiered storage feature can greatly improve VM performance, but there are some limitations. The most pressing one is that storage tiers can only be used with simple or mirrored virtual disks. Storage tiers cannot be used with parity disks, even though the preview release allowed it.

If you are planning to use tiered storage with a mirrored volume, then Windows requires the number of SSDs in the storage pool to match the number of mirrored disks. For example, if you are creating a three-way mirror then you will need three SSDs.

When you create a virtual disk that uses storage tiers, you can specify the amount of SSD space you wish to allocate to the fast tier. It is a good idea to estimate how much space you will need and then add at least 1 GB to that estimate. The reason is that, if sufficient space is available, Windows will use 1 GB of the fast tier as a write-back cache. This cache smooths out write operations (thereby improving write performance), but it does so by taking 1 GB of space away from your fast tier. If you account for this loss up front, you can allocate enough space to accommodate both the write-back cache and the hot storage blocks.
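
As a minimal sketch, assuming an existing storage pool named "Pool1" (the pool, tier and disk names and the sizes are illustrative), the tiers and a mirrored, tiered virtual disk with a 1 GB write-back cache could be created like this:

# Define an SSD tier and an HDD tier on the existing pool
$ssdTier = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "SSDTier" -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName "Pool1" -FriendlyName "HDDTier" -MediaType HDD

# Create the mirrored virtual disk, sizing each tier and reserving 1 GB for the write-back cache
New-VirtualDisk -StoragePoolFriendlyName "Pool1" -FriendlyName "TieredDisk1" `
    -ResiliencySettingName Mirror -StorageTiers $ssdTier, $hddTier `
    -StorageTierSizes 50GB, 450GB -WriteCacheSize 1GB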

ReFS limitations

In Windows Server 2012, Microsoft introduced the Resilient File System (ReFS) as a next-generation replacement for the aging NTFS file system, and ReFS carries over into Windows Server 2012 R2. Hyper-V administrators must consider whether to provision VMs on ReFS volumes or NTFS volumes.

If you are running Hyper-V on Windows Server 2012, then it is best to avoid using the ReFS file system, which has a number of limitations. Perhaps the most significant of these (at least for virtualization administrators) is that ReFS is not supported for use with Cluster Shared Volumes.

In Windows Server 2012 R2, Microsoft supports the use of ReFS on Cluster Shared Volumes, but there are still limitations that need to be taken into account. First, choosing a file system is a semi-permanent decision: there is no option to convert a volume from NTFS to ReFS or vice versa.

Also, a number of features that exist in NTFS do not exist in ReFS. Microsoft has hinted that such features might be added in the future, but for right now, here is a list of what is missing:

    File-based compression
    Data deduplication
    Disk quotas
    Object identifiers
    Encrypting File System (EFS)
    Named streams
    Transactions
    Hard links
    Extended attributes

With so many features missing, why would anyone use ReFS? There are two reasons: ReFS is really good at maintaining data integrity and preventing bit rot, and it is a good choice when large quantities of data need to be stored. The file system has a theoretical size limit of 1 yottabyte.

If you do decide to use the ReFS file system on a volume containing Hyper-V VHD or VHDX files, then you will have to disable the integrity bit for those virtual hard disks. Hyper-V automatically disables the integrity bit for any newly created virtual hard disks, but if any virtual hard disks were created on an NTFS volume and then moved to a ReFS volume, the integrity bit for those virtual hard disks needs to be disabled manually. Otherwise, Hyper-V will display a series of error messages when you attempt to start the VM.

You can only disable the integrity bit through PowerShell. You can verify the status of the integrity bit by using the following command:

Get-Item <virtual hard disk name> | Get-FileIntegrity

If you need to disable the integrity bit, do so with this command:

Get-Item <virtual hard disk name> | Set-FileIntegrity -Enable $False
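
If an entire folder of virtual hard disks was moved to the ReFS volume, the same cmdlet can be applied in bulk (the path below is illustrative):

Get-ChildItem "V:\VMs\*.vhdx" | Set-FileIntegrity -Enable $False
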
Best practices for storage connectivity

Hyper-V is extremely flexible with regard to the types of storage hardware that can be used. It supports direct-attached storage, iSCSI, Fibre Channel (FC), virtual FC and more. However, the way that storage connectivity is established can impact storage performance, as well as your ability to back up your data.

There is an old saying, "Just because you can do something doesn't necessarily mean that you should." In the world of Hyper-V, this applies especially well to the use of pass-through disks. Pass-through disks allow a Hyper-V VM to connect directly to a physical disk rather than using a virtual hard disk.

The problem with using pass-through disks is that they are invisible to the Hyper-V VSS Writer. This means backup applications that rely on the Hyper-V VSS Writer are unable to make file, folder or application-consistent backups of volumes residing on pass-through disks without forcing the VM into a saved state. It is worth noting that this limitation does not apply to virtual FC connectivity.

Another Hyper-V storage best practice for connectivity is to establish iSCSI connectivity from the host operating system rather than from inside the VM whenever possible. The reason is that, depending on a number of factors (such as the Hyper-V version, the guest operating system and Integration Services usage), storage performance can suffer when iSCSI connectivity is initiated from within the VM, largely due to a lack of support for jumbo frames.
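
As an illustration, assuming an iSCSI target portal at 192.168.1.50 (the address and the target IQN below are placeholders), the connection would be made from the Hyper-V host's parent partition like this:

# Register the target portal and connect from the host rather than from inside the VM
New-IscsiTargetPortal -TargetPortalAddress "192.168.1.50"
Connect-IscsiTarget -NodeAddress "iqn.1991-05.com.contoso:target01" -IsPersistent $true

The resulting disk can then be brought online on the host and used to hold the VM's VHDX files.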

Troubleshooting VMware performance

A prerequisite to troubleshooting VMware storage performance is confirming that storage or its supporting infrastructure, rather than the compute layer, is actually the problem. While there are many sophisticated tools available to monitor the virtual environment, a simple and free way to make this determination is to monitor host CPU and virtual machine (VM) CPU utilization over time. Essentially, you want to know how heavily the CPU is being used when the performance problem is most noticeable. If utilization is above 65%, it's more than likely that the performance problem is best solved by upgrading the host, allocating more CPU resources to that particular VM or moving the VM to another host.
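
As a rough sketch using VMware PowerCLI (assuming PowerCLI is installed and a Connect-VIServer session already exists; the VM name "AppVM01" is illustrative), you can pull the last day of CPU utilization for a VM and its host and compare the averages against that 65% rule of thumb:

# Average CPU utilization over the last 24 hours for the VM and for its current host
$vm = Get-VM -Name "AppVM01"
$vmCpu   = Get-Stat -Entity $vm -Stat "cpu.usage.average" -Start (Get-Date).AddDays(-1)
$hostCpu = Get-Stat -Entity $vm.VMHost -Stat "cpu.usage.average" -Start (Get-Date).AddDays(-1)
"VM avg CPU %:   " + ($vmCpu | Measure-Object -Property Value -Average).Average
"Host avg CPU %: " + ($hostCpu | Measure-Object -Property Value -Average).Average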

A simple way to rule out a CPU-related performance issue is to migrate the VM to a more powerful host with more memory, if possible. Assuming the alternate host is on the same shared storage infrastructure, a repeat of the performance loss on the second host makes storage a top candidate as the source of the issue.

One of the prime benefits that virtualization offers is its role in isolating performance problems. In the past, moving an application to another host meant acquiring server hardware, installing the operating system and application, and then migrating users. With virtualization, a simple vMotion can provide a lot of information in the troubleshooting process.
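
In PowerCLI terms, that test migration is a one-liner (a sketch only; the VM and destination host names are illustrative, and the destination must be attached to the same shared storage):

# vMotion the VM to a more powerful host on the same shared storage
Move-VM -VM (Get-VM -Name "AppVM01") -Destination (Get-VMHost -Name "esx02.example.com")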

Targeting the storage network

Once a performance problem has been better isolated to the storage infrastructure, the next step is to determine where it's occurring in that infrastructure. Conventional wisdom (and storage vendors) says to "throw hardware" at the problem and buy more disk drives, solid-state drives (SSDs) or a more powerful storage controller. While a faster storage device may be in order, IT planners should first look at the storage network between the VMware hosts and the storage system. If a network problem exists, it doesn't matter how fast the storage devices in the system are.

A simple way to identify a network performance issue is to look at disk performance. Assuming CPU utilization is low, a storage device performance issue should show a relatively steady state of IOPS, which means disk I/O has hit a wall. Occasional or sporadic spikes in disk I/O performance mean the device and storage system have performance to spare, but data isn't getting to them fast enough. In other words, there is a problem in the network.
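
One hedged way to eyeball that pattern with PowerCLI (the VM name is illustrative; the counters are standard vSphere disk metrics) is to list read and write IOPS over the problem window and look for a flat plateau versus sporadic spikes:

# Read/write IOPS for the VM over the last 24 hours; a flat plateau suggests the device
# is saturated, while sporadic spikes suggest data isn't reaching it fast enough
$vm = Get-VM -Name "AppVM01"
Get-Stat -Entity $vm -Stat "disk.numberReadAveraged.average", "disk.numberWriteAveraged.average" `
    -Start (Get-Date).AddDays(-1) | Sort-Object Timestamp | Select-Object Timestamp, MetricId, Value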

IT professionals tend to focus on overall bandwidth as the biggest area of contention in the storage network -- for example, when moving from a 1 Gigabit Ethernet (GbE) environment to 10 GbE, or from 4 Gb Fibre Channel (FC) to 8 Gb FC. While an increase in bandwidth can improve performance, it's not always the main culprit. Other problem areas, like the quality and capabilities of the network card or the configuration of the network switch, should also be considered at the outset. Resolving issues at these levels is often far less expensive.

Network interface cards (NICs), whether they're FC- or Internet Protocol-based, are typically shared across multiple VMs within a host. Even multi-port cards are typically aggregated and shared. If a particular VM has a performance problem, dedicating that VM to its own port on a card -- or even its own card -- may be all that's needed to resolve the issue. If the decision is made to upgrade the NIC to a faster speed, look for cards where specific VM traffic can be isolated or given a defined quality of service.

You can also upgrade the NIC without upgrading the rest of the network. While it may seem counterintuitive, placing a 16 Gb FC card into an 8 Gb FC network does two things: It lays the foundation for a faster storage infrastructure, and it improves performance even over the old cabling. This is because the processing capabilities of the interface card become more robust with each generation. Moving data into and out of a NIC requires processing power, so the faster the card can do that work, the better its performance, even at the older link speed.

Switches can get overwhelmed

The second area of the storage network to explore is the switch. Just like a card, a switch can be overwhelmed by the amount of traffic it has to handle; many switches on the market weren't even designed for a 100% I/O load. For example, some switch designers may have counted on some connections not needing full bandwidth at all times. So while a switch may have 48 ports, it can't sustain full bandwidth to all ports at the same time. In fairness, in the pre-virtualization days, this was a safe assumption. In the modern virtualized infrastructure, however, the assumption of idle physical hosts no longer holds.

Another common problem in switch configuration is inter-switch links. As switch infrastructures get upgraded, it's not uncommon to find inter-switch connections still hard-set to their prior network speed. This configuration error essentially locks the switch to its older performance level.

Looking for trouble in the storage controller

If disk performance measurements show a relatively steady state and CPU utilization is low, then it's more than likely that there's a problem with the storage system. Again, most tuning efforts tend to focus on the storage device, but the storage controller should be ruled out first. The modern storage controller is responsible for getting data into and out of the system, providing features like snapshots and managing RAID. In addition, some systems now perform even more sophisticated activities, such as data tiering between SSDs and hard disk drives (HDDs).

There are two parts of the storage controller that must be ruled out: the network interconnect between the controller and the drives, and the processing resource. Most storage systems provide a GUI that displays the relevant statistics, and it's important to monitor them during the problem period to determine whether either one is the source of the problem. In the past, these two resources were seldom a concern, but in a virtualized data center it's not uncommon for either to become a bottleneck. Also, if and when SSDs are installed in the storage system, it's important to recheck these resources to ensure they're not blocking the SSDs from reaching their full potential.

Analyzing the storage device

After all this triage is done, the storage device can finally be analyzed. It's important to note that most storage tuning efforts start here, when in actuality this is where they should end. Having a fast storage device without an optimized infrastructure is a waste of resources. That said, the above modifications (host CPU, storage network and storage controller) will often bring performance to an acceptable, if not optimal, level. The easiest way to confirm a disk I/O performance problem is when your measurement tool shows a consistent performance result -- for example, IOPS consistently reporting in the same range while CPU and network utilization are low.

The fix for device-based performance problems is typically to add drives or to migrate to SSD. In the modern era, a move to SSD is almost always more beneficial, providing a better performance improvement for less expense. However, before shifting to more or faster drives, IT professionals should also look at how the VM disk files are distributed. Too many on a single volume can be problematic; moving some of them to different HDD volumes can help, and SSD should ultimately solve the problem.
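
If redistributing VM disk files looks worthwhile, a Storage vMotion handles the move; a minimal PowerCLI sketch, with illustrative VM and datastore names:

# Move the VM's disk files to a less-contended datastore without downtime
Move-VM -VM (Get-VM -Name "AppVM01") -Datastore (Get-Datastore -Name "Datastore02")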

Tuning the VMware environment is a step-by-step process. Before you upgrade to higher-performing storage devices, you should go through the above process to ensure your VMware environment will see the maximum benefit from your investment.

Get ready for the 12 Gbps SAS drive

The rollout of a 12 Gbps SAS drive and other technology will mean performance improvements and new management capabilities. Marty Czekalski, president of the SCSI Trade Association, discussed the impact that 12 Gbps SAS will have on IT organizations. Czekalski also does ecosystem development for interface technologies and protocols, and works with standards organizations in his position as manager of the emerging technology program at hard disk drive vendor Seagate Technology LLC.

Where does the rollout of 12 Gbps SAS drive technology stand?

Marty Czekalski: It's just starting up the ramp now, and we've been doing plugfests like the one [from Oct. 21-26] at the University of New Hampshire. We've done a lot of work in wringing out all the bugs. The plugfest went very smoothly, so we're looking forward to a relatively event-free rollout. Our first 12 Gb plugfest was held last year, where people brought their prototypes, and it's usually about 12 to 18 months before end-user shipments start.

Usually, the first place those show up is [in] add-in cards and servers, which is where they're showing up now. If you have a specific need for the performance today, you can get 12 [Gbps] SAS host bus adapters, and there are some 12 [Gbps] SAS SSDs [solid-state drives]. You'll see over the next several months [that] the different OEMs will start offering these as add-ons to their servers.

Today, the servers are shipping typically with 6 [Gbps SAS] on the motherboard. If people want the extra performance, they buy an add-in card for 12 [Gbps]. You'll start to see 12 [Gbps] SAS RAID controllers down on the motherboard starting midyear next year, with the new server shipments typically as a standard feature.

Are there other ways to implement 12 Gbps SAS drives?

Czekalski: There's a lot of different ways you can implement it. As I said, add-in cards. It'll be down on the motherboard, and you'll be able to plug in 12 Gbps SSDs or [hard disk drives] HDDs. You'll have external storage systems that are connected with 12 [Gbps] SAS as well as Fibre Channel [FC] subsystems that'll be FC on the front end and the back ends will be 12 Gbps SAS to the actual storage devices. People who are rolling out FC or network-connected storage systems are already migrating their back ends to 12 [Gbps] SAS so they can scale out those storage systems on the back end more effectively and into larger configurations.

What benefits will enterprise IT users get from 12 Gbps SAS drive technology?

Czekalski: It doubles the user transfer rate. The other thing that's coming into play now is more of the host bus adapters [HBAs] will be using the Mini-SAS HD connector, which is a managed connector. It allows a single HBA to accept different kinds of cables. So, you can go with a passive cable, say, up to six or eight meters. You could use an active copper cable by just changing the cable. You can go up to 20 meters with that. Or, you can even go with optical cables up to 100 meters.

In addition, these are managed cables with that connector, so when it's plugged into the system, the system actually knows what kind of cable is attached and what distance it covers. The system can adapt its signaling to optimize it for that particular type of cable. If there's a cable problem for some reason, and the system detects it, it can actually say cable number x, part number y needs replacement. It becomes easier to manage from an IT standpoint.

How was support for that capability built into the system?

Czekalski: It's built right into the connector and cable. The connector on the host bus adapter or on the server is the same in all cases. It's the same low-cost SAS RAID controller or HBA. If you need to go extra distances, you buy a cable that's appropriate. For short cables that are passive, they're very inexpensive. If you need to go 100 meters, you buy an active optical cable, and you plug it in. So, the active components are actually in the connector housing that's on the cable. And in addition to just having the active components, there's also an EEPROM [Electrically Erasable Programmable Read-Only Memory] in there that the system can read that tells exactly what kind of cable it is, its characteristics, part numbers and a bunch of different things.

How do systems based on 6 Gbps SAS work now?

Czekalski: The 6 Gb with the original Mini-SAS connector didn't have any manageability. It didn't have the extra pins to be able to read what kind of cable it was. You had your basic cables for doing six to 10 meters, and then there were some specialty cables and systems built that could take an active copper cable. But there was no way for the system to know exactly what kind of cable was plugged in. With the Mini-SAS HD, there are extra pins for the purposes of providing power to the connector for the active components in both the copper and optical, as well as power to the EEPROM that's in there so the data can be read out.

How will 12 Gbps SAS SSDs stack up performance-wise against PCI Express (PCIe)-based SSDs?

Czekalski: I think you're going to see less extreme performance differences. And keep in mind that SAS SSDs have different features than a PCIe SSD. The SAS SSDs are dual-ported and connected into a fault-tolerant domain, and all the software is already there and works for failover and high availability [HA]. That feature isn't yet there for PCIe. Yes, they designed the connectors and stuff so that could be done, but that whole ecosystem for being able to do the HA stuff doesn't exist today for PCIe. The other thing that doesn't exist today is the same kind of hot-plug model in PCIe. If you were to go over to most of these PCIe devices and unplug it, you're going to get a blue screen as opposed to a SAS device. SAS is hot-pluggable. You can do surprise plugs and unplugs, and the system doesn't mind it because it's a storage interface that was designed for that. PCIe wasn't originally designed to be hot plugged.

Will users see a substantial performance improvement with SAS hard disk drives?

Czekalski: You will see improvements there with 12 Gb, particularly on 12 Gb HDDs that are hybrid drives that have some flash embedded in them, because the flash can now run at the line rate directly. So, your reads that are coming out of the flash cache on a hybrid HDD will run at the line rate. You'll also get more scalability even if you're just running a traditional SAS HDD on 12 Gb. You can put more of them on the same bus without getting contention. So, you can create large configurations, and you reduce your number of HBAs. You can reduce the number of cables and other things. So, it actually reduces overall system cost and complexity while improving the performance.