Windows Server 2012 Live Migration Improvements
Live migration moves virtual machines from one location to another with no perceivable downtime to service delivery. It first appeared in Hyper-V in Windows Server 2008 R2, where it could move the memory and process of a virtual machine from one host to another within the same Hyper-V cluster. The virtual machines' storage did not move; it stayed static on the shared cluster storage.
Recently, Windows Server 2012 added significant improvements to Live Migration, including the following:
- Simultaneous live migration: More than one virtual machine can be moved at a time.
- Queuing: Clusters can queue migrations if more virtual machines need to move than Live Migration slots are available.
- Storage live migration: Any or all of the storage of a virtual machine can be moved from one storage location to another, such as from DAS to SAN to SMB 3.0 storage.
- Clusters are not required: Failover clustering is not a requirement for WS2012 Hyper-V live migration. You can perform live migration between any two WS2012 or Hyper-V Server 2012 hosts, clustered or not, as long as they share a common live migration network. Just make sure that your virtual machines will remain connected to a virtual switch with the same name (or the same logical network/switch in System Center Virtual Machine Manager).
- SMB 3.0 live migration: Hosts, clustered or not, can live migrate virtual machines that are stored on common SMB 3.0 shares without the movement of files. This can greatly reduce the costs of virtualization for large hosting facilities, and introduce quick inter-cluster live migration.
- Shared-nothing live migration: Two hosts can perform live migration of a virtual machine with no common storage. The process will move the files of the virtual machine using storage live migration before copying, synchronizing and switching over the memory/process of the virtual machine.
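The queuing behavior in the list above can be pictured with a small model. This is only a sketch under simplifying assumptions: the function name `schedule_migrations` and the idea of fixed "waves" are ours, not Hyper-V's, and a real cluster would start the next queued migration as soon as any slot frees up rather than waiting for a whole wave to finish.

```python
from collections import deque


def schedule_migrations(vms, slots):
    """Group VMs into waves: at most `slots` migrate at once, the rest wait.

    `slots` stands in for the number of simultaneous live migrations
    the hosts are configured to allow (a hypothetical parameter name).
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    queue = deque(vms)
    waves = []
    while queue:
        # Take as many VMs as there are free slots; the remainder stays queued.
        wave = [queue.popleft() for _ in range(min(slots, len(queue)))]
        waves.append(wave)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(schedule_migrations(["vm1", "vm2", "vm3", "vm4", "vm5"], 2), 1):
        print(f"wave {number}: {wave}")
```

With five virtual machines and two slots, the model produces three waves, the last containing the single leftover virtual machine.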
Why Use Windows Server 2012 R2 Hyper-V Live Migration?
That covers the “how” of live migration but not the “why” of live migration. The purpose of this feature, just like with VMware vMotion, is to provide agility and flexibility. The infrastructure needs the ability to adapt to the changing demands of services that are being delivered and the requirements of the business – without enforcing downtime on the business.
Thanks to live migration we can:
- React to changes in resource utilization: System Center Virtual Machine Manager (VMM) provides Dynamic Optimization to load balance virtual machines across hosts within a Hyper-V cluster. It dynamically places virtual machines on the host with the most suitable available resources. Yes, Hyper-V and System Center do have an answer to vSphere DRS.
- Optimize Power Consumption: An extension to Dynamic Optimization in VMM is Power Optimization, in which virtual machines within a Hyper-V cluster can be centralized to run on fewer hosts. Using baseboard management controllers, the power of idle hosts can be manipulated to reduce data center power costs.
- Performance and Resource Optimization (PRO): System Center Operations Manager (OpsMgr) can detect an issue on a clustered Hyper-V host and request VMM to react by using live migration to move virtual machines off of the affected host.
- Perform maintenance: Whether it is scheduled (such as Cluster-Aware Updating) or not, we can drain a host of virtual machines ahead of any necessary host maintenance. Note that Failover Cluster Manager (Pause) and VMM (Maintenance Mode) offer facilities to perform this drain for us on clustered nodes.
- Implement a new network footprint: A large data center might want to install an entirely new network footprint. Virtual machines could be live migrated to new hosts in that footprint with no downtime. The changes to the IP ranges can be overcome by implementing Hyper-V Network Virtualization (HNV aka Windows Network Virtualization or WNV) to abstract the IP addresses of the virtual machines so that service communications are not interrupted.
Those are just some day-to-day examples of the many reasons we (or System Center on our behalf) would use live migration to move virtual machines around the data center. This all sounds fantastic. How could Microsoft top WS2012 live migration? Surely this would be impossible in such a short time frame? Well, Microsoft did surprise us when they announced the new live migration features in WS2012 R2.
Cross-Version Live Migration
Before we even talk about the coolest features of WS2012 R2 live migration we need to discuss how we get from WS2012 Hyper-V to WS2012 R2 Hyper-V. This topic, as with the rest of the discussion, includes the free Hyper-V Server. Roughly speaking, there are two generic ways to get your existing virtual machines on WS2012 Hyper-V to run on WS2012 R2 Hyper-V:
- Upgrade: This is an in-place upgrade of the host. There is a significant amount of downtime and it is reserved for non-clustered hosts because Windows failover clustering does not support mixed Windows Server versions in the same cluster.
- Migration: A new (or drained and rebuilt) host is prepared with WS2012 R2 Hyper-V and virtual machines are moved to it. This is the only option for clusters and has always had downtime. This time could be brief (by remapping CSVs using the Cluster Migration Wizard) or long (export/import virtual machines one-by-one).
Migration offers the least amount of downtime during an upgrade, but there is still downtime. And that's unfortunate for any service provider whether the customers are internal or external. It's bad enough if a new version of Windows Server is released every three years, but imagine all those engineering hours at 3 a.m. on a Saturday if Microsoft were to release a new version of Windows Server just 12 months after the last one and continue to keep up that pace... oh wait – they are doing that now.
This is why cross-version Live Migration was added to WS2012 R2 Hyper-V. With this feature we can build a host with WS2012 R2 (this could be an existing WS2012 host that is drained and rebuilt) and perform a one-way Live Migration of virtual machines from WS2012 Hyper-V to the new WS2012 R2 Hyper-V host.
This does give us the ability to do a zero-downtime-to-service upgrade of our hosts or clusters. Don’t forget that you’ll need to deploy updated integration components to the virtual machines, and that this will require a reboot, but at least this can be automated and scheduled using something like System Center Configuration Manager (see collection maintenance windows).
Cross-version live migration for zero-downtime cluster rebuilds
There are limits to cross-version live migration:
- One-way: This is a one-way journey to WS2012 R2 Hyper-V. There is no roll-back without restoring from backup.
- 2012-to-2012 R2: You can only do this from the 2012 generation of Hyper-V to the 2012 R2 generation of Hyper-V. Those who are running older generations such as 2008 or 2008 R2 should have kept up and will have no option but to have downtime.
- Storage Capacity: You can decommission CSVs in the old cluster only after they are drained of virtual machines. You will need sufficient storage capacity to provision new CSVs in a new WS2012 R2 cluster while you perform the migration from the old cluster.
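The first two limits amount to a very small compatibility rule, which can be written down as a check. This is a sketch only: the version labels and the function name `can_live_migrate` are hypothetical, and it covers just the two generations discussed here.

```python
# Allowed (source, destination) pairs. Same-version moves are ordinary
# live migration; the only cross-version path is 2012 -> 2012 R2.
ALLOWED_PATHS = {
    ("2012", "2012"),
    ("2012 R2", "2012 R2"),
    ("2012", "2012 R2"),
}


def can_live_migrate(source_version, destination_version):
    """Return True if a live migration between these host versions is possible."""
    return (source_version, destination_version) in ALLOWED_PATHS


if __name__ == "__main__":
    print(can_live_migrate("2012", "2012 R2"))     # forward: allowed
    print(can_live_migrate("2012 R2", "2012"))     # roll-back: not allowed
    print(can_live_migrate("2008 R2", "2012 R2"))  # older generation: not allowed
```

The asymmetry of the set is the point: the pair ("2012 R2", "2012") is deliberately absent, which is why a roll-back means restoring from backup.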
Sadly, we cannot do an in-place upgrade of a cluster. Microsoft is aware of our request for this, and cross-version live migration will ease the pain somewhat, but many will choose to go with the CSV migration approach of the Cluster Migration Wizard because they have limited storage capacity.
Live Migration Performance Options
One of the nicest improvements of WS2012 (not restricted to just live migration) was the ability to use the full capacity of 10 Gbps or faster networking. Hosts have increasing levels of capacity and virtual machines are getting bigger and bigger (up to 1 TB RAM in WS2012/R2 Hyper-V). It takes time to (a) copy and (b) synchronize the memory of those virtual machines between two hosts and that can introduce delays in planned or unplanned emergency maintenance. Adding 10 GbE or faster that can be used by live migration greatly reduces that time, maybe getting virtual machines off of a failing host before an interruption to service occurs.
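To get a feel for the numbers, here is a back-of-the-envelope estimate of the initial memory copy. It is a rough sketch, not a measurement: the function name and the `efficiency` parameter are our own assumptions, it uses decimal gigabytes, and it ignores the extra passes needed to synchronize memory pages that change during the copy.

```python
def copy_seconds(memory_gb, link_gbps, efficiency=1.0):
    """Estimate seconds to copy `memory_gb` of RAM over a `link_gbps` link.

    `efficiency` (0 < efficiency <= 1) is a hypothetical fudge factor for
    protocol overhead; 1.0 means the link is assumed to be fully used.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits_to_send = memory_gb * 8 * 1e9           # decimal GB -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_send / usable_bits_per_second


if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"64 GB over {gbps} GbE: about {copy_seconds(64, gbps):.1f} s")
```

Under these assumptions a 64 GB virtual machine needs roughly 512 seconds on 1 GbE and about 51 seconds on 10 GbE, before any synchronization passes.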
Microsoft recognized that there was a need to optimize live migration beyond the algorithm improvements in WS2012. Even non-optimized 10GbE might not be enough for massive hosts. Many customers have investments in 1 GbE networking and have no plans to upgrade soon. This is why Microsoft has given us three types of live migration in WS2012 R2:
- TCP/IP: The legacy method of Live Migration as found in WS2012 Hyper-V
- Compression: Data is compressed before transport
- SMB: The features of SMB 3.0 are used for Live Migration
These options can be found in the host settings in Hyper-V Manager:
Configuring live migration method in WS2012 R2 Hyper-V.
Live Migration Compression
The processor capacity of hosts is generally underutilized and Microsoft leverages this using compressed live migration. With this option enabled, live migration will use the spare processor capacity of the host to compress live migration on the source host and decompress it on the destination host. Hyper-V is very careful; it monitors the demands on the processor by higher priority tasks, such as virtual machines, and prioritizes their needs. So if there is no spare processor capacity, live migration traffic will not be compressed.
Live migration compression is the best option when you have 1 GbE networks. The improvements in live migration times are significant in the typical host where processor resources might be 25-33% utilized.
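The "compress only when there is spare processor capacity" idea can be illustrated with a toy example. This is a loose sketch and not Hyper-V's implementation: `zlib` is a stand-in because the compression algorithm is not named here, and the 0.8 utilization ceiling and both function names are hypothetical.

```python
import zlib


def prepare_chunk(chunk, cpu_utilization, cpu_ceiling=0.8):
    """Return (payload, compressed) for one chunk of memory to be sent.

    The chunk is compressed only if the host has CPU headroom
    (utilization below the hypothetical `cpu_ceiling`) and compression
    actually makes the chunk smaller; otherwise it is sent as-is.
    """
    if cpu_utilization < cpu_ceiling:
        packed = zlib.compress(chunk)
        if len(packed) < len(chunk):
            return packed, True
    return chunk, False


def restore_chunk(payload, compressed):
    """Destination side: undo the compression if it was applied."""
    return zlib.decompress(payload) if compressed else payload


if __name__ == "__main__":
    page = b"\x00" * 4096  # a zero-filled page compresses very well
    payload, compressed = prepare_chunk(page, cpu_utilization=0.30)
    print(compressed, len(payload))
    payload, compressed = prepare_chunk(page, cpu_utilization=0.95)
    print(compressed, len(payload))
```

On a lightly loaded host the page travels compressed; on a busy host the same page is sent uncompressed, so the virtual machines keep their processor time.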
SMB Live Migration
Server Message Block (SMB) is the protocol used by Microsoft for file services. WS2012 introduced a new version called SMB 3.0. This added two significant new features to this data transfer protocol:
- SMB Multichannel: Using Receive Side Scaling (RSS), SMB Multichannel can transmit data over multiple parallel streams in a single NIC. SMB Multichannel can also send data over multiple NICs between the source and destination server with dynamic discovery and failover. This can include the ability to send multiple streams over multiple NICs. In other words, SMB 3.0 can send and receive over huge amounts of bandwidth, such as 2 * 10 GbE or more.
- SMB Direct: Processing huge amounts of bandwidth incurs a processor and latency cost. SMB 3.0 can use Remote Direct Memory Access (RDMA) enabled NICs (rNICs) to offload this processing to get faster data transfer with little processing cost.
WS2012 R2 Hyper-V can not only use SMB 3.0 networks for storing virtual machines on file servers (as with WS2012) but it also adds a new trick: WS2012 R2 Hyper-V can use these super-fast high-bandwidth networks to perform Live Migration. The recommendation from Microsoft is to use SMB-powered live migration when you have 10 Gbps or faster networking:
- Single NIC: SMB Multichannel will use RSS to fill the bandwidth and perform the migration more quickly.
- Multiple NICs: SMB Multichannel can use more than one NIC to double, triple, etc., the bandwidth that can be used by live migration.
- RDMA: If the host has rNICs then SMB Direct will greatly reduce the processor requirements of the data transfer and reduce communications latency.
SMB live migration is the fastest option – assuming you have 10 GbE or faster networking – and for many virtual machines this brings the time required for each virtual machine down to the lowest possible theoretical time. There is a certain amount of time required to “build up” the virtual machine on the destination host, copy/synchronize the memory, and then “tear down” the virtual machine on the source host. Networking can improve the copy/transfer of the virtual machine, but it has no impact on the “build up” or “tear down.”
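The effect of that fixed "build up" and "tear down" time can be shown with a simple model. The figures used below (2 seconds to build up, 1 second to tear down) are invented purely for illustration, and the function names are ours.

```python
def migration_seconds(memory_gb, link_gbps, build_up_s, tear_down_s):
    """Total time = fixed build-up + network copy + fixed tear-down."""
    copy_s = memory_gb * 8 / link_gbps  # GB -> gigabits, divided by Gbps
    return build_up_s + copy_s + tear_down_s


def speedup(memory_gb, slow_gbps, fast_gbps, build_up_s, tear_down_s):
    """How many times faster the migration is on the faster network."""
    slow = migration_seconds(memory_gb, slow_gbps, build_up_s, tear_down_s)
    fast = migration_seconds(memory_gb, fast_gbps, build_up_s, tear_down_s)
    return slow / fast


if __name__ == "__main__":
    # Hypothetical 8 GB virtual machine with 3 s of fixed overhead.
    for gbps in (1, 10, 20, 40):
        print(f"{gbps:>2} Gbps: {migration_seconds(8, gbps, 2, 1):.1f} s")
    print(f"1 -> 10 Gbps speedup: {speedup(8, 1, 10, 2, 1):.1f}x")
```

In this model ten times the bandwidth yields less than ten times the speed, and no amount of bandwidth brings the total below the 3 seconds of fixed overhead.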
NIC Teaming
Most conversations about WS2012 R2 live migration stop at this point, but there is one other improvement in WS2012 R2 that affects live migration. NIC teaming has a new load balancing mode in WS2012 R2 called Dynamic. This type of traffic distribution uses flowlets to spread the inbound and outbound traffic of a single data transfer across all of the team members (physical NICs) of a NIC team. This means that a team of 1 GbE NICs with compressed live migration could spread their transfer across all of the team members, thus giving you the extra bandwidth effect that SMB Multichannel gives 10 GbE or faster NICs.
Note that NIC teaming and RDMA are incompatible.
Observing the Possibilities
In tests, we have seen the following:
- 1 GbE (including 1 GbE teams): Enable live migration compression. We have observed significant reductions in the time required for live migration over the legacy TCP/IP option.
- 10 GbE networking: Enable SMB live migration. The performance of this option is incredible. You can use the concepts of converged networks (using QoS and SMB Multichannel Constraints) to merge the cluster, live migration, and SMB 3.0 storage networks of a Hyper-V cluster to balance costs and performance.
- rNICs: Do not team your rNICs. This is because RDMA is incompatible with NIC teaming and Scale-Out File Servers require SMB Multichannel networks to be on different subnets. Instead, use the rNICs as cluster network 1 and cluster network 2 (and more if you are lucky enough to have sufficient rNICs) on two different subnets. Use SMB Multichannel Constraints to restrict SMB 3.0 to these networks.
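Those observations boil down to a short decision rule, sketched below. The function name and return shape are hypothetical, and treating anything slower than 10 GbE as the compression case is our assumption, since only 1 GbE and 10 GbE or faster were covered in the list above.

```python
def recommend_option(link_gbps, rdma_capable=False):
    """Suggest a live migration performance option for a host's network.

    Returns (option, notes), where option is "Compression" or "SMB".
    """
    notes = []
    if rdma_capable:
        option = "SMB"
        notes.append("do not team rNICs; use them as separate cluster networks on different subnets")
        notes.append("restrict SMB 3.0 to these networks with SMB Multichannel Constraints")
    elif link_gbps >= 10:
        option = "SMB"
    else:
        # Assumption: links below 10 GbE are treated like 1 GbE.
        option = "Compression"
    return option, notes


if __name__ == "__main__":
    for gbps, rdma in ((1, False), (10, False), (40, True)):
        option, notes = recommend_option(gbps, rdma)
        print(f"{gbps} GbE, RDMA={rdma}: {option}", *notes, sep="\n  ")
```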