5 Apr 2013

Working With Replicas in Hyper-V 3.0

It's no big secret that Hyper-V 3.0 is loaded with new features, but the one that I am probably the most excited about is Hyper-V replica. In case you aren't familiar with it, Hyper-V replica is a disaster recovery feature that gives smaller organizations the ability to protect themselves in a way that is similar to what large organizations might use. In this article, I will explain what you can expect from this new feature, and how to use it.

Clearing the Confusion

Ever since the Hyper-V replica feature was first announced, I have noticed that there seem to be a lot of misconceptions about what it is and what it does. That being the case, I want to start out by trying to clear up some of the confusion.

One of the big problems with server virtualization is that a server crash can result in a major outage. In a physical data center, where each server typically runs a single workload, a server failure takes only that one workload offline. Obviously this would be an inconvenience, but it probably would not be a major catastrophe. That isn't necessarily the case in a virtual data center, however. In a virtual data center a single physical machine often hosts multiple virtualized workloads. If a host server fails, then all of the virtual machines that were running on that server also fail. That's why the failure of a single physical server can result in a major outage in a virtual data center.

The only way to prevent this type of outage from occurring is to use clustering. In a clustered environment a group of host servers work together to provide redundancy. If a host server were to fail, then the virtual machines that were running on that host server can fail over to another server within the cluster. That way the virtual machines can keep running in spite of the hardware failure.
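
To make the failover idea a little more concrete, here is a minimal Python sketch of what happens to the virtual machines when a clustered host dies. This is a toy model and not how Hyper-V actually chooses a destination: the host names, the VM names, and the "pick the least loaded surviving host" policy are all assumptions that I made up for illustration.

```python
from typing import Dict, List


def fail_over(placement: Dict[str, List[str]], failed_host: str) -> Dict[str, List[str]]:
    """Return a new placement with the failed host's VMs moved to survivors.

    Each orphaned VM goes to whichever surviving host currently runs the
    fewest VMs (ties broken by host name). This is a toy policy chosen for
    illustration, not Hyper-V's real placement logic.
    """
    # Copy the surviving hosts so the caller's dictionary is left untouched.
    survivors = {h: list(vms) for h, vms in placement.items() if h != failed_host}
    if not survivors:
        raise RuntimeError("no surviving hosts: the whole cluster is down")
    for vm in placement.get(failed_host, []):
        target = min(survivors, key=lambda h: (len(survivors[h]), h))
        survivors[target].append(vm)
    return survivors


# Hypothetical three-node cluster.
cluster = {
    "host-a": ["vm1", "vm2", "vm3"],
    "host-b": ["vm4"],
    "host-c": ["vm5", "vm6"],
}
after = fail_over(cluster, "host-a")
print(after)
# {'host-b': ['vm4', 'vm1', 'vm2'], 'host-c': ['vm5', 'vm6', 'vm3']}
```

The point of the sketch is simply that every virtual machine from the failed host ends up running somewhere else, which is exactly what a standalone (non-clustered) host cannot offer.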

Failover clustering is essential to keeping virtual machines online in a production environment. In fact, Microsoft has supported the use of failover clustering for virtual machines ever since Hyper-V was first introduced with Windows Server 2008.

Failover clustering still exists with Windows Server 2012 and Hyper-V 3.0, but it has evolved significantly since its first incarnation several years ago. It is this evolution that seems to be contributing to much of the confusion around the Hyper-V replica feature. Hyper-V 3.0 contains a number of different redundancy features. The reason why Microsoft has introduced so many different features for protecting virtualized workloads is that some features are better suited to larger environments, while others are better suited to smaller organizations.

With that said, failover clustering and the Hyper-V replica feature appear very similar on the surface, but have completely different purposes. Failover clustering is designed to do two different things. First, it protects the virtual machines against hardware failures. As I previously explained, if a clustered node fails then the virtual servers can be failed over to another node in the cluster where they can continue to run.

The other thing that failover clustering is good for is facilitating system maintenance. Just like any other type of server, virtualization hosts need maintenance from time to time. For example, you might occasionally need to install a service pack or upgrade a host server's memory. Failover clustering allows maintenance to be performed on the host servers in a nondisruptive manner. Workloads can be live migrated to another node in the cluster so that the node that needs maintenance can be taken offline.

These are the two things that failover clustering is really good for. It has been possible to perform failovers ever since the days of Windows Server 2008. The ability to live migrate virtual machines for maintenance or load balancing purposes was introduced with Windows Server 2008 R2 and Hyper-V 2.0.

I mentioned that failover clustering has been significantly enhanced in Hyper-V 3.0. The enhancement that has undoubtedly received the most attention is that unlike previous versions of failover clustering, Hyper-V 3.0 does not require the use of shared storage. Previously all of the nodes in a failover cluster had to be tied to a centralized storage pool so that all of the host servers in the cluster would have access to a common set of virtual hard disk files. With Hyper-V 3.0, this requirement goes away. You can still use shared storage if you want to, but it is now possible to build a failover cluster without shared storage. This greatly reduces the cost of failover clustering and puts it within reach of smaller organizations.

One of the other big enhancements that Microsoft made to failover clustering is that you can now create much larger clusters than were previously possible. Now a cluster can contain up to 64 nodes. Because clusters can now be so large and because there is no longer a requirement for shared storage, some organizations are beginning to look at the idea of geographically dispersed clusters. In other words, some of the cluster nodes can reside at an alternate data center so that the virtual machines can remain online even if the primary data center is destroyed.

At first this sounds a lot like the Hyper-V replica feature. The Hyper-V replica feature allows virtual machines to be replicated to a remote host. That way, up-to-date copies of the virtual machines exist in a safe place. As much as this sounds like a geographically dispersed cluster though, Hyper-V replicas do not use clustering.

As previously mentioned, one of the big reasons for creating a cluster is so that virtual machines can keep running even if a host server fails. The Hyper-V replica feature does not allow virtual machines to be automatically failed over to a remote data center as is possible with a true clustering solution. Instead, the remote site is only used for safekeeping of a copy of the virtual machines. If necessary, the replica can be brought online and used, but doing so requires manual intervention. It is not an automatic process.

So why would anyone use the Hyper-V replica feature as opposed to building a failover cluster that spans multiple data centers? For one thing, using the replica feature is less expensive than building a geographically dispersed cluster. When you build a cluster that spans multiple data centers, you typically need multiple cluster nodes in each location in order to facilitate data center level failovers and to prevent split-brain syndrome across the data center boundaries. In contrast, the Hyper-V replica feature can be used with only a single server in the remote site.
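
If the split-brain problem is unfamiliar, the following Python sketch shows the basic majority-vote idea that clusters rely on to avoid it. Keep in mind that this is only a simplified illustration of plain majority voting. Real failover clusters support several quorum configurations that I am not modeling here, and the site names and vote counts are hypothetical.

```python
def has_quorum(votes_in_partition: int, total_votes: int) -> bool:
    """True if this partition holds a strict majority of all cluster votes."""
    return 2 * votes_in_partition > total_votes


def surviving_sites(site_votes: dict) -> list:
    """Return the sites that could keep running if every inter-site link fails."""
    total = sum(site_votes.values())
    return [site for site, votes in site_votes.items() if has_quorum(votes, total)]


# Hypothetical two-site clusters whose WAN link has just gone down.
print(surviving_sites({"primary": 2, "remote": 2}))  # [] (even split, nobody has a majority)
print(surviving_sites({"primary": 3, "remote": 2}))  # ['primary']
```

Because at most one partition can ever hold a strict majority, two isolated sites can never both decide to run the same virtual machines. The price is that you have to think carefully about how many voting nodes live in each location, which is part of what makes a multi-site cluster more expensive than a single replica server.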

Another important distinction is that the Hyper-V replica feature uses asynchronous replication. This makes it ideal for low-bandwidth environments or for use across an unreliable Internet connection. In contrast, failover clustering requires that heartbeat information be exchanged among cluster nodes in a timely manner. This is the only way that the nodes in the cluster are able to determine which cluster nodes are up and running at any given moment. Clustering is sensitive to latency, whereas the Hyper-V replica feature is much more forgiving in high-latency environments.
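
The difference in latency tolerance can be sketched in a few lines of Python. This is purely illustrative: the five-second heartbeat timeout, the 300-second send interval, and the delay values are numbers that I picked for the example, not actual Hyper-V or failover clustering settings.

```python
from typing import List

HEARTBEAT_TIMEOUT_S = 5.0  # hypothetical threshold, not a real cluster default


def heartbeat_verdicts(delays_s: List[float], timeout_s: float = HEARTBEAT_TIMEOUT_S) -> List[str]:
    """Classify each heartbeat by whether it arrived within the timeout."""
    return ["up" if d <= timeout_s else "presumed down" for d in delays_s]


def async_replication_arrivals(send_times_s: List[float], delays_s: List[float]) -> List[float]:
    """With asynchronous replication a slow link only makes changes arrive later."""
    return [t + d for t, d in zip(send_times_s, delays_s)]


# The same hypothetical link delays (in seconds) applied to both mechanisms.
delays = [0.25, 1.5, 8.0, 0.5]

print(heartbeat_verdicts(delays))
# ['up', 'up', 'presumed down', 'up']

print(async_replication_arrivals([0, 300, 600, 900], delays))
# [0.25, 301.5, 608.0, 900.5]
```

A single eight-second hiccup is enough for the heartbeat check to presume that a perfectly healthy node is down, while the replica simply receives that batch of changes eight seconds late.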
