Exchange 2010 support for host-based failover clustering and migration

Some Exchange-supported virtualization platforms, such as Hyper-V and VMware, include features that support the clustering or portability of guest virtual machines across multiple physical root machines.  Examples of host-based failover clustering and migration include Hyper-V Live Migration and VMware ESX vMotion.

Microsoft support for combining host-based failover clustering and migration with Database Availability Groups (DAGs) depends on the Exchange 2010 service pack level.  Per the Exchange 2010 System Requirements:

With Exchange 2010 RTM:

Microsoft doesn’t support combining Exchange high availability solutions (such as DAGs) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers. DAGs are supported in hardware virtualization environments, provided the virtualization environment doesn’t employ clustered root servers, or the clustered root servers have been configured to never failover or automatically move mailbox servers that are members of a DAG to another root server.

With Exchange 2010 SP1 (or later) deployed:

Exchange server virtual machines (including Exchange Mailbox virtual machines that are part of a DAG), may be combined with host-based failover clustering and migration technology, as long as the virtual machines are configured such that they will not save and restore state on disk when moved, or taken offline. All failover activity must result in a cold boot when the virtual machine is activated on the target node. All planned migration must either result in shutdown and cold boot, or an online migration that makes use of a technology like Hyper-V Live Migration. Hypervisor migration of virtual machines is supported by the hypervisor vendor; therefore, you must ensure that your hypervisor vendor has tested and supports migration of Exchange virtual machines. Microsoft supports Hyper-V Live Migration of these virtual machines.

In summary, Exchange 2010 SP1 or later supports hypervisor migrations such as Hyper-V Live Migration and VMware ESX vMotion for DAG member servers.  Host-based failover cluster migrations, such as Hyper-V Quick Migration, are supported only if the virtual Exchange DAG server is restarted immediately after the quick migration completes.  Exchange 2010 RTM is not supported with either migration technology.  RTM only supports the native Exchange high availability features present in DAGs.
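
The summary above can be read as a small decision table.  Here is a minimal Python sketch of it, assuming a made-up `Migration` enum and function name (neither comes from Microsoft or any Exchange tooling).  It only restates the rules as this post summarizes them and is no substitute for the official support statement.

```python
from enum import Enum


class Migration(Enum):
    """Hypothetical labels for the two migration styles discussed above."""
    LIVE = "live"    # online move, e.g. Hyper-V Live Migration or VMware vMotion
    QUICK = "quick"  # save-state move, e.g. Hyper-V Quick Migration


def dag_member_migration_supported(sp_level: int,
                                   migration: Migration,
                                   cold_boot_after_move: bool = False) -> bool:
    """Return True if the post's summary says the scenario is supported.

    sp_level is 0 for Exchange 2010 RTM, 1 for SP1, and so on.
    """
    if sp_level < 1:
        # RTM: DAG members may not be combined with either migration technology.
        return False
    if migration is Migration.LIVE:
        # SP1 or later: online migrations are supported.
        return True
    # Save-state migrations only count if the VM is restarted afterwards.
    return cold_boot_after_move
```

For example, `dag_member_migration_supported(1, Migration.QUICK)` is false until `cold_boot_after_move=True` is passed, which mirrors the restart requirement for Quick Migration.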

Other Exchange Server 2010 roles (CAS, Hub Transport, Edge Transport, and Unified Messaging) fully support host-based failover clustering and migration because they do not employ native Exchange high-availability solutions.

For a list of the virtualization platforms supported by Exchange, visit the Windows Server Virtualization Validation Program website.

Fixing Time Errors on VMware vSphere and ESX Hosts

Time synchronization across a Windows domain is very important.  If a member server’s clock differs by more than 5 minutes (the default Kerberos tolerance) from other domain servers, Kerberos authentication will fail.  This causes seemingly random authentication errors for users and applications, which can be difficult to troubleshoot.
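
To make the 5-minute rule concrete, here is a minimal Python sketch of the comparison.  The function name and the `MAX_SKEW` constant are my own for illustration.  Windows does this check internally, so this is not how you would query or change the actual policy.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance: the 5-minute limit described above.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_tolerance(local: datetime,
                              reference: datetime,
                              max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if the two clocks differ by no more than max_skew."""
    # abs() makes the check symmetric: a clock that is fast or slow
    # by the same amount is treated the same way.
    return abs(local - reference) <= max_skew


if __name__ == "__main__":
    domain_time = datetime(2011, 1, 1, 12, 0, tzinfo=timezone.utc)
    vm_time = domain_time + timedelta(minutes=7)  # a VM that booted 7 minutes fast
    print(within_kerberos_tolerance(vm_time, domain_time))
```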

Normally, time is synchronized in a Windows domain using the domain hierarchy.  The domain controller holding the PDC Emulator FSMO role is normally configured to get time from an authoritative NTP time source, and all the other DCs in the domain sync time from it.  The domain clients in each site sync time from the DCs in their local site, maintaining a relatively close synchronization of time across the domain.

Virtual machines are no different from physical computers and normally sync time using the same domain hierarchy.  Lately, however, I’ve seen VMs running on VMware vSphere boot up with random time differences from the domain.  I’ve seen this problem with three different clients, so I figured this might be a pervasive enough issue to blog about.

The trouble happens when the VMware vSphere, ESX or ESXi host does not have an accurate source of time, or time “drifts” due to an inaccurate system clock module.  vSphere and ESX hosts run a proprietary operating system and are not domain member servers, so they do not participate in domain hierarchy time synchronization.

Most companies that use VMware hosts use vCenter to manage these hosts and their VMs.  Often, the servers that run vCenter are domain member computers and administrators think that since the vCenter syncs time with the domain, the hosts and VMs do, too.  Not true.  You need to configure the vSphere or ESX hosts to sync time from an accurate time source, otherwise the VM guests may start up with the wrong time – this can happen even if time synchronization between the virtual machine and the ESX server in VMware Tools is not enabled.

Here’s how to configure your vSphere or ESX hosts to get time from an authoritative source.

  • Log on to vCenter and select your vSphere or ESX host.
  • Click the Configuration tab and then Time Configuration under the Software heading.  Notice that the time on the vSphere host does not match the domain time shown on the Windows client running vCenter.

  • Click Properties in the top left of the Configuration tab.  This opens the Time Configuration window.

  • Click the Options button and add a new NTP server that is the accurate source of time.  I recommend using the PDC emulator, since it should already be configured as an authoritative time source. 

  • Select the checkbox to Restart NTP service to apply changes and click OK twice to close the Time Configuration window.  You will see that the vSphere/ESX host now has the correct time and is configured to use the NTP server you added as its time server.
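
If you want to confirm that the NTP server you entered actually answers time requests, the following Python sketch sends a basic SNTP client request and reads the server's transmit timestamp.  It is only a rough check, not part of vCenter or any VMware tooling.  The function names are my own, and any host name you pass in (for example your PDC emulator) is your own assumption.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    """Build a 48-byte SNTP client request (LI=0, version 3, mode 3)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp as Unix time in seconds."""
    if len(packet) < 48:
        raise ValueError("NTP response is shorter than 48 bytes")
    # The transmit timestamp occupies bytes 40-47: 32-bit seconds,
    # then a 32-bit binary fraction of a second, both big-endian.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_time(server: str, timeout: float = 5.0) -> float:
    """Ask an NTP server for its time over UDP port 123."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)
```

Calling `query_time()` with the name of your time server and comparing the result to `time.time()` on a domain member gives a rough idea of the offset.  The sketch ignores network delay, so treat the answer as accurate to within a second or so at best.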

You may need to restart the VM guests running on that VMware host to have them sync time with the domain.  The Windows Time service will not correct the time on a running VM if it varies too much from domain time.  However, all domain computers sync time when they start up on the domain, regardless of how far out of sync they were.

I have not seen this type of behavior with Hyper-V, only vSphere, ESX and ESXi hosts.