SQL 2008

One of the great new features of Windows 2008 Failover Clustering is the removal of the “same subnet” restriction. This is huge for multi-site clusters; I personally know of several customers that did not implement a multi-site cluster specifically because of that limitation. Most of the folks I spoke with at TechEd are excited about this change and are now considering multi-site clusters because of it.


Anyway, at TechEd this year the SQL folks made it publicly known that SQL 2008 will NOT support clusters with nodes running on different subnets. Ouch! This is a mighty blow to anyone considering a stretched 2008 cluster as SQL is certainly one of the top cluster applications. 


Personally, I’m dumbfounded by this and I don’t understand how Microsoft allows a product to ship that doesn’t support this feature in clusters. This issue was revealed to us at the MVP conference in March and my response was to obnoxiously shout at the SQL team, “Boooooooo, hissssssss!”


 Hopefully, this will change quickly as more customers demand support for this feature in SQL.

EMC MirrorView Cluster Enabler Released!

If you’ve read through some of my previous posts, you might have picked up on the fact that this product was coming soon. EMC has finally released an SRDF/CE equivalent for the Clariion arrays utilizing MirrorView. Sure, we’ve got other products for other clusters, but this is the first that fully integrates with Microsoft clustering and MirrorView.


This first release (named 3.0 to fool you :-) ) has some limitations, as this is our first crack at Clariion support. Here are some of the requirements/limitations:




  • Support for Windows 2003 and 2008 clusters only. No 2000 support (thankfully)


  • Requires Clariion arrays to run Flare 26 or higher. Some arrays have a max Flare level of 19…an RPQ might be needed for these arrays (see the note after this list for checking your current revision)


  • Initial release supports MirrorView/S only. Future releases will look at adding MirrorView/A support, but initially it will be synchronous only


  • Only supports non-disk-based quorum models. So your cluster has to be MNS (Node Majority) or MNS w/FSW (Node and File Share Majority). Clariion arrays do not have the same locking capabilities as the Symmetrix to perform their own geographical arbitration, so disk quorums are not supported
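
If you’re not sure which FLARE revision your array is running, one quick way to check is a Navisphere CLI getagent call against one of the SPs. This is only a rough sketch; the SP address below is hypothetical and the output format varies by FLARE release.

    REM Query the storage processor agent; the "Revision" field shows the
    REM FLARE level (SP address is hypothetical)
    navicli -h 10.1.1.10 getagent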

The software is currently available for download on EMC’s Powerlink web site. The software is listed under “Cluster Enabler for Microsoft Failover Clusters” and the product guide and release notes are currently under “MirrorView” (though this may change).


Also, if you’re attending EMC World this week, make sure you see the Cluster Enabler sessions and chat with the developers about these new releases.

Cluster Enabler 3.0 Released

EMC recently released the latest version of SRDF/CE for MSCS. This new release has the following changes: 




  • New Name – The overall product line has been renamed to “Cluster Enabler” and the first product released in this line is SRDF/CE for MSCS. In the near future, MV/CE will be added to this family of products.


  • Redesigned GUI – The SRDF/CE GUI has been completely re-written for this release. The GUI in 3.0 has been simplified and the wizards have been streamlined, making it easier to configure. The SRDF/CE GUI is more tightly integrated with Cluster Administrator, and functions like changing the quorum model in SRDF/CE will also make these changes in MSCS (the previous version forced you to manually make this change).
    [Screenshot: CE 3.0 GUI]


  • Multiple CE Cluster Management – The CE GUI lets you manage remote clusters and multiple clusters, similar to Cluster Administrator


  • Windows 2008 support added – Version 3.0 adds support for Windows 2008 clusters.



    • All quorum models supported


    • Multiple subnets supported


  • Windows 2000 support dropped – Good riddance


  • Support for 5773 code added – Minimum microcode for this release is 5x70.


  • Support for multiple Symms per cluster – You can now have multiple Symmetrix pairs in a single SRDF/CE cluster. Concurrent SRDF is now tolerated per the product guide.


  • Site mode changes – The default behavior is now called “Restrict Group Movement” and this is basically a combo of the old “No New Onlines” and “Local Override” settings from previous releases. In this mode, groups are allowed to come online where a disk is RW while the RDF link is down, but if the disk is WD, the CE resource will fail to come online. The other option is “Automatic Failover” and this feature is essentially the same as the old “SRDF Override” setting. The “Failstop” option is no longer listed, but it is configurable via CLI. The “Forced Failover” and “Local Override” values have been removed.


  • Site Mode now set at the GROUP level – Previously, this was a cluster-wide setting. Now, this value can be adjusted at the individual group level. This gives you greater flexibility in choosing which groups you might consider putting at risk by enabling automatic failover during a site outage.


  • SRDF/CE resource/registry changes – Most settings for SRDF/CE are no longer stored in the SRDF/CE\Config hive in the registry. Instead, these settings are now stored as private properties of the SRDF/CE resources in the cluster (see the example after this list).


  • Installation changes – CE installation now requires that MSCS be installed first before attempting to configure the cluster for CE…this is similar to the old “Convert MSCS to SRDF/CE” wizard. MSCS must be installed on at least one of the cluster nodes prior to configuring CE.
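
Since the settings now live as private properties on the cluster resources, you can inspect them with cluster.exe just like any other resource. A minimal sketch, assuming a CE resource named “EMC_SQLGroup” (the resource name is hypothetical; use whatever name appears in Cluster Administrator):

    REM List the private properties of a CE resource (name is hypothetical)
    cluster res "EMC_SQLGroup" /priv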

For more information, you can get a copy of the software, product guide and release notes on EMC’s Powerlink website.

Clustering Webcasts on Demand

Microsoft gave some great webcasts related to 2008 clustering last month. If you missed any of these sessions, you can view the recordings on-demand and/or snag the PPT files here:


Building High Availability Infrastructures with Windows Server 2008 Failover Clustering
Webcast – http://go.microsoft.com/?linkid=8131531
PPT – http://go.microsoft.com/?linkid=7933793


Failover Clustering 101
Webcast – http://go.microsoft.com/?linkid=8146346
PPT – http://go.microsoft.com/?linkid=7933794


Failover Cluster Validation and Troubleshooting with Windows Server 2008
Webcast – http://go.microsoft.com/?linkid=8173778
PPT – http://go.microsoft.com/?linkid=7933795 


Geographically Dispersed Failover Clustering in Windows Server 2008 Enterprise
Webcast – http://go.microsoft.com/?linkid=8187007
PPT – http://go.microsoft.com/?linkid=7933796  


Deep Dive on Failover Clustering in Windows Server 2008 Enterprise Storage and Understanding Quorum
Webcast – http://go.microsoft.com/?linkid=8213972 
PPT – http://go.microsoft.com/?linkid=7933798


 


In case you are not aware, the Microsoft Cluster team has restarted their blog. You should visit http://blogs.msdn.com/clustering so you don’t miss any additional updates from the cluster team.

Future Releases of SRDF/CE

It’s been far too long since I’ve written a blog post, mostly due to the holiday rush. There’s much going on in my world of geo-clustering with 2008 RC1 being released, and there’s also been some development activity for SRDF/CE. I’ve been asked questions many times in various forms about the future outlook of geo-clusters and, specifically, the longevity of a product like SRDF/CE. Some ask if EMC has any plans to support 2008 clusters and others ask if the product is dying due to new application features such as Exchange CCR or SQL mirroring. I’ll address the second question first, as it’s easier to answer and I’m not sure how much I am allowed to disclose about future EMC product releases.


 


Some have suggested that Exchange 2007 CCR eliminates the need for clusters and SRDF/CE, so EMC’s product must be of the dying breed. In reality, I see an exact opposite effect occurring. With Microsoft promoting the idea of disaster recovery and multi-site clusters in their products, I see more customers looking at their current infrastructure and realizing that they have a need to implement some form of multi-site clustering solution. As they look at the major applications like Exchange and SQL, many customers will then realize that they have many other business critical applications/services in their existing clusters that they would also like to protect from a total site outage. Believe it or not, some customers use applications other than SQL and Exchange in a MSFT cluster…I see lots of Oracle, file & print and other custom apps in clusters. This triggers customers to look for a more robust and total multi-site clustering solution. This is why I feel that a product like SRDF/CE will be around for a long time to come.

 

As for future releases of SRDF/CE, more versions are planned and being developed. The next major release of SRDF/CE (version 3.0) will have support for 2008 server…I think this is pretty safe to say and I’m not really giving away any big company secrets. 2008 cluster is going to be HUGE for multi-site clustering due to the new AND/OR dependencies, which remove the “same subnet” requirement that is in place for clusters today. I personally know of several customers that have dropped their goal of implementing a geographically dispersed Microsoft cluster due to this specific single-subnet limitation. If you haven’t tested 2008 cluster, I would recommend grabbing a copy of 2008 RC1 from the Microsoft website and poking around with this version to get used to some of the differences.

 

There are several MAJOR changes coming in this next release of SRDF/CE and one will be support for 2008 clusters. Another will be a slight name change for the product. Version 3.0 will be dropping the SRDF part of the product name and will be named simply “EMC Cluster Enabler” or CE…rolls off the tongue a little nicer than SRDF/CE. The main reason for the name change is the addition of a little feature that I had mentioned in a previous post. I’ll write up some more about CE 3.0 in the near future.

What applications are supported by SRDF/CE?

I often get asked if SRDF/CE supports various versions of Exchange or SQL, especially just after a new version is released. If you are one of those that have asked, you have likely received my sarcastic, yet completely accurate, answer that SRDF/CE does NOT support ANY applications other than MSCS. Hence the name, SRDF/CE FOR MSCS. There is only one application that SRDF/CE supports in a Windows environment and that is Microsoft Cluster Service.


If this seems too harsh, I can also accurately claim that SRDF/CE supports ALL applications that are supported in a Microsoft cluster. So if a new version of XYZ application comes out or if you want to know if a specific application is supported in an SRDF/CE cluster, my alternate answer is that SRDF/CE will instantly support this application/version with no qualification needed. I guess this sounds better to folks as there usually isn’t a follow up question after this answer.


I guess my point would be that the SRDF/CE product works to enable cluster disks so that they can be used over a distance with SRDF. SRDF/CE only cares about the disks and its own cluster resources. Support for any application that you wish to install in the cluster is between MSFT and the application vendor. Anything that you can do in a normal cluster should be doable in an SRDF/CE cluster. The only real difference as far as cluster applications are concerned is that the failover time between nodes will be delayed while the SRDF magic occurs under the covers.

Quorum Arbitration in a Geographically Dispersed Cluster

Thanks to Oliver Simpson for his comments and his question: how does quorum arbitration work in a geographically dispersed cluster, and how does it differ between SRDF/CE and MirrorView? This is a great question and I hope I don’t bore you with too many details here.


There are many storage vendors out there and I’m sure that they all have their own method of performing box-to-box replication and controlling access to these mirrors. With EMC SRDF, the source device is labeled an R1 device and the remote device is considered an R2 device. SRDF pairs can be manipulated by users with the Solutions Enabler software to make either (or both) mirrors read/write enabled or write disabled. MirrorView labels one mirror the primary and the remote the secondary. MirrorView volumes can be manipulated using the Navicli software to promote or fracture mirrors, controlling read/write access to the volumes. So far, the two are pretty much equal in their capabilities.


In a geographically dispersed cluster, we’re going to have one node of the cluster connecting to the R1 or primary mirror, while the other node attempts to access the R2 or secondary mirror. Typically, the R2/secondary mirror is either write disabled or not ready to the host while the cluster owns the resources on the R1/primary mirror. When the cluster needs to fail over to the remote site, you will need to issue commands to make the R2/secondary mirror read/write enabled before MSCS attempts to bring the disk resource online, and also make the R1 mirror write disabled/not ready.
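
On the SRDF side, a rough sketch of the manual Solutions Enabler equivalent of that swap might look like the commands below. SRDF/CE automates this for you; the device group name “CLUS_DG” is hypothetical and the options may vary with your Solutions Enabler version:

    REM Check the current state of the RDF pairs in the device group
    symrdf -g CLUS_DG query

    REM Fail over to the remote site: the R2 devices become read/write enabled
    REM while the R1 devices are write disabled
    symrdf -g CLUS_DG failover -noprompt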


For all other Physical Disk resources in the cluster (other than the quorum), we can use cluster resource dependencies so that the disks do not attempt to come online until the storage has been read/write enabled to the host. With SRDF/CE, we’ve created our own resource type to control these actions for all non-quorum resources, and as I’ve described in a previous blog entry, you could create a generic application or script resource to control this behavior with MirrorView using Navicli commands.
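
As a rough illustration only, the command such a script would need to issue before the disk comes online is a promote of the secondary MirrorView image so that the DR host gets read/write access. The SP address and mirror name below are hypothetical, and the exact “mirror -sync” options should be verified against the Navisphere CLI guide for your FLARE release:

    REM Promote the secondary image so the DR host gets read/write access
    REM (SP address, mirror name, and exact options are assumptions)
    navicli -h 10.1.1.10 mirror -sync -promoteimage -name CLUS_DATA_MIRROR -o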


The big problem comes in when we need to deal with a shared quorum disk in a Microsoft cluster. The quorum disk CANNOT be made to depend on any other resources. Microsoft has done this so customers cannot shoot themselves in the foot by accidentally making their quorum disk depend on a faulty or mis-configured resource. When you attempt to make the quorum depend on a resource, you will receive an error message:


[Screenshot: “Dependency on Quorum” error message]


Additionally, if you do find a way to overcome this issue, you will next need to contend with the dreaded “split brain” syndrome. What happens if you have a network failure between sites and MSCS attempts to arbitrate for the quorum disk in a geographically dispersed cluster? You are accessing the quorum disk over two separate SCSI busses using two separate physical disk drives, so when MSCS issues a SCSI bus/target reset against the local quorum disk, this has no effect on the remote device and we’ve basically broken the quorum arbitration process. If some sort of mechanism is not in place, MSCS would be able to bring both copies online and this would likely cause a split brain to occur.


Because of these challenges, it is difficult to find a workaround and have a shared quorum disk in a geographically dispersed cluster. Microsoft has given customers a way around this by introducing the MNS quorum, which is certainly a doable option…though it does have its limitations. See my previous entry for more information on this topic.


With SRDF/CE, we accomplish this by adding a secondary arbitration process whenever a standard Microsoft quorum arbitration event occurs. SRDF/CE adds a filter driver into the stack to detect quorum arbitration events. When an arbitration event occurs, the SRDF/CE filter driver halts the standard quorum arbitration process and completes its own arbitration before allowing the standard quorum arbitration event to continue.


The SRDF/CE arbitration process uses Solutions Enabler APIs to acquire Symmetrix locks to simulate a persistent reservation across the SRDF link. We release and acquire these locks during quorum arbitration in a fashion similar to Microsoft’s challenge/defense arbitration process: when the challenger issues a bus reset, we release the locks and wait to see if the defender re-acquires them; if the defender successfully defends, the challenge fails. Once a node has successfully acquired these Symmetrix locks, we issue the appropriate Solutions Enabler commands to make the quorum disk read/write enabled to the host. With the quorum read/write enabled, the cluster quorum arbitration process continues and MSCS decides whether or not to allow the quorum to come online on the node.


In the Clariion environment, the CX arrays do not have these locking mechanisms and the API is not as robust, so we do not have the same capabilities. Therefore, using an MNS quorum is the only available option in the MirrorView environment.


I hope this helps to give some insight about the way that SRDF/CE handles quorum arbitration. Feel free to leave a comment if anything is unclear.

SRDF/CE Training

In a past life, I used to travel the globe delivering training classes to EMC personnel and a few select customers. This training was not widely known about and was only given to customers that begged for more knowledge about the product.

 

I am happy to announce that EMC is now offering a training class for SRDF/CE that is open to customers. Originally, this training was only available to employees, but we have since seen the error in our ways so this is now open to the public. The class is named “SRDF/CE for MSCS Implementation and Management” and details are available on https://learning.emc.com/Saba/Web/Main/goto/64726739 (PowerLink account required for access). Classes are running about once or twice a quarter in the US and Europe. Here’s a basic overview of the objectives covered in this class:

 

  • Describe the benefits of SRDF Cluster Enabler for MSCS

  • Implement the SRDF/CE product in a typical geographically clustered environment

  • Transform an existing Microsoft cluster into an SRDF/CE configuration

  • Restore application availability after a disaster occurs within an SRDF/CE for MSCS environment

  • Perform ongoing administrative and operational tasks for SRDF/CE clusters

  • Perform SRDF/CE for MSCS troubleshooting
 

It’s a 5-day instructor-led class with many hands-on labs. I helped to develop this class and I think we’ve created a pretty solid training experience. The class is based on my original training, though it has been updated for the more current releases of SRDF/CE. I would highly recommend attending this training if you’ve got SRDF/CE in your environment.


 

MirrorView/CE

Continuing my topic about geoclustering on a Clariion, we’ve recently discovered that the scenario I previously tested can actually be FULLY SUPPORTED by Microsoft and EMC! Thanks to Edwin for pointing out the following entry in the Windows Catalog:


http://www.windowsservercatalog.com/item.aspx?idItem=4f28600a-1184-77a3-5768-7091f50227dd


You might notice that this entry lists a product called MirrorView/CE v1.0 with MNS (sounds familiar). I’ve found that this is not an actual EMC product, but instead is a set of scripts that control the MirrorView LUNs. EMC requires that customers submit an RPQ for support for this type of configuration, but they certainly will support it. The local TSG resources can be commissioned to write a custom script that performs the necessary activities to facilitate failover. This is currently only supported using MirrorView/S, though I’m sure that it could also be supported with MV/A.


This is pretty cool stuff that’s been a long time coming.

Majority Node Set Clustering

*Update: This article is specific to Windows 2003 MNS. Windows 2008 MNS behavior is slightly different so my comments below may not apply to 2008 clusters*

When I was first introduced to MNS, I hated this little feature of Windows 2003. In my opinion, MNS has created some confusion in the marketplace as it has been positioned (incorrectly) by some as “the” solution for geographically dispersed clustering. I’ve seen many posts over the years in the newsgroups from folks that have set up their MNS clusters and now want to know how to make their cluster work without shared storage. News flash: MNS does not mean that you do not need shared storage for your cluster. MNS only means that the quorum no longer requires a physical disk resource; the same rules still apply for the rest of your clustered resources. Want to have a print spool or DTC resource? Sorry, you will still need a physical disk resource in your cluster. If you’ve got an application that does not require shared data, then maybe MNS is the solution for you, but most cluster applications will have a shared disk requirement.

It is my belief that no one in their right mind would use MNS in an HA environment that is geographically dispersed…unless you plan to span 3 sites. Why would I say this? Well, if the goal of HA in your environment is to maintain uptime, why would you introduce a “feature” that will guarantee a total cluster outage if half of the cluster is suddenly unavailable? When you look at geographically dispersed clustering, you’re typically looking for a solution that can help you survive a total site outage…why else would you spend the time/money on a geo-cluster? With MNS, chances are high that your whole cluster is going down in a site disaster scenario. Let’s take a look at some of these scenarios with MNS:

Scenario 1 – Primary site has 2 nodes and DR site has 2 nodes. If you lose either site, you will lose the entire cluster since no site can ever have a majority in this situation. This will only happen to you once and the lesson learned here is that we never want to have an even number of nodes with MNS.

Scenario 2 – Primary site has 3 nodes and DR site has 2 nodes. Your cluster can now survive the outage of the DR site, but the cluster will not survive an outage of the primary site…which sort of defeats the whole purpose of having a DR site in the first place.

Scenario 3 – Primary site has 2 nodes and DR site has 3 nodes. Your cluster can now survive the outage of the primary site, but now the cluster will not survive an outage of the DR site…which again seems to defeat the whole purpose of having a DR site when an outage at your DR site takes down your production cluster applications.

Some will argue that in each of these scenarios, you can MANUALLY get your cluster up and running if you use the FORCEQUORUM procedure…which I do not deny. At least you do have some capability to get a somewhat working cluster up in these DR scenarios, even if it is a manual solution. There’s another HUGE gotcha here that is not often talked about or documented well. After you’ve started your cluster using /forcequorum, when the other nodes come back online, these nodes CANNOT join back into the cluster. In order to get your cluster back up and running again, you need to TAKE DOWN THE ENTIRE CLUSTER and then start all of the cluster nodes normally with no flags. Of course you can plan this downtime but no one ever wants to hear that their whole cluster has to go offline. This seems to defeat the entire purpose of HA when you are requiring a cluster shutdown to recover from your recovery procedure.
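
For reference, a hedged sketch of the manual force-quorum start on a Windows 2003 node is shown below. The node list is hypothetical (name the nodes that are actually up and reachable), and the exact switch syntax should be verified against Microsoft’s documentation for your service pack level:

    REM Start the Cluster service in force-quorum mode on one surviving node,
    REM naming the nodes that will form the forced quorum (names are hypothetical)
    net start clussvc /forcequorum:NODE3,NODE4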

Based on the above scenarios, you might start to see why I would make the claim that no sane cluster admin would use MNS in their environment. If you substitute a shared disk quorum in any of these scenarios, the cluster would survive the outage as long as any one node (with access to the quorum disk) survives. Also, a total cluster outage is not required to get the other nodes back into the cluster.

I think I’ve done enough bashing of MNS so let’s start to look at some of the good points. One of the major distance limitations of geographically dispersed clustering is Microsoft’s requirement that the quorum disk be replicated synchronously. From KB 280743, “The quorum disk must be replicated in real-time, synchronous mode across all sites.” This limits the possible distance for your geo-cluster solution based on the replication technology you are using…with EMC’s SRDF, this limit is approximately 200km for SRDF/S. Well, if you use an MNS quorum instead of a disk quorum, you are no longer limited by the requirement of synchronous replication technology. With MNS, your only limit is the network latency requirement, and even this has some flexibility now with the introduction of hotfix 921181. So this is one key reason why you might consider using MNS over a shared disk quorum resource. If you are looking for an extended-distance geo-cluster solution, MNS is the only way to go.

Another new feature introduced in hotfix 921181 is the ability to use a File Share Witness (FSW) with 2-node MNS clusters. This allows you to have a file share anywhere in the network (it doesn’t need to be on the same subnet!) and this share will be used as the decision maker when one of the nodes fails. You could even set up this FSW as a clustered file share resource in a separate cluster, giving another level of protection to this decision maker. The downside to FSW is that it currently only works in 2-node clusters; this will change in Longhorn, but today the feature only works with two nodes. If you add a third node to your cluster, the cluster will ignore the FSW settings. Another minor downside is that the FSW share does not contain a full copy of the CLUSDB, so you could not restore a cluster registry hive using the data from the FSW. The FSW is only used to help make decisions during quorum arbitration.
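
For what it’s worth, here is a minimal sketch of pointing the MNS quorum resource at a witness share from the command line once the hotfix is installed. The resource name, share path, and private property name are assumptions from memory; verify them against the 921181 hotfix documentation:

    REM Point the MNS quorum resource at the witness share
    REM (resource name and share path are hypothetical)
    cluster res "Majority Node Set" /priv MNSFileShare=\\witness-srv\mnsw$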

Another place where MNS might be the better option would be the geographically dispersed cluster that spans three sites. Synchronous three-site replication of the quorum disk will prove to be a challenge for any vendor. Based on the quorum disk’s synchronous requirement, you’re also going to need 3 sites that are all within 200km of each other, which may also prove to be a challenge. With three sites, I typically would picture having two primary sites replicating data to a single DR site. Each of the primary sites would have replication running between itself and the DR site, but no replication between the primary sites. This sort of configuration would only be possible with an MNS quorum. In this three-site scenario, you could survive the failure of any single site and keep the other two up and running.

So overall, I don’t hate MNS nearly as much as I used to. I can see that it has its place, and can see some benefits in specific scenarios.