This article, Surviving the Windows Server 2003 Cluster Bomb, Part II, is a continuation of a previous article. Part I was terrible. Part II is just as bad.
When the first part of the article came out, Rodney R. Fournier wrote about the basics of what the author did wrong. I personally thought that he let the author off easy when he should have hammered him. (Notice I am not using the author’s name, as I don’t want to influence his google-ability.)
What it comes down to is that the cluster quorum (which he constantly calls the cluster database) had become corrupted. Well, this is incredibly easy to fix, but rather than following rational steps and fixing the quorum (or calling Microsoft PSS for assistance), the author panicked and went down the path of destroying the cluster completely, rebuilding it, reinstalling the applications and restoring the data. What should have taken less than 30 minutes took a weekend because of the author’s ineptness.
Here is what you should do if you have a similar problem.
- Start up the first node of the cluster
- If the cluster service fails to start, go to a command prompt and start it by typing “net start clussvc /resetquorumlog”, which will clean up the corrupt quorum log and build a new one using data stored on the node.
- If you need to replace the quorum disk, start the cluster service with “net start clussvc /fixquorum”, which will start the cluster service but leave all resources offline. You can then move the quorum to another disk, or you can replace the disk and use clusterrecovery.exe (or dumpcfg.exe from Windows 2000) to fix the disk signature.
- Stop the cluster service using the command, “net stop clussvc” at the command prompt and then restart the cluster service without any switches.
Yes, it really is that easy.
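For anyone who would rather script the sequence than type it, here is a minimal sketch of the steps above in Python. It only wraps the same `net start` / `net stop` commands already listed. The function names (`net`, `recover_first_node`) and the `runner` and `wait_for_operator` parameters are made up for this illustration, and the sketch assumes that a non-zero exit code from `net` means the service did not start. Moving the quorum or fixing the disk signature is still a manual step.

```python
import subprocess

SERVICE = "clussvc"  # cluster service name used in the commands above


def net(action, *switches, runner=subprocess.run):
    """Run `net <action> clussvc [switches]`; True if the exit code is 0."""
    return runner(["net", action, SERVICE, *switches]).returncode == 0


def recover_first_node(replace_quorum_disk=False, runner=subprocess.run,
                       wait_for_operator=input):
    """Walk the steps above on the first node.

    Returns the switch that was used, or None if the service started
    normally. `runner` and `wait_for_operator` are hypothetical hooks so
    the flow can be dry-run without touching a real cluster.
    """
    # Try a plain start first. Assumption: `net` also exits non-zero if the
    # service is already running, so only call this when it is stopped.
    if net("start", runner=runner):
        return None

    switch = "/fixquorum" if replace_quorum_disk else "/resetquorumlog"
    if not net("start", switch, runner=runner):
        raise RuntimeError(f"net start {SERVICE} {switch} failed")

    if replace_quorum_disk:
        # Resources are offline here; the disk work itself is done by hand.
        wait_for_operator("Move or repair the quorum disk, then press Enter: ")

    # Stop the service, then restart it without any switches.
    net("stop", runner=runner)
    if not net("start", runner=runner):
        raise RuntimeError("cluster service still fails to start without switches")
    return switch
```

Calling `recover_first_node()` covers the corrupt-log case, and `recover_first_node(replace_quorum_disk=True)` pauses for the disk work before the final restart. Passing a fake `runner` lets you check the order of commands on a workstation before going anywhere near production.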