3 or More Nodes – Failover Process

An item of confusion with clustering often pops up in class. The question is, how does the cluster service decide which node to fail over to in the event of a cluster group failure? The answer is not so straightforward. First, let’s look at the different contributors to the decision.


  • Order of Installation – Simply, in what order were the nodes installed? Was NodeA installed before NodeB and NodeB installed before NodeC and NodeC before NodeD?   
  • Possible Owners – This attribute is a property of an individual resource. The possible owners list is used strictly to define which nodes can run the resource, and this in turn controls where the cluster group it belongs to can run. For example, if the resource is part of a cluster group running on NodeA, and it fails, the group will not fail over to NodeB if NodeB is not listed as a possible owner of the resource.
  • Preferred Owner – This attribute is a property of the cluster group. It is used to define a priority list for where a cluster group should run. In the event of failover, the group will fail over to the available node highest on the preferred owners list.
  • AntiAffinityClassNames – This is a new property for a cluster group in Windows Server 2003. Basically, each cluster group can be configured with a list of “names” or terms. For example, a cluster group on NodeA can be configured with “SQL” and a cluster group on NodeB can be configured with “SQL” as well. In the event that NodeA fails, the cluster group configured with the AntiAffinityClassName of “SQL” generally will not be failed over to NodeB, because NodeB already hosts a group with that name. This one gets a little more confusing than I like.

Now that we understand the attributes and properties, let’s look at the failover process:


  1. A cluster group on NodeA reaches the threshold for failover and fails. Cluster group failure depends on resource failures. For example, if a single resource fails more than 3 times in 900 seconds (default settings), then it will cause the entire cluster group to fail over if the Affect the group check box is enabled for that resource. If multiple failures among multiple resources in a single group exceed 10 in 6 hours (default settings), that will also cause the cluster group to fail over (again, assuming the Affect the group check box is enabled).
  2. Nodes are checked to see if they are available. Available means that they are online and running and that they don’t have any restrictions against running, like not being included on the possible owners list. If nodes are not able to run the application, then they are not considered available.
  3. The cluster group looks to its preferred owners list and selects the available node highest on the list. If a node is on the preferred owners list and is also hosting another cluster group that has been anti-affined (I love that word), then it will use another node on the preferred owners list if available. If no other nodes on the preferred owners list are available, then it will ignore the AntiAffinityClassNames property. If no nodes are listed as preferred owners, then this step is skipped.
  4. The cluster group will fail over to an available node based on installation order. So, after NodeA is NodeB. The cluster group will attempt to fail over to NodeB. If NodeB is not available, then it will fail over to NodeC (3rd on the list in the installation order).

In the event there are no available nodes, then the cluster group will just fail and remain offline in a failed state until manual intervention takes place.
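To make the order of those checks a little more concrete, here is a rough Python sketch of the selection logic as described above. It is only a model for discussion, not how the cluster service is actually implemented. The names Group and pick_failover_node are made up for the example, the group's possible owners are assumed to already be the set of nodes that can run every resource in the group, and the wrap-around in step 4 (continuing from the end of the installation order back to the first node) is my assumption, since the steps above only cover a failure on NodeA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Group:
    """Hypothetical stand-in for a cluster group (not a real cluster API)."""
    name: str
    possible_owners: Set[str]  # nodes allowed to run every resource in the group
    preferred_owners: List[str] = field(default_factory=list)  # priority order
    anti_affinity: Set[str] = field(default_factory=set)  # AntiAffinityClassNames


def pick_failover_node(
    group: Group,
    failed_node: str,
    install_order: List[str],
    online: Set[str],
    hosted: Dict[str, List[Group]],
) -> Optional[str]:
    """Return the node the group would move to, or None if it stays failed."""
    # Installation order starting with the node after the failed one.
    # Assumption: the order wraps around to the beginning of the list.
    i = install_order.index(failed_node)
    candidates = install_order[i + 1:] + install_order[:i]

    # Step 2: a node is available if it is online and is a possible owner.
    available = [
        n for n in candidates if n in online and n in group.possible_owners
    ]
    if not available:
        return None  # group stays offline until someone intervenes

    def conflicts(node: str) -> bool:
        # True if the node already hosts a group sharing an anti-affinity name.
        return any(
            group.anti_affinity & other.anti_affinity
            for other in hosted.get(node, [])
        )

    # Step 3: preferred owners first, avoiding anti-affined nodes if possible.
    preferred = [n for n in group.preferred_owners if n in available]
    if preferred:
        clean = [n for n in preferred if not conflicts(n)]
        # If every preferred node conflicts, anti-affinity is ignored.
        return (clean or preferred)[0]

    # Step 4: no usable preferred owner, so fall back to installation order.
    return available[0]
```

For example, with NodeA through NodeD installed in that order, a group that prefers NodeB and then NodeC and carries the “SQL” name would land on NodeC if NodeB is already hosting another “SQL” group, and on NodeB if NodeB is the only node left standing.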


 


Note: AntiAffinityClassNames will not prevent two cluster groups that have been anti-affined from being on the same node. The property will help to limit that event from happening but won’t stop it if that is really the only option other than letting the cluster group fail. If you have an application that can’t handle more than one instance per node, then you need to create a resource DLL to stop them from being on the same node.

New Job – Same as an Old Job

This is the first week at my new job. What is it? Well, it is the same as a job I had two years ago.  I am back working at Ameriteach in the Denver area in Colorado. I am extremely glad to be back there. It is a great job working for great people and with more great people.


I finished up my work at Infocrossing on August 4th, took some time off to work on some other projects, did some contract work, and then it was time to start the new job.


It has been a little over two years since I started working for Infocrossing, and it was a difficult decision to make. When I started with the company, our name was (i)Structure and we were owned by Level(3). It was an interesting situation because we were making good money, but Level(3) was sucking wind.


Level(3), in their infinite wisdom, sold us to Infocrossing. It was a good match, for Infocrossing. We both specialized in outsourcing of infrastructure. We kept bumping heads in the marketplace. In fact, some of us feel that Infocrossing bought us because we were always kicking their asses when it came to acquiring new business.


Anyways, after the buyout, I told myself I would give it 6 months to a year and evaluate whether I would move on or not. I really hadn’t gotten to the point where I was not happy, so I never started that evaluation process.


About 8 months had passed since the buyout, and I got a call from Ameriteach. I used to work there as a trainer and loved it. Well, they asked me to consider coming back to work for them. I jumped at it. OK, to be truthful, I took a few days. I am so loving being back in the classroom full-time and ramping up on the latest and greatest Microsoft technologies once again.


I am really going to miss everyone at Infocrossing. They have some great people and some fantastic managers. To them, I will always wish for the best.


As to my new bosses, I truly think the world of them and I am so glad they wanted me back again. It is a great feeling!

Discounts for MCTs and MVPs

For the ClusterHelp.com class in NYC in December of 2006, we are offering 50% discounts for fellow MVPs and fellow MCTs.

So, instead of paying $2,400 for the four-day course, those who qualify can get in for $1,200 for the four days. Of course, most people can do the math.

For more information, and to register, please contact Patrick Power via email at ppower@netlan.com or you can use an old-fashioned device like a telephone and call him at 212-730-5900. You can also get more information about the course from the www.clusterhelp.com web site. Click on the Training Outline and Our Advantage buttons to see why you should attend this course, and click on About Us to learn about the trainers.

I promise that anyone who attends the NYC course will learn a great deal and will also have a good time. Really, it is a promise. <G>

NYC December Class Almost Sold Out

This is a nice problem to have. It is still a couple of months away and it is already sold out.

What to do? What to do?

We talked NetLAN into providing two classrooms, so we can have more students attend. Come on and enjoy the fun of NYC in early December. Join us for class, then join us for drinks and possibly a show or two if you are up to it.

I promise a week of great information, and evening fun as well. Let's fill up another classroom.