Network Load Balancing and MAC Addresses

I learned something new yesterday. It kind of flipped me out, but now it almost makes sense.

 

You can try this to confirm.

  1. From a client, ping the IP address of your NLB cluster.
  2. From the same client, run arp -a from the command prompt.

You should see something like this (I will assume 192.168.2.11 for the NLB cluster IP address):

    Internet Address         Physical Address      Type

    192.168.2.11            02-bf-c0-a8-02-0b     Dynamic

 

It will list other addresses and their MACs as well, but we are only interested in the NLB address. 02-bf-c0-a8-02-0b breaks down into nice little components like so:

  • The first number is the type of NLB configuration: 01=IGMP, 02=Unicast, 03=Multicast
  • The second number (bf) is of unknown origin, but it is the same for all NLB configurations
  • The next four numbers are the IP address, i.e. c0=192, a8=168, 02=2, 0b=11 and thus the IP of 192.168.2.11.
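The breakdown above can be sketched in a few lines of Python. This is only an illustration of the decoding described in the bullets; the function name decode_nlb_mac and the dictionary keys are my own, not anything NLB ships with.

```python
def decode_nlb_mac(mac: str) -> dict:
    """Split an NLB cluster MAC such as '02-bf-c0-a8-02-0b' into its parts.

    Hypothetical helper for illustration; the name and return shape are assumptions.
    """
    # Accept both dash and colon separators, then parse each octet as hex.
    octets = [int(part, 16) for part in mac.replace(":", "-").split("-")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")

    # First octet: the NLB configuration type, as listed in the bullets above.
    modes = {0x01: "IGMP", 0x02: "Unicast", 0x03: "Multicast"}
    return {
        "mode": modes.get(octets[0], "unknown"),
        "second_octet": f"{octets[1]:02x}",
        # Last four octets: the cluster IP address, one octet per byte.
        "ip": ".".join(str(octet) for octet in octets[2:]),
    }


print(decode_nlb_mac("02-bf-c0-a8-02-0b"))
# {'mode': 'Unicast', 'second_octet': 'bf', 'ip': '192.168.2.11'}
```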

OK, I already knew all of this. It is the following that was new to me.

 

It is the second number, bf, that is interesting to me. I can’t find anything that tells me why bf is used, but it always shows up in the MAC address returned when a client sends an ARP request for the NLB IP address. What I find interesting is that it is not used at all when the NLB nodes send GARPs or when they return traffic. When sending traffic, each NLB node spoofs the MAC as above, except that it replaces bf with its priority number. For example, if the NLB cluster node were configured with three as its priority (unique) number, it would identify itself to the switch as MAC address 02-03-c0-a8-02-0b. This allows the switch to happily enter the MAC address in its table and keep a one-to-one mapping of MAC addresses to ports.
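Here is a rough sketch of that substitution, just to make the example concrete. The function name node_source_mac is made up, and I am assuming the priority is written as a two-digit hex octet and falls in the 1 to 32 range that NLB allows for host priorities.

```python
def node_source_mac(cluster_mac: str, priority: int) -> str:
    """Return the source MAC a node would present, per the description above.

    Hypothetical helper: replaces the second octet (bf) of the cluster MAC
    with the node's priority number. Assumes priorities run from 1 to 32.
    """
    if not 1 <= priority <= 32:
        raise ValueError(f"priority must be between 1 and 32, got {priority}")

    octets = cluster_mac.lower().split("-")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {cluster_mac!r}")

    # Swap the bf octet for the priority number, formatted as two hex digits.
    octets[1] = f"{priority:02x}"
    return "-".join(octets)


print(node_source_mac("02-bf-c0-a8-02-0b", 3))
# 02-03-c0-a8-02-0b
```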

 

So, when a client tries to connect to the IP address of the NLB cluster and does an ARP on the IP to identify the MAC address, it gets the bf MAC address back. The switch fabric then flips out because it can’t find any port that has that MAC address, and thus floods the fabric. The use of the priority number stops the switch fabric from trying to learn the actual MAC address of the NLB cluster and provides a bit of sanity/reality for the switch so that it is happy.

 

So, to summarize, each client connecting to the NLB cluster will use the bf MAC address as the destination, which causes the switches to flood all ports with the traffic. Each NLB node sends data using the priority number instead of bf to stop the switch from learning the bf MAC address and trying to map it to a single port.

 

Of course, all of this leads us to the question about switch flooding and how to limit it. For this information see my blog entry on Unicast vs. Multicast.     

 

Surviving the Windows Server 2003 Cluster Bomb, Part II

This article, Surviving the Windows Server 2003 Cluster Bomb, Part II, is a continuation of a previous article. Part I was terrible. Part II is just as bad.


When the first part of the article came out, Rodney R. Fournier wrote about the basics of what the author did wrong. I personally thought that he let the author off easy (notice I am not using his name as I don’t want to influence his google-ability) when he should have hammered him.


What it comes down to is that the cluster quorum (which he constantly calls the cluster database) had become corrupted. Well, this is incredibly easy to fix, but rather than follow rational steps and fix the quorum (or call Microsoft PSS for assistance), the author suffered severe panic and went down the path of destroying the cluster completely, rebuilding the cluster, reinstalling the applications and restoring the data. What should have taken less than 30 minutes became a weekend because of the author’s ineptness.


Here is what you should do if you have a similar problem.


  1. Start up the first node of the cluster
  2. If the cluster service fails to start, then go to a command prompt and start it by typing, “net start clussvc /resetquorumlog” which will clean up the corrupt quorum log and build a new one using data stored on the node.
  3. If you need to replace the quorum disk, start the cluster service using this command, “net start clussvc /fixquorum” which will start the cluster service but leave all resources offline. You can then move the quorum to another disk or you can replace the disk and use clusterrecovery.exe (or dumpcfg.exe from Windows 2000) to fix the disk signature.
  4. Stop the cluster service using the command, “net stop clussvc” at the command prompt and then restart the cluster service without any switches.

Yes, it really is that easy.
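If it helps to see the order of operations in one place, here is a small Python sketch that only lists the command lines from the steps above; it does not run anything. The function name quorum_recovery_commands and the replace_quorum_disk flag are my own inventions for illustration, and the manual work in step 3 (moving the quorum or fixing the disk signature) still has to happen by hand between the start and the stop.

```python
def quorum_recovery_commands(replace_quorum_disk: bool = False) -> list[str]:
    """List the command lines from the steps above, in order, without running them.

    Hypothetical helper: the function name and its flag are assumptions.
    """
    if replace_quorum_disk:
        # Step 3: start the service but leave all resources offline, so the
        # quorum can be moved or the disk replaced (manual work, not shown).
        commands = ["net start clussvc /fixquorum"]
    else:
        # Step 2: clean up the corrupt quorum log and build a new one.
        commands = ["net start clussvc /resetquorumlog"]

    # Step 4: stop the service, then start it again without any switches.
    commands += ["net stop clussvc", "net start clussvc"]
    return commands


for command in quorum_recovery_commands():
    print(command)
```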

Network Load Balancing (NLB) and Network Interface Card (NIC) Teaming

The quick summary of this post is, “Don’t use NLB on teamed NICs.”


Microsoft clearly says that NIC teaming “may” cause problems with NLB in KB 278431.


This is where things get confusing, because the issue is just that; it may be a problem. The reasoning is really fairly simple. Teaming software, in many cases, overwrites the MAC address of the individual NICs in the team. Well, NLB, in Unicast, also overwrites the MAC address. So, the problem is:


  • Will the teaming software allow the overwrite behavior of Unicast?
  • Will the teaming software handle the failure of a NIC in the team and the overwrite process of NLB in the event of a NIC failure?

The answers are, it might not allow the overwrite in Unicast, and it might not behave properly in the event of a NIC failure and passing of the MAC to the other NIC in the team. Thus, the “may” statement earlier.


The way it needs to work is that the teaming software for the NICs needs to support the overwrite of MAC addresses. Many vendors now provide this support. A workaround also exists that allows the team MAC address to be set directly through the management tool. Compaq/HP, for example, defaults to the MAC address of the primary adapter. After NLB sets the MAC on the virtual adapter (the NIC team), the Compaq/HP software does not propagate the MAC address to the physical adapters. To make it work, you have to copy the NLB MAC and paste it into the team MAC in the management software. Workarounds and High Availability environments cannot be used in the same sentence; thus, this is not a best practice.
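If you do end up pasting the MAC by hand, a quick sanity check is to compare what the team reports against the MAC you would expect for the cluster IP. The sketch below assumes Unicast mode and the 02-bf-plus-IP layout that shows up in arp -a for an NLB cluster address; both function names are hypothetical, and getting the team MAC out of the vendor's tool is left to you.

```python
import ipaddress


def expected_unicast_nlb_mac(cluster_ip: str) -> str:
    """Build the Unicast NLB cluster MAC (02-bf + IP octets) for an IPv4 address.

    Hypothetical helper; assumes the 02-bf-<ip> layout for Unicast mode.
    """
    ip_bytes = ipaddress.IPv4Address(cluster_ip).packed
    return "-".join(f"{byte:02x}" for byte in (0x02, 0xBF, *ip_bytes))


def team_mac_matches(team_mac: str, cluster_ip: str) -> bool:
    """Check whether the MAC set on the NIC team equals the expected NLB MAC."""
    # Normalize case and separators so 02:BF:... and 02-bf-... compare equal.
    normalized = team_mac.lower().replace(":", "-")
    return normalized == expected_unicast_nlb_mac(cluster_ip)


print(expected_unicast_nlb_mac("192.168.2.11"))
# 02-bf-c0-a8-02-0b
print(team_mac_matches("02:BF:C0:A8:02:0B", "192.168.2.11"))
# True
```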


My contention is simple: since we can’t guarantee that a failure within the team will be transparent, or that the team will allow NLB to overwrite the MAC (this is a hardware driver issue that Microsoft cannot guarantee will behave properly), it should be considered a best practice not to use teaming for NLB NICs.


By the way, this behavior does not change in Longhorn.