Microsoft Cluster Summit

Yep, this is a commercial…

Rodney R. Fournier and I will be in NYC enjoying the life of teaching others all about Microsoft clustering. We will cover NLB and server clustering with labs on installing SQL and Exchange server clusters.

So, if you have 4 days to spare and can get to NYC, you should sign up. See the Cluster Summit site for more information, and check out the Schedule link to see how to sign up.

Please return to your regularly scheduled newsfeeds.

Update: Four days of cluster training can lead to bleeding from the ears. Several students left looking like they had been stunned by percussion grenades when the class was over, but all of them said it was the most useful class they had ever taken. It looks like Rod and I are on the right track.

Exchange Q&A with David Elfassy

David Elfassy IM’d me yesterday and asked me to play a game of word association. I hate word associations. It seems that word associations always lead to people thinking I am insane, and I get locked in padded rooms.


Anyways, he said it would really help him with some Exchange materials that he is working on, so I agreed.


David: SAN optimization. (First word that comes to your mind)

Russ: Banana (I was hungry and was reaching for a banana)


David: You *&(%$__*(&*(@#** – (wow, that was rude)  🙂

Russ: Cheerios (what else do you say to that?)


Note: at this point, I could already hear the ambulance and the guys in white suits coming to take me away.


David: SAN optimization.

Russ: diskpar, multipathing (see Rod Fournier’s blog entry on diskpar, and the Exchange team blog entry on the same subject)


David: SAN availability

Russ: Multipathing, SAN replication, geo clustering


David: Transaction log performance on SANs

Russ: disable write back cache for the LUN if possible, provide a separate LUN, RAID 1 within the SAN

Note: David asked me to explain why I did not say anything about RAID 10 or 0+1. I told him that I can’t see throwing that much at a transaction log unless you really have a huge volume of transactions that requires scaling up the log space.

David: iSCSI for Exchange

Russ: Interesting solution, not near the I/O achievable via a SAN with multipathing, though. Must use a dedicated gig/e network to achieve best performance.


David: Windows Storage Server with NAS for small orgs

Russ: Good solution depending on size of org. Again, not near the I/O of an inexpensive SAN device and should again have a dedicated gig/e network for the connection.

Unicast vs. Multicast – Original Posted Feb 21, 2005

As usual, confusion motivates me to blog some more. In this case, I have blogged this because I was confused, and I am pretty sure that I have it straight now. Comments may prove me wrong.

When designing, planning, testing, and implementing Network Load Balancing (NLB) Clustering, a choice has to be made regarding unicast vs. multicast. There are a few differences, but the main difference is in the way MAC addresses are implemented.

Unicast – Each NLB cluster node replaces its real (hard coded) MAC address with a new one (generated by the NLB software) and each node in the NLB cluster uses the same (virtual) MAC. Because of this virtual MAC being used by multiple computers, a switch is not able to learn the port for the virtual NLB cluster MAC and is forced to send the packets destined for the NLB MAC to all ports of a switch to make sure packets get to the right destination.

So, basically, the way NLB traffic is handled is kind of like this:

1. An inbound packet for IP address w.x.y.z (NLB Virtual IP) arrives
2. The ARP request is generated and is sent across all ports of the switch since there is no mapping at this point
3. All of the NLB cluster nodes respond with the same MAC
4. The switch sends the traffic to all ports because it is not able to tell which is the proper port and this leads to switch flooding

If an NLB cluster is using unicast, the nodes aren’t able to tell each other apart, as they all have the same MAC. Since each NLB cluster node has the same MAC, communication between NLB cluster nodes is not possible unless each NLB cluster node has an additional NIC with a unique MAC.

Multicast – NLB adds a layer 2 multicast MAC address to the NIC of each node. Each NLB cluster node basically has two MAC addresses: its real one and its NLB-generated address. With multicast, you can create static entries in the switch so that it sends the packets only to members of the NLB cluster. Mapping the address to the ports being used by the NLB cluster stops all ports from being flooded. Only the mapped ports will receive the packets for the NLB cluster instead of all ports in the switch. If you don’t create the static entries, it will cause switch flooding just like in unicast.

Flooding Solutions:

  1. Hook all NLB devices to a hub and then connect it to a port on the switch. Since all NLB nodes with the same MAC come through the same port, there is no switch port flooding.

  2. Configure a VLAN for all NLB cluster nodes to contain all NLB cluster traffic to just the VLAN and not run it over the entire switch.

  3. Use multicast and configure static mapping for the NLB cluster nodes in the switch so it only floods the mapped ports instead of the entire switch.

  4. Use port mirroring so that all ports involved in the NLB cluster mirror each other.
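As an illustration of solution 3, the static mappings on a Cisco IOS switch might look something like the lines below. The IP address, multicast MAC, VLAN, and interface names are made up for the example, and the exact command syntax varies by switch model and IOS version, so check your switch documentation before using it:

arp 10.10.10.50 03bf.0a0a.0a32 ARPA
mac-address-table static 03bf.0a0a.0a32 vlan 10 interface FastEthernet0/1 FastEthernet0/2

The first line maps the NLB cluster’s virtual IP to its multicast MAC, and the second tells the switch to forward frames for that MAC only to the ports where the cluster nodes are connected.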

Costs of High Availability – Clustering Windows Server 2003 – Original Posted Jul 29, 2005

NOTE: For anyone looking for an actual cost, sorry, there isn’t anything in this blog entry about the actual dollars needed.


I am having a flashback today; it must be the new medication. 🙂


The cost of HA seems to be a normal topic of discussion when a company looks into clustering and gets sticker shock. I can’t stress enough that clustering is not the end-all solution. Please do a quick read of my blog entry about my HA definition.


I was just talking to a client about how much clustering costs and how much the services cost to implement it. Yep, it isn’t the same as just installing a standard server and multiplying the cost by the number of nodes. Servers with large hard drives, lots of RAM, and multiple processors have come down a great deal in price in the last couple of years. What used to be about the cost of a 700 series BMW is now about the cost of a Chrysler 300. Really. However, when you start talking about HA, you have much more than the costs of the individual nodes in a cluster.


The main cost issue with clustering is the cost of the additional components that are needed above and beyond the nodes themselves. For example, I keep hearing the term “disk is cheap” bandied (I love that word) about in meetings. It isn’t true in all cases. Yes, a large hard drive is not that expensive. A LUN on a high-end SAN is expensive. It is even more expensive when you consider the initial costs of building the infrastructure to host that LUN.


OK, so back to the discussion of cost. Yes, clustering is costly, because it requires:

  • Windows Server 2003, Enterprise Edition, which costs a good bit more than Standard Edition
  • Host Bus Adapters (two per server for redundancy) for the fiber fabric (yes, there are other less costly alternatives, but let’s stick to mainstream right now) and the software for the HBAs
  • Fiber switches
  • SAN devices (or NAS depending on the certification of the hardware)
  • Experienced administrators (if you want it done right) to design and configure it
  • A 24/7 team for maintaining it (remember HA is not just clustering)
  • Significant documentation (in case the administrator gets hit by a bus)
  • Tried and tested processes

To achieve High Availability, an organization must have well-defined, planned, tested, and implemented processes, software, and fault-tolerant hardware. The focus is application availability. Yes, this costs money.


My favorite sales person used to use this phrase a great deal when we would talk to clients and potential clients about HA: “How much does it cost for the application to be down?” If it doesn’t cost much, implementing clustering and instilling an HA attitude just might not be worth it. If they say it costs a fortune, then the response is simple: “If it costs you so much to be down, why are you sweating this relatively small amount to do the best job possible of keeping it up?”


I hate to think about how many organizations out there are gambling (yep, that is what it is) with their IT assets and the businesses that run on them. If your company will go out of business if an application fails, don’t you owe it to the owners to protect that application?


Can’t Send or Receive Email – Original Posted Apr 14, 2005

One of the most common posts when it comes to Exchange is a request for help troubleshooting Internet email traffic. This really isn’t that hard as it is almost always one of a few issues that are very easy to fix.

Can’t send email to the Internet – There are a few simple and easy steps to help identify the problem. It is almost always either DNS or port 25 being blocked.

Log onto the Exchange server

Open a command prompt (no, it is not a DOS prompt), type nslookup, and press Enter. You will then connect to a DNS server, which will be shown along with its IP address. This is the DNS server that your Exchange server is configured to use. If it doesn’t work, then you found your problem. Here is what it should look like (or at least look similar):

Default Server:  <your DNS server name>
Address:  <your DNS server IP address>

> set type=mx
> microsoft.com

Non-authoritative answer:
microsoft.com   MX preference = 10, mail exchanger = maila.microsoft.com
microsoft.com   MX preference = 10, mail exchanger = mailb.microsoft.com
microsoft.com   MX preference = 10, mail exchanger = mailc.microsoft.com

maila.microsoft.com   internet address = <IP address>
maila.microsoft.com   internet address = <IP address>
mailb.microsoft.com   internet address = <IP address>
mailb.microsoft.com   internet address = <IP address>
mailc.microsoft.com   internet address = <IP address>
mailc.microsoft.com   internet address = <IP address>

The commands you type are nslookup, set type=mx, and the domain name (microsoft.com in this example). The information returned shows that Microsoft has three MX records (maila, mailb, and mailc) and each of the records points to multiple IP addresses (doh! it is a total of six actual servers sharing the load) where Microsoft servers can accept mail.

After you have verified that your DNS works for outbound email, you next need to test that you can communicate from your server to systems on the Internet using port 25 (the port used by SMTP). Use telnet on port 25 to test connectivity.

C:\>telnet <remote mail server name> 25

Type this at the command prompt, and if everything works correctly, you should receive a response like this one from the remote server:

220 <Inbound SMTP Virtual Server> Thu, 14 Apr 2005 20:55:51 -0700

This shows that the remote mail server received your connection attempt and responded. Guess what that means? Yes, good guess if you said that it means that Exchange can see and connect to remote systems without anything blocking any ports (you are also right if you said that I like my eggs with bacon).

Yep, these two checks find 90% of all outgoing email problems: either the DNS resolution step failed (if it failed, fix it – duh!) or the connection failed using port 25 (again, fix it if it failed).

 You can fully test that the destination received your email by continuing with the following steps:

 After connecting using telnet to port 25, you can perform the following steps at the magic blinking cursor:

  1. Type HELO or EHLO (EHLO is for the enhanced version of SMTP)
  2. Type MAIL FROM: followed by the sender’s address
  3. You should get a response message of “250 ok”
  4. Type RCPT TO: followed by the address of a recipient on that server
  5. You should get a response message of “250 ok”
  6. Type DATA
  7. You should get a “354” response telling you to end the message with a period on a line by itself
  8. Enter your subject info by typing Subject: Test Message (or whatever subject you want) followed by a blank line – the subject is a header, so it goes inside DATA, before the body
  9. Enter your message, continuing with multiple lines if needed
  10. End the message by putting a period on a line by itself and hitting Enter one last time

If all goes well, the recipient on the remote email server should get the message as if it were sent by an email client.
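Putting the steps together, a successful by-hand session looks something like this. The server names, addresses, and exact response wording below are placeholders for the example – real servers word their responses differently – but the response codes should match:

C:\>telnet <remote mail server name> 25
220 <remote mail server banner>
HELO mycompany.com
250 <remote mail server name> Hello
MAIL FROM: <user@mycompany.com>
250 2.1.0 Sender OK
RCPT TO: <recipient@theircompany.com>
250 2.1.5 Recipient OK
DATA
354 Start mail input; end with <CRLF>.<CRLF>
Subject: Test Message

This is a test message typed by hand over port 25.
.
250 2.6.0 Message queued for delivery
QUIT
221 2.0.0 Service closing transmission channel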

Can’t receive email from the Internet – There are a few simple and easy steps to help identify the problem.

From another system on the Internet, verify that remote systems are able to find your MX records for your company using the following:

Default Server:  <your DNS server name>
Address:  <your DNS server IP address>

> set type=mx
> <your company’s domain name>

This should provide a response with the name and IP address of your email server. If it doesn’t, then that should tell you that I also like sausage with my eggs (and that your MX records are either not properly configured on the Internet or DNS records for your company are not available at all).

If this does work, then you should (from a remote system on the Internet) try to telnet to your server on port 25 by typing:

C:\>telnet <your mail server’s public name> 25

If you get a nice response like this one,

220 Microsoft ESMTP MAIL Service, Version: 6.0.3790.211 ready at  Thu, 14 Apr 2005 22:40:02 -0600

then others can connect to your Exchange server without any problems. If it doesn’t work, then you probably have some firewall or router problems that need to be fixed.

Yes, it really is that simple. Remember, SMTP stands for Simple Mail Transfer Protocol.


Clustering is not the solution all the time – Original Posted Feb 8, 2005

Everyone repeat after me, “I will not waste money trying to cluster every freaking service offered by Microsoft servers.”

I just feel the need to scream this out loud today. There are often simpler and easier ways to provide redundancy for some services than using Microsoft server clusters. A couple of quick examples (please write these down) include:

  • Domain Controllers (and Global Catalog servers, too) – It is simple. Install more than one. If one goes down, you still have more. Yes, users can log into a different DC. Yes, users can even log into a different DC in a different physical location. FSMO roles can be seized if the holder of the roles falls out of the server rack and catches on fire.
  • DHCP – There are lots of great articles on how to split DHCP scopes among more than one server so if one server fails, clients can still get IP addresses.
  • WINS – Think: Primary and Secondary. If the secondary fails, nobody will care or notice (unless it is engulfed in flames). If the primary fails, systems will start using the secondary.
  • DNS – uummmm, use Active Directory integrated DNS. It doesn’t get much more HA than that.
  • Yes, the list goes on and on, but these are some common ones that I see that just make my blood boil.

You don’t need multiple nodes and a SAN to provide redundancy all the time. Remember, your company has other needs than an 8-way SMP WINS cluster with 4 TB of shared storage. Please, save a little money and use some common sense.

Exchange Cluster – Is Active/Active worth it? – Original Posted Feb 5, 2005

This is kind of a regurgitation of a couple of threads on the newsgroup. In the threads, there are questions regarding the whole Active/Active issue. Several people, including a couple of good friends and a couple of top-notch Microsofties, pointed out the evils of Active/Active. To be clear, Microsoft supports A/A for Exchange but does not recommend it. Best practices are developed based on the experiences of Microsoft’s internal usage (often referred to as eating their own dog food), the early deployment programs, and the trouble reports and experiences of many customers as reported and tracked through PSS.

Over the years, I have explained to my students that Active/Passive is the best practice when it comes to clustering Exchange. Almost always, a student will protest, stating that their managers and others don’t want a wasted node, so they want to know why A/A is such a problem. I point out that store.exe is well known for sucking up all the RAM it can get. So, if you have two servers (node1 and node2) both running store.exe and consuming a very large amount of RAM on each node, then you can expect problems when one resource hog fails over to a node where another resource hog lives. According to all of the literature, the store.exe on the surviving node should give up enough memory for the store.exe from the failing node to exist along with it, as both store.exe processes will basically drain down (this is a real high-level summary and not the term normally used, but I think it helps to understand what is happening) so they will both have smaller memory footprints and can coexist. In practice, this process is less than smooth. Another concern that is well documented is that if both Exchange Virtual Servers (EVSs) live on the same node, their stores and storage groups add together and run up against constraints in Exchange. For example, if EVS1 has three storage groups and EVS2 has three storage groups, when you combine them, they exceed the limits for Exchange (a max of 4 storage groups) and they will not both function on the same node.

Anyways, the issue in this discussion was around performance. With two active nodes, their memory, their CPUs, and their disk spindles should (according to some basic logic) provide better overall performance than one active node with the same resources. At first glance this makes a great deal of sense.

When you dig deeper, this common sense stops making sense. Wow, did I just type that? Try to follow me here (it should be easy, I am a pretty big guy).

According to Microsoft, if you use Exchange in a best practice configuration, you should manage resource consumption so that you don’t exceed 80% of CPU. This is for a single server. If you consider that two nodes are active in an A/A cluster, and since there is a need to fail over to a single node, then in order to maintain a best practice configuration each node should only be utilized up to 40% of CPU. This is basic math in that 40+40=80. This is discussed in KB 815180, which also discusses the limit of 1,900 concurrent users per node. That article, however, doesn’t address the added scalability of multiple server backplanes, multiple fiber adapters, and multiple spindles. So, the argument then becomes: do you really get enough benefit out of the additional I/O provided by an A/A cluster while still strictly limiting CPU? I would probably go so far as to say yes, but only because it is very clear from all of the Exchange work that I have done that disk I/O is the limiting factor for higher performance.

So to summarize the arguments/discussions:

Pro A/A

  • Provides greater disk I/O, but this is assuming that you would not use the same number of spindles for a similar A/P configuration. I am not sure if this is a fair assumption. I can say from experience, it is easier to ask for more spindles from the storage group when you have multiple active nodes. I don’t feel that most organizations would find using the increased number of spindles for an A/P configuration to be within reason.
  • Provides more RAM for two store.exe processes when not in a failed state, which results in better performance.
  • Provides greater throughput using additional HBAs and additional server backplanes when not in a failed state, which results in better performance.

Pro A/P

  • Provides the same CPU as an A/A based on the 40/40 rule previously discussed.
  • Provides the same performance whether in a failed state or not.
  • Is a best practice, and is the recommended configuration.

There are other issues to consider, for example:

  • Does the inter-Exchange messaging (email from node1 to node2) and the loss of single instance storage override any performance gains from A/A?
  • With two A/A nodes fully subscribed at 40% of CPU, are they hitting I/O bottlenecks and thus utilizing the additional RAM, additional spindles, and the greater backplane bandwidth?
  • Is there a tendency to oversubscribe the two A/A nodes in most organizations so they are well over 40% CPU utilization?
  • Is there a tendency to oversubscribe a single node in A/P as well?

The reason I bring up this whole topic in this blog entry is that the A/A vs A/P issue really isn’t as cut and dried as many of us would like to believe.

However, it is strongly discouraged for a reason, and those reasons are because of HA impacts. Therefore, Active-Active is evil.  🙂

OK, now I am ready to start fresh

I finally decided that MSN Spaces was too structured for me, and it has serious performance problems for me. So, I just spent the last 30+ minutes copying all of my old posts from MSN Spaces to here.

OK, no, I didn’t copy them all. I left some of the personal ones, like my cigar ones, on the old blog and did not bring them here.

I think, for the near future, I will maintain both blogs, but will keep my personal bits of stuff over on MSN and try to keep this blog focused on professional stuff only.

E-mail Reputation Score – Original Posted Aug 3, 2005

From the MS Exchange Blog post by Chris Meirick…


You can send a test email and receive a reputation score that evaluates the possibility that your email is spam. I highly recommend that you read Chris’ blog entry for more information about Ironport and his evaluation of the product.


Based on his evaluation, I have to say that Ironport looks like a good product and it should be strongly considered for email filtering.

Exchange Server 2003 /3GB and /USERVA=3030 – Original Posted Jul 13, 2005

One more time…


I am sure it has been posted before, but it doesn’t seem to be getting out there to everyone. Hopefully this will hit at least one person that hasn’t read it.


There are many documents out there that say if your Exchange Server 2003 server has 1 GB of RAM or more, you should edit your boot.ini to include the /3GB and /USERVA=3030 switches in the boot configuration. What seems to get missed is that you should only do this if the Exchange server is a:

  • Mailbox Server

  • Public Folder Server

  • Connector Bridgehead (MTA, X.400, GroupWise, Notes, etc.)

  • SMTP Gateway/Bridgehead (only when using Envelope Journaling – otherwise don’t use the switches)

You should NOT use these switches if the Exchange server is a:

  • Front End Server

  • SMTP Gateway/Bridgehead (see exception above)

What it really comes down to is that the store.exe benefits from these switches, and Front End and SMTP Gateway/Bridgehead servers don’t utilize the store.exe. The exception is when using Envelope Journaling because it does use the store.exe.


In all cases, you should NOT use these switches if your server has less than 1GB of RAM.
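For reference, here is a sketch of what boot.ini looks like with the switches added. The ARC path and description below are just examples – don’t copy them over your own entry, just append the two switches to your existing line:

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect /3GB /USERVA=3030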


For more information on the /3GB and /USERVA switches, refer to KB 823440 and KB 810371.