Category Archives: 1310

Exchange Server 2007 & CluAdmin/Cluster.exe Bad Things can Happen?

My good buddy Scott Schnoll attempts to clear up some confusion on when and where to use Cluster.exe/CluAdmin to move a CMS. Find his post here:


 http://msexchangeteam.com/archive/2007/10/22/447317.aspx


Here is the comment I left: 



The samples you cited are for creating or managing a cluster; none of them are for moving a CMS.


Using CluAdmin/Cluster.exe vs. EMC/EMS with SP1 requires different security levels by default, and the cluster administrator and the Exchange administrator may not be the same individual. Now we will have to give the Cluster Administrator Exchange rights to avoid “Bad things Happening”…


Thankfully SQL and other Microsoft products don’t have an issue with CluAdmin/Cluster.exe. Nor do they have any other way to manage them, K.I.S.S. in action.


Thanks for the post Scott!

Exchange Server 2007 SCC/CCR lessons learned

This past weekend I ran into a few issues with Exchange Server 2007 and wanted to share them, so that anyone who hits the same issues won’t have to call Microsoft PSS and go through the fun (ok, not really fun…) that I went through.


Partition in Time with CCR

You have a partition in time, but what does that mean? You lost a node or the witness, and while it was down the remaining node/witness registered that a change was made. When the down node/witness came back, it detected that a change had occurred and killed the entire cluster. This is by design.
Now, how do you fix it?

http://support.microsoft.com/kb/258078 ForceQuorum section:


Function: When you use a Majority Node Set (MNS) quorum model on a Windows
Server 2003 cluster, in some cases a cluster must be allowed to continue to
run even if it does not have “quorum” (majority). Consider the case of a
geographically dispersed cluster with four nodes at the “primary” site and
three nodes at the “secondary” site. While there are no failures, the
cluster is a seven-node cluster where resources can be hosted on any node,
on any site. If there is a communications failure between the sites or if
the secondary site is taken offline (or fails), the primary site can
continue because it will still have quorum. All resources will be re-hosted
and brought online at the primary site.


In the event of a catastrophic failure of the primary site, however, the
secondary site will lose quorum, and, therefore, all resources will be
terminated at that site. One of the primary purposes for having a multi-site
cluster is to survive a disaster at the primary site; however, the cluster
software itself cannot make a determination about the state of the primary
site. The cluster software cannot differentiate between a communications
failure between the sites and a disaster at the primary site. That must be
done by manual intervention. In other words, the secondary site can be
forced to continue even though the Cluster service believes it does not have
quorum. This is known as forcing quorum.


Because this mechanism is effectively breaking the semantics associated with
the quorum replica set, it must only be done under controlled conditions. In
the example above, if the secondary site and primary site lose communication
and an administrator forces quorum at the secondary site, resources will be
brought online at BOTH sites, thus allowing the potential for inconsistent
data or data corruption in the cluster.
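
To make the majority arithmetic in the KB text concrete, here is a minimal Python sketch. It is only my illustration of the "more than half the nodes" rule using the seven-node example above; the function name is made up and this is not how the Cluster service implements it.

```python
def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """Return True when the reachable nodes form a strict majority.

    Illustrative only: models the Majority Node Set rule as
    "more than half of all configured nodes must be reachable".
    """
    if total_nodes <= 0 or not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("node counts are out of range")
    return reachable_nodes > total_nodes // 2


# Seven-node cluster from the KB example: four nodes at the primary
# site and three nodes at the secondary site.
print(has_quorum(4, 7))  # primary site alone -> True
print(has_quorum(3, 7))  # secondary site alone -> False
```

The secondary site on its own never reaches a majority, which is why someone has to force quorum by hand after a primary site disaster.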


Requirements:
Forcing quorum is a manual process that requires that you stop
the Cluster service on ALL the remaining nodes. The Cluster service must be
told which nodes should be considered as having quorum.


Usage scenarios:
Special care must be taken if and when the primary site
comes back because the nodes are configured as part of the cluster. While a
cluster is running in the force quorum state, it is fully functional. For
example, nodes can be added or removed from the cluster; new resources,
groups, and so forth can be defined.


Note
The Cluster service on all nodes NOT in the force quorum node list must
remain stopped until the force quorum information is removed. Failure to do
so can lead to data inconsistencies OR data corruption.


Operation:
Set up the Cluster service startup parameters on ALL remaining
nodes in the cluster. This is done by starting up the Services control
panel, selecting the Cluster service, and then entering the following in the
Start parameters option:
net start clussvc /forcequorum node_list
For example, if the secondary site contains Node5, Node6, and Node7, and you
wanted to start the Cluster service and have those be the only nodes in the
cluster, use the following command:
net start clussvc /forcequorum node5,node6,node7
Note There should be no spaces in the key (except where there are spaces in
the node names themselves).

The only problem: I could not get the above commands to work on a 64-bit Windows Server 2003 R2, Enterprise Edition SP2 machine. I mostly got invalid syntax errors. Here is what PSS told me to do:



1. We shut down one of the nodes, a true power off. We will call this the passive node.
2. We added the following value to this registry key on the surviving node (active node):

   HKLM\System\CurrentControlSet\Services\Clussvc\Parameters

   Value: ForceQuorum
   Type: REG_SZ
   Data: nodenamea
3. Replace nodenamea with the machine's name, such as exch2007nodea, where this is the node that is currently running.
4. We attempted to start the cluster service on the active (surviving) node and it started.
5. We then stopped the cluster service on the active (surviving) node and added nodenameb to the ForceQuorum data value on the surviving node.
6. We restarted the powered off (passive) machine.
7. We then started the cluster service on the active node and it started, with the ForceQuorum registry value containing both node names.
8. We attempted to start the cluster service on the passive node (with no parameters or registry changes) and it started.
9. We verified that the Cluster group resources were online.
10. We undid the registry changes by deleting the ForceQuorum value from the active node.
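
If you would rather script the registry edits than click through regedit, here is a rough Python sketch that only builds the reg.exe command lines for the steps above. We made the changes by hand with PSS, so treat the use of reg.exe as my assumption; the helper names are made up, and nothing here touches the registry until you run the printed commands yourself.

```python
# Registry key from the PSS steps above.
REG_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters"


def force_quorum_data(nodes):
    """Join node names into the comma-separated ForceQuorum data (no spaces)."""
    cleaned = [name.strip() for name in nodes if name.strip()]
    if not cleaned:
        raise ValueError("at least one node name is required")
    return ",".join(cleaned)


def reg_add_command(nodes):
    """Build the reg.exe command that creates or overwrites the ForceQuorum value."""
    data = force_quorum_data(nodes)
    return f'reg add "{REG_KEY}" /v ForceQuorum /t REG_SZ /d "{data}" /f'


def reg_delete_command():
    """Build the reg.exe command that removes the ForceQuorum value again."""
    return f'reg delete "{REG_KEY}" /v ForceQuorum /f'


# Step 2: surviving node only. Step 5: both nodes. Step 10: clean up.
print(reg_add_command(["exch2007nodea"]))
print(reg_add_command(["exch2007nodea", "exch2007nodeb"]))
print(reg_delete_command())
```

The node names are the placeholders from the steps, so swap in your own before running anything.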

Exchange Server 2007 System Attendant fails to come online within a CCR/SCC cluster

After the cluster was up and running, the Exchange SA was not. Looking in the Application event log, we found the following errors related to the Exchange SA failing to start:

Event ID 1011, 1030, 1003, and 1019 errors.


We found that a bug exists where the Exchange SA times out after 40 seconds when the default of 180 seconds is used for the resource.


We changed the value to 179 and the Exchange SA resource came online. This is scheduled to be fixed in SP1. This bug was confirmed for SCC & CCR Exchange Server 2007 Clusters.
 

Update from PSS: you can find a link to the first issue at http://technet2.microsoft.com/WindowsServer/f/?en/library/e70333db-5048-4a56-b5a9-8353756de10b1033.mspx. We are still waiting on the KB to be updated, though.

Observations about the software industry today

Sometimes I think that the movie Conspiracy Theory should have been about the software industry today. What has become of it lately? Here is what I believe:


  • I believe the Anti-Virus companies write all the viruses.

  • I believe most software is way overpriced.

  • I believe we now alpha test software for vendors.

  • I believe we beta test when service pack 1 comes out.

  • I believe 1.0 is not the standard to avoid, RTM (release to manufacturing/gold code) is.

  • I believe we get the final, ready for the world product when service pack 2 comes out.

  • I believe most software has too many features for 98% of the users.

  • I believe all the added features cause 100% of the problems with software today.

  • I believe it is better to update software than to design it properly to begin with.

  • I believe we, the paying consumers, don’t complain enough, so things are only going to continue to get worse.

  • I believe you pay 10 times the cost of software in support costs and lost productivity when it does not function properly out of the box.

  • I believe the world has become too computer savvy because of buggy software.

  • I believe a computer should be just another asset at the office, taken for granted like a stapler or pencil.

  • I believe that a computer isn’t taken for granted because broken things always get attention and notice.

  • I believe release dates are based upon dates on PowerPoint slides, not when the product is anywhere near being ready or bug free.


 

Exchange Server 2007 MCP Exams – Notes from the field

I love taking Microsoft exams because I learn so much. I learn what Microsoft feels are the important product features that everyone should know. I learn different ways to do common tasks within the product; let’s face it, sometimes we only know as much as our peers. I also learn exactly where I stand on the product, and what I really need to work on.


As I get older, though, I am either getting smarter or lazier, take your pick. I simply don’t study for the exams anymore. Sorry, but I don’t. I take the exam to learn the question format, style, and content, and lastly to gauge what, if anything, I need to study. I recently did this for the 3 (yes, I said 3) exams that relate to Exchange Server 2007. I would now like to break down what took place without breaking my NDA.


70-236 TS: Exchange Server 2007 Configuration


This is a fun exam. Honest, it is. I would recommend this as the second exam in this series. I walked in to take my practice version and almost passed. Lots of PowerShell (Exchange Management Shell – EMS). I failed my first attempt by 2 questions. I needed more Edge server information. I needed to learn more PowerShell cmdlets, like anything test-*. I did not feel the test was worded poorly or had any long questions. Either you knew it or you blew it. The second time I took this I studied:

  • test-* cmdlets

  • Microsoft Search service repair

  • DR repair and movement of Hub Transport logs

  • Edge Configuration cmdlets

  • General EMS syntax

I passed my second attempt because of the above and the fact that I could relax, knowing I had plenty of time to take the exam and concentrate on the PowerShell questions. All in all it’s a fair exam. My only problem is that I suck at PowerShell/EMS, honestly. After the exam I wanted to recreate some of the cool commands the test went over and I could not get the syntax right. It is one thing to see 4 or 5 ways to attempt a command; it is easy to pick the one that works. Now, try to do that without the spoon-feeding. The help files are ok, but I need more examples to choose from, like on the exam.

70-237 Pro: Designing Messaging Solutions with Microsoft Exchange Server 2007

This exam is trying to test whether you fully understand all the concepts of Exchange Server 2007 design. I passed with flying colors on my first attempt – without a lick of studying. The questions were very cut and dried, with usually only 1 glaring answer. I would definitely start by taking this exam! It is a very fair exam.

70-238 Pro: Deploying Messaging Solutions with Microsoft Exchange Server 2007

OUCH! Make this your last exam and do yourself a favor, study! This one got to me; deep inside it hurt, and badly. My first attempt I failed by 3 questions, but I did not feel I was really that close. This is a wonderfully well-rounded exam. From soup to nuts you need to learn it all to have any chance. This is a VERY wordy exam; several questions were a good two pages. Tons of reading. I took 90% of the time to complete it. Time was an issue and I pushed myself at the end; I regret doing that. The second time I took it I studied:

  • Edge Configuration

  • Backup/DR scenarios – incremental vs. differential


I passed my second attempt and almost jumped for joy when I read the word passed. You need to know Exchange from top to bottom for this exam.  I had Novell questions, Security Configuration Wizard, GPO, IPSec, VPN, IBE, Hosted Services, and tons of CCR vs. LCR vs. SCC questions. I found the wording VERY difficult. As a clustering MVP I still had a very difficult time with the HA questions. I knew every word, but not the way it was worded. This is a VERY wordy exam; several questions were a good two pages. Tons of reading. I took 95% of the time to complete it. 95%! Dang! Time was not really an issue though, because I knew I would finish with a few minutes to spare. The timing is very close, but you will finish.

So what does it all add up to?

In the end, assuming you pass all three exams, you get two new classes of certifications. MCSE is gone (long live the MCSE); it has been replaced by MCITP – Microsoft Certified IT Professional (all three exams are required). Any certification with IT in it is silly in my eyes. MCP has really been replaced with MCTS – Microsoft Certified Technology Specialist. After you pass the 70-236 exam you are a TS. Here are the official titles cut from my official Microsoft Transcript.

Microsoft Certified IT Professional
Microsoft Exchange 2007 Messaging Solutions Administrator

Microsoft Certified Technology Specialist
Microsoft Exchange 2007: Configuration

Hello Microsoft Certification, the product is called Microsoft Exchange SERVER 2007. I think you left off a word. Strange! And what gives with the Solutions Administrator? I take a Design and a Deploy exam, but I can only administer? Sounds more like an Architecture cert to me.

Anyways, I am done rambling here. Good luck on your exams, study and enjoy! Drop me a line when you pass them.

 

A few tips for Exchange Server 2007 SCC & CCR clustered installations

Have you tried to install Exchange Server 2007 yet? Clustering is different. First of all, Exchange Virtual Servers (EVS) are gone, replaced with the Clustered Mailbox Server (CMS). You now have two options for clustering. The first is Single Copy Cluster (SCC, meaning one CMS per server really), and it is not the default installation. The default is now Cluster Continuous Replication (CCR), which is something new to Exchange Server 2007.


You still install and configure clustering first, then install Exchange. But passive and active nodes are supposed to be handled differently now. And active/active clustering is simply not allowed anymore [:D]  I would use setup.exe or the GUI to install both nodes as passive. This will put the Exchange bits on the machine, but won’t create the CMS just yet. From the command prompt you would use something like this syntax:

setup.com /mode:install /roles:mb

Which should come back with:


Welcome to Microsoft Exchange Server 2007 Unattended Setup

Preparing Exchange Setup
    Copying Setup Files              ……………………. COMPLETED

The following server roles will be installed
    Management Tools
    Mailbox Role

Performing Microsoft Exchange Server Prerequisite Check
    Mailbox Role Checks              ……………………. COMPLETED

Configuring Microsoft Exchange Server
    Copying Exchange files           ……………………. COMPLETED
    Mailbox Server Role              ……………………. COMPLETED

The Microsoft Exchange Server setup operation completed successfully.

Do this for both nodes. Then on the Active node you will need to run exsetup.exe. Why? Because of this little gem in the books online:



If you already have one or more server roles installed on a computer, you cannot use the Exchange Server 2007 Setup wizard or the Setup.exe command to add or remove server roles. Instead, you must use the ExSetup.exe command.


DOH! Not knowing this means you can try Setup.exe until you are blue in the face and never get Exchange clustered.


So the ExSetup.exe command to create the CMS is (replace CMSNAME with a real, meaningful name and the /cip value with a real IP):


exsetup /mode:install /clustered /cn:CMSNAME /cip:198.168.1.100


Which should come back with:



Welcome to Microsoft Exchange Server 2007 Unattended Setup

No server roles will be installed
    Clustered Mailbox Server

Performing Microsoft Exchange Server Prerequisite Check

Configuring Microsoft Exchange Server
    Clustered Mailbox Server         ……………………. COMPLETED

The Microsoft Exchange Server setup operation completed successfully.

This should create all the clustered resources Exchange needs.


Why not install the bits and create all the clustered resources together with the Setup.exe command? The syntax would look like this:


setup.com /mode:install /roles:mb /newcms /cn:CMSNAME /cip:198.168.1.100


If your Active Directory is large, it could take a little while for the CMS to be registered properly. In that case you might get an error like this:



Welcome to Microsoft Exchange Server 2007 Unattended Setup

No server roles will be installed
    Clustered Mailbox Server

Performing Microsoft Exchange Server Prerequisite Check
    Clustered Mailbox Role Checks    ……………………. COMPLETED

Configuring Microsoft Exchange Server
    Clustered Mailbox Server         ……………………. FAILED
     The computer account ‘CMSNAME’ was created on the domain controller ‘\\dc02.clusterhelp.ad’, but has not replicated to the desired domain controller (dc01.clusterhelp.ad) after waiting approxmately 60 seconds. Please wait for the account to replicate and re-run exsetup /newcms.

The Exchange Server setup operation did not complete. Visit http://support.microsoft.com and enter the Error ID to find more information.

So, you only get 60 seconds for AD to fully replicate – DOH! Interesting. See, AD was using our DNS server (dc02) in the Public site, but the Exchange server was in the Private site near dc01. The spelling error is Microsoft’s by the way, not mine. You can add /dc:dc02.clusterhelp.ad, but setup will fail if that domain controller is not in the same AD site as the Exchange server. You will see this error:



Setup cannot use domain controller ‘dc02.clusterhelp.ad’ because it belongs to Active Directory site ‘Public’. Setup must use a domain controller in the same site as this computer (Private).


Other DNS messages you might get:



Exchange setup cannot continue because DNS information for the clustered mailbox server “CMSNAME” has not finished replicating. Please run setup again after replication has completed. After replication has completed, the command  “nslookup CMSNAME” should succeed.


This one again means you need to let it bake some more and let DNS replicate. Check your event logs. In some environments, you might have to wait 30 to 45 minutes for things to settle down. Then rerun the exsetup command.
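
Rather than rerunning nslookup by hand, you could poll until the name resolves. This is a minimal Python sketch of that idea, not anything Exchange setup provides: the function name, the 30 second interval, and the 45 minute cap (taken from my worst case above) are all my own assumptions.

```python
import socket
import time


def wait_for_dns(name, timeout_s=45 * 60, interval_s=30,
                 resolve=socket.gethostbyname,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until `name` resolves, then return its IP address.

    Raises TimeoutError if the name still does not resolve after
    `timeout_s` seconds. The resolve/sleep/clock hooks exist only so the
    loop can be exercised without touching real DNS.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            if clock() >= deadline:
                raise TimeoutError(f"{name} did not resolve within {timeout_s} seconds")
            sleep(interval_s)


if __name__ == "__main__":
    # CMSNAME is the placeholder used throughout this post.
    print(wait_for_dns("CMSNAME"))
```

Once it returns an address, the nslookup check from the error message should also succeed and you can rerun exsetup. Keep in mind this only checks the DNS server your own machine is using.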


Lastly you might get this error:



Error:
Error of unknown type occured while performing exsetdata operation; the original error code was 0xc103fd2c


This means you might have a duplicate DNS record that needs to be removed before CMSNAME can be created. Check your event logs for the exact error and believe them! The spelling error is again Microsoft’s, not mine. If Exchange can’t create the record, look for duplicates.


If after all that you just want to give up, the command would look like this:


exsetup /mode:uninstall /removecms /cmsname:CMSNAME


So why am I only showing you the command line for ExSetup.exe? Because you can only run the GUI setup once; after that you have to use ExSetup.exe.


Good luck, check your event logs and hopefully this will help someone else out.


Windows Server Codename "Longhorn" Beta 3 is out

But if you want to test failover clustering….


Standard SCSI based clustering will no longer work (yes I tested it, no go). 


Rocket Division StarWind will be the product you want to use to test Failover Clustering. http://www.rocketdivision.com/wind.html You will have to wait until the June 2007 time frame while they make it work with Beta 3.


FalconStor iSCSI Storage Server http://www.falconstor.com/en/solutions/?pg=Products&sb=iSCSI I am not sure when they will support it.


Microsoft bought StringBean (http://www.stringbeansoftware.com/downloads_update2_0.asp), which has a great WinTarget program. The program is now part of Windows Unified Data Storage Server and is called Microsoft iSCSI Software Target. That one works today, if you are lucky enough to have a copy.

Clustering terms made easy

Clusters are Highly Available and should never be considered Fault Tolerant.

Highly Available = is when I come anytime my wife calls me.
Fault Tolerant = Marriage.

You don’t want to be married to your SQL/Exchange Cluster 🙂 You do want it around whenever you need it though.


Active/Active = when your cluster is too busy for its own good.
Active/Passive = one worker, one manager, you decide which is which.
Node = Clustered computer, could also be the worker who sits in a cube, not to be confused with Dude.
Virtual Server = this is kind of like being on a telecon at work, only you are calling in from Hawaii and nobody knows.
Quorum = Cluster=Quorum, Quorum=Clustering.
Failover = the only time at work that you can fail and still be a hero.
Failback = great way to get fired, let your server failover without you controlling it (Don’t confuse with the above term).
Cluster = when it fails, also known as a Cluster Fork, only fork is spelled funny – u c what I mean?


Yes, I know, this post should have been on April 1st [:)]


 

Your Cluster won’t start and you can’t get into Cluster Administrator

Don’t panic, but you do have a problem. You look under Services and the Cluster service is running on the nodes. Open Cluster Administrator from one of the nodes, but don’t use the name of the cluster, the node name or an IP address, use a period (.). This will open it using a Local Procedure Call (LPC) and not a Remote Procedure Call (RPC). See Cluster Administrator Switches for Connecting to a Cluster for complete switch details.


Now with Cluster Administrator open, expand the Cluster group and attempt to start any failed resources. Whatever keeps failing should give a clue as to what is broken, disk, IP, network name, etc.

Windows Server 2003 Service Pack 2 – the good, the bad, and the ugly

By now you are aware that SP2 is out for Windows Server 2003 and R2. I have already seen quite a few posts in the public newsgroups where people are not aware of a few things.


The Good


It is a really good idea to back up your system before you attempt any major update or service pack. Remember, your backup is only as good as your restore. Or as Geoff N. Hiten, SQL MVP and SQL God, says, “Until you test a backup by restoring, you don’t have a recovery plan, you have a recovery hope.”


Another good idea: read the readme or release notes. They can be found here: Release Notes for Microsoft Windows Server 2003 Service Pack 2.


Next and maybe most important is to check with your hardware vendor before installing. They might not support it yet. For instance Dell says they don’t have any reported issues, but EMC (A Dell partner) does not support it yet.


The Bad


You will have to take an outage to install this on your cluster. Please read How to install service packs in a cluster. Please note that this document has not been updated for SP2 yet. I won’t bring up the mess everyone had to deal with on Dell’s for SP1 (see http://www.dell.com/downloads/global/power/ps2q05-20050113-Callaway.pdf or better yet http://msmvps.com/blogs/clustering/archive/2005/06/28/56167.aspx).


The Ugly


I remember growing up as a kid and going to Circus World and getting a small usually square white box with question marks all over it. They sold these for $1.49 – $2.99. The more you paid, the better the crap they put in. Installing Service Packs from just about any vendor is exactly like this. You really never know what they have changed, thrown in, made better or worse.


Always test a service pack install on a test cluster. Bad things happen when you test with production servers.


 


Good luck and keep me posted on your experiences.

What I learned at the MVP Summit last week – and I can tell you about!

The 2007 MVP Summit in Redmond/Seattle was awesome. Lots of great content. Here is what I learned:


1) SQL 2005 can cluster with Standard Edition (ok I knew that). What I learned was that you can have multiple SQL instances (that run on 2 nodes only) within any size cluster. So, if you have a 4 node cluster you can run 2-3 instances with multiple copies of SQL 2005 Standard Edition. Cool! Think of the savings.


2) Exchange 2007 Clustering is way cool and different, but in a good way.


3) Windows Server 2003 can support GPT – http://support.microsoft.com/kb/919117 – and volumes larger than 2 TB in size!!! Rumor has it Longhorn will also allow this.


4) Windows Server 2003 SP2 did not add anything compelling for a cluster. Check with your hardware vendor before deploying to ensure full support.


5) Windows 2000/2003 clusters can restart resources after a set amount of time – http://support.microsoft.com/kb/228923. This KB is for 2000 and 2003, I know it’s old but I just found out about this feature (I was not the only Clustering MVP that had no idea).


6) Windows Server Codename “Longhorn” Failover Clustering will be very different, but again in a very good way. Come to my Tech Ed 2007 session in Orlando, Florida to learn a lot more!