Exchange Server 2007 & CluAdmin/Cluster.exe Bad Things can Happen?

My good buddy Scott Schnoll attempts to clear up some confusion on when and where to use Cluster.exe/CluAdmin to move a CMS. Find his post here:


 http://msexchangeteam.com/archive/2007/10/22/447317.aspx


Here is the comment I left: 


The samples you cited are for creating or managing a cluster; none of them are for moving a CMS.

Using CluAdmin/Cluster.exe vs. EMC/EMS with SP1 requires different security levels by default, and the people who hold them may not be the same individual. Now we will have to give the Cluster Administrator Exchange rights to avoid “Bad Things Happening”…

Thankfully SQL and other Microsoft products don’t have an issue with CluAdmin/Cluster.exe, nor do they have any other way to manage them. K.I.S.S. in action.


Thanks for the post Scott!

Exchange Server 2007 SCC/CCR lessons learned

This past weekend I ran into a few issues with Exchange Server 2007 and wanted to share them, so anyone who hits them won’t have to call Microsoft PSS and go through the fun (ok, not really fun…) that I went through.


Partition in Time with CCR

You have a partition in time, but what does that mean? You lost a node or the witness, and while it was down the remaining node/witness thought a change was made. When the down node/witness came back, it detected that a change had occurred and killed the entire cluster. This is by design.

Now, how do you fix it?

http://support.microsoft.com/kb/258078 ForceQuorum section:


Function: When you use a Majority Node Set (MNS) quorum model on a Windows
Server 2003 cluster, in some cases a cluster must be allowed to continue to
run even if it does not have “quorum” (majority). Consider the case of a
geographically dispersed cluster with four nodes at the “primary” site and
three nodes at the “secondary” site. While there are no failures, the
cluster is a seven-node cluster where resources can be hosted on any node,
on any site. If there is a communications failure between the sites or if
the secondary site is taken offline (or fails), the primary site can
continue because it will still have quorum. All resources will be re-hosted
and brought online at the primary site.


In the event of a catastrophic failure of the primary site, however, the
secondary site will lose quorum, and, therefore, all resources will be
terminated at that site. One of the primary purposes for having a multi-site
cluster is to survive a disaster at the primary site; however, the cluster
software itself cannot make a determination about the state of the primary
site. The cluster software cannot differentiate between a communications
failure between the sites and a disaster at the primary site. That must be
done by manual intervention. In other words, the secondary site can be
forced to continue even though the Cluster service believes it does not have
quorum. This is known as forcing quorum.


Because this mechanism is effectively breaking the semantics associated with
the quorum replica set, it must only be done under controlled conditions. In
the example above, if the secondary site and primary site lose communication
and an administrator forces quorum at the secondary site, resources will be
brought online at BOTH sites, thus allowing the potential for inconsistent
data or data corruption in the cluster.


Requirements:
Forcing quorum is a manual process that requires that you stop
the Cluster service on ALL the remaining nodes. The Cluster service must be
told which nodes should be considered as having quorum.


Usage scenarios:
Special care must be taken if and when the primary site
comes back because the nodes are configured as part of the cluster. While a
cluster is running in the force quorum state, it is fully functional. For
example, nodes can be added or removed from the cluster; new resources,
groups, and so forth can be defined.


Note
The Cluster service on all nodes NOT in the force quorum node list must
remain stopped until the force quorum information is removed. Failure to do
so can lead to data inconsistencies OR data corruption.


Operation:
Set up the Cluster service startup parameters on ALL remaining
nodes in the cluster. This is done by starting up the Services control
panel, selecting the Cluster service, and then entering the following in the
Start parameters option:
net start clussvc /forcequorum node_list
For example, if the secondary site contains Node5, Node6, and Node7, and you
wanted to start the Cluster service and have those be the only nodes in the
cluster, use the following command:
net start clussvc /forcequorum node5,node6,node7
Note There should be no spaces in the key (except where there are spaces in
the node names themselves).
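
To make the KB excerpt concrete, here is a minimal Python sketch of the two ideas in it: the majority arithmetic from the seven-node example, and the shape of the node list passed to /forcequorum. The function names are my own and purely illustrative. The sketch only builds the command text; it does not run anything against a cluster.

```python
from typing import Sequence


def has_majority(total_nodes: int, reachable_nodes: int) -> bool:
    """Majority Node Set rule: more than half of all configured nodes must be reachable."""
    return reachable_nodes > total_nodes // 2


def force_quorum_command(nodes: Sequence[str]) -> str:
    """Build the command in the form the KB shows: node names joined by commas, no spaces."""
    if not nodes:
        raise ValueError("at least one node name is required")
    node_list = ",".join(name.strip() for name in nodes)
    return f"net start clussvc /forcequorum {node_list}"


if __name__ == "__main__":
    # Seven-node cluster from the KB: four nodes at the primary site, three at the secondary.
    print(has_majority(7, 4))  # primary site alone keeps quorum
    print(has_majority(7, 3))  # secondary site alone does not
    print(force_quorum_command(["node5", "node6", "node7"]))
```

With the KB's numbers, the primary site (4 of 7) keeps quorum on its own while the secondary site (3 of 7) does not, which is exactly the case where quorum has to be forced by hand.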

The only problem: I could not get the above commands to work on a 64-bit Windows Server 2003 R2, Enterprise Edition SP2 machine. I mostly got invalid syntax errors. Here is what PSS told me to do:


1.    We shut down one of the nodes, a true power off. We will call this the passive node.
2.    We added the following value to this registry key on the surviving node (active node):

HKLM\System\CurrentControlSet\Services\Clussvc\Parameters

Value: ForceQuorum

Type: REG_SZ

Data: nodenamea
3.    Replace nodenamea with the machine’s name, such as exch2007nodea, where this is the node that is currently running.
4.    We attempted to start the cluster service on the active (surviving) node and it started.
5.    We then stopped the cluster service on the active (surviving) node and added nodenameb to the ForceQuorum data value on the surviving node.
6.    We restarted the powered-off (passive) machine.
7.    We then started the cluster service on the active node, with the ForceQuorum registry value now containing both node names, and it started.
8.    We attempted to start the cluster service on the passive node (with no parameters or registry changes) and it started.
9.    We verified that the Cluster group resources were online.
10.   Undo the registry changes by deleting the ForceQuorum value from the active node.
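
The registry edits in steps 2, 5, and 10 can also be scripted with reg.exe. Below is a rough Python sketch that only builds the reg.exe command lines for review and does not touch the registry. Two things here are assumptions and not something PSS told me: the second node name (exch2007nodeb) is a placeholder, and joining multiple node names with commas and no spaces is borrowed from the KB's node_list format, since the steps above do not spell out the separator.

```python
from typing import Sequence

# Key and value name as given in step 2 of the procedure.
KEY = r"HKLM\System\CurrentControlSet\Services\Clussvc\Parameters"
VALUE = "ForceQuorum"


def reg_add_force_quorum(nodes: Sequence[str]) -> str:
    """Command text that sets ForceQuorum (REG_SZ) to the given node names.

    Assumption: multiple nodes are comma-separated with no spaces,
    mirroring the KB's node_list format.
    """
    if not nodes:
        raise ValueError("at least one node name is required")
    data = ",".join(nodes)
    return f'reg add "{KEY}" /v {VALUE} /t REG_SZ /d {data} /f'


def reg_delete_force_quorum() -> str:
    """Command text that removes the ForceQuorum value again (step 10)."""
    return f'reg delete "{KEY}" /v {VALUE} /f'


if __name__ == "__main__":
    print(reg_add_force_quorum(["exch2007nodea"]))                   # step 2
    print(reg_add_force_quorum(["exch2007nodea", "exch2007nodeb"]))  # step 5
    print(reg_delete_force_quorum())                                 # step 10
```

Printing the commands first keeps a human in the loop, which seems wise for something that overrides quorum.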

Exchange Server 2007 System Attendant fails to come online within a CCR/SCC cluster

After the cluster was up and running, the Exchange SA was not. Looking in the Application event log, we found the following errors related to the Exchange SA failing to start:

Event ID 1011, 1030, 1003, and 1019 errors.


We found that a bug exists where the Exchange SA times out after 40 seconds when the default of 180 seconds is used for the resource.


We changed the value to 179 and the Exchange SA resource came online. This is scheduled to be fixed in SP1. This bug was confirmed for SCC & CCR Exchange Server 2007 Clusters.
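
For anyone who would rather make this change from the command line than from CluAdmin, here is a rough Python sketch that builds the Cluster.exe command text. It rests on an assumption: that the 180-second setting above is the resource's Pending Timeout (the PendingTimeout common property, which Cluster.exe takes in milliseconds). The post does not name the property, and the resource name used below is a made-up placeholder, so check both against your own cluster before using it.

```python
def pending_timeout_command(resource_name: str, seconds: int) -> str:
    """Build a Cluster.exe command that sets PendingTimeout for one resource.

    Assumption: the timeout discussed in the post is PendingTimeout,
    which Cluster.exe expects in milliseconds.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    millis = seconds * 1000
    return f'cluster resource "{resource_name}" /prop PendingTimeout={millis}'


if __name__ == "__main__":
    # Hypothetical resource name; substitute the System Attendant resource of your CMS.
    print(pending_timeout_command("Exchange System Attendant (CMS1)", 179))
```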
 

Update from PSS – find a link to the first issue here http://technet2.microsoft.com/WindowsServer/f/?en/library/e70333db-5048-4a56-b5a9-8353756de10b1033.mspx, we are still waiting on the KB to be updated though.

Observations about the software industry today

Sometimes I think that the movie Conspiracy Theory should have been about the software industry today. What has become of it lately? Here is what I believe:


·         I believe the Anti-Virus companies write all the viruses.


·         I believe most software is way over priced.


·         I believe we now alpha test software for vendors.


·         I believe we beta test when service pack 1 comes out.


·         I believe 1.0 is not the standard to avoid, RTM (release to manufacturing/gold code) is.


·         I believe we get the final, ready for the world product when service pack 2 comes out.


·         I believe most software has too many features for 98% of the users.


·         I believe all the added features cause 100% of the problems with software today.


·         I believe it is better to update software than to design it properly to begin with.


·         I believe we, the paying consumer, don’t complain enough so things are only going to continue to get worse.


·         I believe you pay 10 times the cost of software in support costs and lost productivity when it does not function properly out of the box.


·         I believe the world has become too computer savvy because of buggy software.


·         I believe a computer should be just another asset at the office place, taken for granted like a stapler or pencil.


·         I believe that a computer isn’t taken for granted because broken things always get attention and notice.


·         I believe release dates are based upon dates on PowerPoint slides, not when the product is anywhere near being ready or bug free.


 

Exchange Server 2007 MCP Exams – Notes from the field

I love taking Microsoft exams because I learn so much. I learn what Microsoft feels are the important product features that everyone should know. I learn different ways to do common tasks within the product; let’s face it, sometimes we only know as much as our peers. I also learn exactly where I stand on the product, and what I really need to work on.


As I get older though I am either getting smarter or lazier, take your pick. I simply don’t study for the exams anymore. Sorry, but I don’t. I take the exam to learn the question format, style, content, and lastly to gauge what, if anything, I need to study. I recently did this for the 3 (yes I said 3) exams that relate to Exchange Server 2007. I would now like to break down what took place without breaking my NDA.


70-236 TS: Exchange Server 2007 Configuration


This is a fun exam. Honest, it is. I would recommend this as the second exam in this series. I walked in to take my practice version and almost passed. Lots of PowerShell (Exchange Management Shell – EMS). I failed my first attempt by 2 questions. I needed more Edge server information. I needed to learn more PowerShell cmdlets, like anything test-*. I did not feel the test was worded poorly, nor did it have any long questions. Either you knew it or you blew it. The second time I took this I studied:
  • test-* cmdlets
  • Microsoft Search service repair
  • DR repair and movement of Hub Transport logs
  • Edge Configuration cmdlets
  • General EMS syntax
I passed my second attempt because of the above and the fact that I could relax knowing I had plenty of time to take the exam and concentrate on the PowerShell questions. All in all it’s a fair exam. My only problem is that I suck at PowerShell/EMS, honestly. After the exam I wanted to recreate some of the cool ones the test went over and I could not get the syntax right. It is one thing to see 4 or 5 various ways to attempt a command; it is easy to pick the one that works. Now, try and do that without the spoon feeding. The help files are ok, but I need more examples to choose from, like on the exam.


70-237 Pro: Designing Messaging Solutions with Microsoft Exchange Server 2007


This exam is trying to test if you fully understand all the concepts of Exchange Server 2007 design. I passed with flying colors on my first attempt, without a lick of studying. The questions were very cut and dried, with usually only 1 glaring answer. I would definitely start by taking this exam! It is a very fair exam.


70-238 Pro: Deploying Messaging Solutions with Microsoft Exchange Server 2007


OUCH! Make this your last exam and do yourself a favor, study! This one got to me, deep inside it hurt, and badly. On my first attempt I failed by 3 questions, but I did not feel I was really that close. This is a wonderfully well rounded exam. From soup to nuts you need to learn it all to have any chance. This is a VERY wordy exam; several questions were a good two pages. Tons of reading. I took 90% of the time to complete it. Time was an issue and I pushed myself at the end; I regret doing that. The second time I took it I studied:

·         Edge Configuration

·         Backup/DR scenarios – incremental vs. differential

I passed my second attempt and almost jumped for joy when I read the word passed. You need to know Exchange from top to bottom for this exam.  I had Novell questions, Security Configuration Wizard, GPO, IPSec, VPN, IBE, Hosted Services, and tons of CCR vs. LCR vs. SCC questions. I found the wording VERY difficult. As a clustering MVP I still had a very difficult time with the HA questions. I knew every word, but not the way it was worded. This is a VERY wordy exam; several questions were a good two pages. Tons of reading. I took 95% of the time to complete it. 95%! Dang! Time was not really an issue though, because I knew I would finish with a few minutes to spare. The timing is very close, but you will finish.

So what does it all add up to?

In the end, assuming you pass all three exams, you get two new classes of certifications. MCSE is gone (long live the MCSE); it has been replaced by MCITP – Microsoft Certified IT Professional (all three exams are required). Any certification with IT in it is silly in my eyes. MCP has really been replaced with MCTS – Microsoft Certified Technology Specialist. After you pass the 70-236 exam you are a TS. Here are the official titles cut from my official Microsoft Transcript.

Microsoft Certified IT Professional
Microsoft Exchange 2007 Messaging Solutions Administrator

Microsoft Certified Technology Specialist

Microsoft Exchange 2007: Configuration

Hello Microsoft Certification, the product is called Microsoft Exchange SERVER 2007. I think you left off a word. Strange! And what gives with the Solutions Administrator? I take a Design and a Deploy exam, but I can only administer? Sounds more like an Architecture cert to me.

Anyways, I am done rambling here. Good luck on your exams, study and enjoy! Drop me a line when you pass them.