So… it's like this… I injured my RAID. I had a hardware issue.
The first clue that I wasn't doing so well was when Susan woke up on Wednesday morning and realized that I wasn't responding. She did a traceroute using www.dnsstuff.com (which, by the way, has wised up and now charges a reasonable fee if you use it too often) and thought I had a routing problem at the colo in Dallas. She probably thought floods and power problems… heck no… I had a bigger issue than that. When they got in to look over a KVM connection, I was in a looping BSOD, indicating that the RAID hardware was not a happy camper. They called in the experts, and they reported back…
Ok, so here is where we stand:
a) the upload of the new driver didn't make a difference; it wasn't a driver problem.
b) booting up in identical hardware didn't make a difference; it still failed.
c) currently imaging the drives and building a new system with a dual-core proc.
The new box will be online within an hour or two; I don't want to waste any more time.
Like all historical decisions with me, the folks who are the blog admins dilly-dally and wait too long to make up their minds. When I was at Webhost4life and we decided I needed to move to a dedicated server with Vlad, we didn't turn down the TTL value or nothin', we just picked up the data and ran.
In this case, and I'm glad they did it, they knew they had a good enough backup (well, sort of, with one exception) and went to a new server.
To get it back up and running, having the NAS with the backups and spare copies of software was key. The SQL 2005 that had been used in the recent update was parked in a backup on the NAS. So they connected to it, restored the SQL 2005 install parts that were needed, and reinstalled SQL. They restored the data files back into position and were good to go. IIS was a little trickier, as we found out: http://www.hintsandtips.com/ShowPost/416/hat.aspx . Taking a fully working blog site with lots of redirects (where you type in www.msmvps.com/nameofblog and it goes to www.msmvps.com/blogs/nameofblog) and putting it back operational just means a bit more work for my blog admins. It would have been nicer if that IIS metabase backup had been a bit more portable. They were able to get back to the old metabase files, but couldn't restore them. Oh well. Not a huge thing.
Also good that the server 'broke' right after the nightly backup. Up to now the folks were doing a simple backup to save on drive space. With the new larger-tummy server (and I do mean a WAY larger tummy) they can now go back and redo the SQL backup strategy to better grab the posts. As it was, we only lost about 9 posts that weren't backed up, impacting only about four bloggers; three of them had just one post affected, and the rest were easily put back.
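A beefed-up SQL backup strategy along those lines usually pairs a nightly full backup with frequent transaction log backups, so posts made between fulls can still be recovered. Here's a minimal T-SQL sketch; the database name msmvps_blogs and the backup paths are made up for illustration, and log backups assume the database is in the FULL recovery model:

```sql
-- Nightly full backup (database name and paths are hypothetical)
BACKUP DATABASE msmvps_blogs
TO DISK = N'E:\Backups\msmvps_blogs_full.bak'
WITH INIT, CHECKSUM;

-- Every 15 minutes or so, grab the transaction log
-- (requires the FULL recovery model, not SIMPLE)
BACKUP LOG msmvps_blogs
TO DISK = N'E:\Backups\msmvps_blogs_log.trn'
WITH CHECKSUM;
```

With log backups in the mix, you can restore to a point in time right before a failure, so only the last few minutes of posts are at risk instead of a whole day's worth.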
Hardware issues are something I always feel so un-in-control about. They are so yucky. And the best thing you can do is have a backup.
So lessons learned?
We did okay on the blog backups.
We can tweak the SQL backups to capture more, especially now that we have room on the hard drive.
We need to do a portable backup of the IIS metabase.
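On IIS 6, a metabase backup made with a password is the portable kind: it can be restored to a different machine, which is exactly what we were missing. A sketch using the stock iisback.vbs script; the backup name and password here are placeholders:

```bat
REM Make a password-protected (and therefore portable) metabase backup
cscript %SystemRoot%\system32\iisback.vbs /backup /b BlogMetabase /e S0mePlaceholderPw

REM List existing backups to confirm it took
cscript %SystemRoot%\system32\iisback.vbs /list
```

A backup made without the /e password is tied to the machine it was made on, which is the gotcha that bit us.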
It took longer to get back online than everyone was hoping, but it looks like the Search box works better. My blog admins were saying that they needed to uninstall the old SQL 2000 anyway… and, well, we got a nice clean install as a result.
Another lesson learned?
You need to see whether you, too, can move a server from one box to another. Try to recreate what my blog admins just did. Can you do it? What parts do you need? What did you forget? If you don't test a recovery, you'll never find the gotchas that bite ya. There were a few gotchas, like the metabase info that could have been handled better, and we had a smidge of delay getting access to the media for reinstalling (once again proving Mark Crall's point about media being so important).
So… to all those who pinged and asked how I was doing, thanks.
P.S. A HUGE thank you to Vlad Mazek of www.ownwebnow.com and www.exchangedefender.com, who took the bull by the horns to get us safely back online.