The Savvy-Tech’s Hardware-Independent-Restore

OK, I thought I’d share a real-world support scenario that happened to me today:

I have a new contract customer I signed a couple of weeks ago, and they went live as of 9/1. I was doing various maintenance tasks on their network over the weekend: removing unnecessary apps from PCs to improve performance, getting patches installed, etc. About the only thing left last night was patching their Windows 2003 terminal server. I pushed the patches out via Kaseya, they installed successfully, and the server initiated a reboot, but it never came back. I’ve been doing remote patching and reboots for years, and this has only happened a handful of times.

I logged in to their SBS and tried to ping the TS: no response. The TS is a whitebox server, about 4 years old, with no remote access card or IP-KVM attached. The client was in bed, and losing the TS wasn’t going to be an issue until their roughly half-dozen remote users tried to access Great Plains in the morning. So I didn’t bother calling to wake anyone up. Instead, I more or less surprised the VP when I walked in at 7:30 this morning to address a problem they didn’t know they had yet.
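As an aside, a small script can flag a server that doesn’t come back after a patch reboot, rather than relying on a manual ping. The following is a minimal sketch under stated assumptions: it swaps ICMP ping for a TCP connect to port 3389 (the default RDP/Terminal Services port), because sending raw ICMP from Python generally requires elevated privileges. The function name `wait_for_port` and its parameters are illustrative only; they are not part of Kaseya or any other tool mentioned here.

```python
import socket
import time


def wait_for_port(host: str, port: int = 3389, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Poll until host:port accepts a TCP connection or the deadline passes.

    Hypothetical helper: 3389 is the default RDP/Terminal Services port.
    Returns True as soon as a connection succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError on refusal, unreachable host,
            # or when the per-attempt timeout expires.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        # Wait before retrying, but never sleep past the overall deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

For example, `wait_for_port("ts01.example.local", timeout=600)` (a hypothetical hostname) returns `False` if the TS hasn’t accepted an RDP connection within ten minutes, which would have been the cue to go look at the box. Checking the RDP port rather than ping also confirms that the service remote users actually need is listening.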

Short story: the server was pretty much on its deathbed. The alarm LED on the case came on whenever the processor tried to do anything. It took 5 attempts to get the server to boot, and once I got logged in, the CPU was grinding constantly with the error LED lit. Task Manager showed nothing out of the ordinary, except that the system was so slow it was virtually unusable for one user at the console, let alone a half-dozen or more remote TS users running Great Plains. Quick diagnosis and gut instinct told me this was a hardware issue. As a 4-year-old whitebox, it was long out of warranty. I knew the server just needed to be replaced, but the remote users couldn’t wait a week to 10 days for me to get approval, order a box from Dell, and get it installed. Additionally, given the state of the machine, it would probably take days to get an image-based backup using ShadowProtect, and since this is a new customer, they weren’t backing up the TS at all (there’s no data on it).

SO, I ran back to my office and grabbed a spare PC I use for random stuff on my bench (an Acer, about 3 years old, but with a dual-core Pentium CPU @ 2.8 GHz and upgraded to 2 GB RAM). I also grabbed my old Adaptec 1205SA PCI SATA host controller off the shelf and returned to the client.

The TS in question was running a RAID 1 array on an on-board SATA RAID controller. I shut down the TS and installed the Adaptec SATA controller in an open PCI slot; after 4 tries, the server finally booted again. I logged in, the OS detected the Adaptec SATA controller, and I installed its drivers from my thumb drive. Once the driver installation completed successfully, I shut down the TS again.

I removed the Adaptec SATA controller and drive 0 from the TS. I installed the Adaptec controller in the Acer PC, connected drive 0 from the TS to it, and disconnected the PC’s existing SATA HDD. I powered on the PC, and because drive 0 was connected to the Adaptec controller AND the Win2k3 OS on drive 0 already had drivers for that controller installed, the Win2k3 TS OS booted successfully on (almost) completely different hardware. On the first login, the OS detected the various new hardware (on-board boot controller, DVD drive, on-board NIC, etc.). Once the drivers for the new hardware were installed and the on-board NIC was configured, I powered down the Acer, removed the Adaptec card, and connected drive 0 to the on-board primary SATA port. I powered on the PC, the Win2k3 OS again booted successfully, and we verified that remote users could log in and launch Great Plains.
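The key move above is pre-staging the boot-critical storage driver while the old hardware can still boot. The toy Python model below replays the steps to show why the order matters. It is a simplified sketch, not how Windows actually matches drivers (real systems identify devices by Plug and Play hardware IDs), and the driver labels such as `adaptec-1205sa` and `acer-onboard` are hypothetical. It also assumes the Acer’s on-board controller needed a driver the TS image didn’t already have; on real hardware, a controller running in IDE compatibility mode may boot with generic drivers.

```python
from dataclasses import dataclass, field


@dataclass
class OsImage:
    """Toy model: the set of storage drivers installed in a Windows image."""
    storage_drivers: set = field(default_factory=set)

    def can_boot_from(self, controller: str) -> bool:
        # Windows needs the boot controller's driver at boot time; without it
        # the system disk is unreachable (typically STOP 0x7B,
        # INACCESSIBLE_BOOT_DEVICE).
        return controller in self.storage_drivers


def simulate_restore() -> list:
    """Replay the steps from the post and record whether each boot would work."""
    ts = OsImage({"onboard-raid"})  # original TS image, on-board SATA RAID
    results = []

    # Hypothetical shortcut: move drive 0 straight to the Acer's on-board port.
    results.append(("direct to Acer on-board", ts.can_boot_from("acer-onboard")))

    # On the failing TS: add the Adaptec card and install its driver.
    ts.storage_drivers.add("adaptec-1205sa")

    # Drive 0 and the Adaptec card moved into the Acer.
    results.append(("Acer via Adaptec card", ts.can_boot_from("adaptec-1205sa")))

    # First login on the Acer: Plug and Play installs the new hardware drivers.
    ts.storage_drivers.add("acer-onboard")

    # Card removed, drive 0 on the Acer's primary SATA port.
    results.append(("Acer on-board after PnP", ts.can_boot_from("acer-onboard")))
    return results


for step, ok in simulate_restore():
    print(f"{step}: {'boots' if ok else 'fails'}")
```

Under these assumptions, the model prints `fails` for the direct move and `boots` for the two later steps. The temporary Adaptec card acts as a bridge: a controller the old image already knows, which can travel with the disk to the new machine.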

Obviously, using a 3-year-old desktop PC as a terminal server is not a long-term solution. But this minimized downtime for the remote users (they were all online before noon) and gave both me and the customer valuable breathing room to resolve the root issue and get the ball rolling on replacing this server. And given the small number of users and basic Dynamics GP use, the performance of this temporary hardware is more than sufficient for the remote users (and certainly beats the alternative).

And yes, there is more than one way to skin a cat, and this problem could have been addressed in several ways. In this particular situation, I felt this was the fastest route to a working system, considering the severe instability of the original hardware, the lack of an existing image backup of the TS, and the fact that I could easily break the mirror and run off a single HDD from the server.
