Machine UpTime

The February copy of TechNet magazine dropped through the letter box this morning. In the UK we get our own version, so some of this may be a bit older than this month’s edition. There is a nice article by Marco Shaw on using PowerShell with System Center Operations Manager that is well worth reading.

The article that really got me thinking was the one about calculating server uptime using information from the event logs. Strictly, the script measures the availability of the event log service, but that is very close to the time the server was available.

One thing that really leapt out was that the main script required PowerShell v2: it had a #Requires -Version 2.0 statement at the top. As v2 is still in CTP, that didn’t seem right. The whole script looked overcomplicated, so I started playing around and came up with this:

$days = 30
$now = Get-Date
$start = (Get-Date -Hour 00 -Minute 00 -Second 00).AddDays(-$days)
"Checking Last Boot Time"
$os = Get-WmiObject -Class Win32_OperatingSystem
$lastboot = $os.ConvertToDateTime($os.LastBootUpTime)
if ($lastboot -lt $start){ Write-Host "Server continually up for whole period"; Return}
else {Write-Host "Server restarted since start of period - analysis continuing"}

"Reading Event Logs"
## wrap in @() so a single matching event still gives an array we can index
$events = @(Get-EventLog -LogName System | Where-Object { (($_.EventId -eq 6005) -or ($_.EventId -eq 6006)) -and $_.TimeGenerated -ge $start } | Select-Object EventId, TimeGenerated, Index)

## should start with a 6005 - log service started event
if ($events[0].EventId -eq 6005){
    $totaluptime = $now - $events[0].Timegenerated
}
else {
    Write-Host "Error reading log - startup is not first entry"
    Return
}

#check the last
$last = $events | select -Last 1
if ($last.EventId -eq 6006){      ## shutdown
    $totaluptime += ($last.TimeGenerated - $start)
}

## events should be paired 6006\6005 shutdown & start respectively
for ($i = 1; $i -le $events.count-2; $i += 2){
    if ($events[$i].EventId -eq 6006){      ## shutdown
        if ($events[$i+1].EventId -eq 6005){      ## Startup
            $totaluptime += ($events[$i].Timegenerated - $events[$i+1].Timegenerated)
        }
        else {
            Write-Host "Error in log sequence at " $events[$i+1]
            Return    
        }
    }
    else {
        Write-Host "Error in log sequence at " $events[$i]
        Return    
    }
}
## calculate uptime
$totaltime = $now - $start
$percUptime = ($totaluptime.TotalHours / $totaltime.TotalHours) * 100

"Uptime for period $($start.ToLongDateString()) to $($now.ToLongDateString())"
"Total time available: {0:n2} hours" -f $($totaltime.TotalHours)
"Total Uptime: {0:n2} hours" -f $($totaluptime.TotalHours)
"Percentage Uptime: {0:n2} %" -f $percUptime
"Percentage Downtime: {0:n2} %" -f (100 - $percUptime)

 

We start by defining some variables – the number of days we want to analyse, current date and our starting point.

The first check is on when the server was actually started. If it was before the beginning of our period then we have 100% uptime and the bonus is in the bank. We can check this using WMI. The only awkward bit is converting the boot time to a format we can work with.
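To show what that conversion is dealing with: WMI hands back LastBootUpTime as a DMTF-format string, not a DateTime. The sketch below parses a made-up sample value by hand, purely for illustration. ConvertFrom-DmtfString is a hypothetical helper name, and it deliberately ignores the fractional seconds and UTC offset, which ConvertToDateTime takes care of for you.

```powershell
# Hypothetical helper (not part of the script above): parse the date and time
# portion of a DMTF datetime string such as WMI returns for LastBootUpTime.
# Simplification: the fractional seconds and UTC offset are ignored.
function ConvertFrom-DmtfString([string]$dmtf) {
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    [datetime]::ParseExact($dmtf.Substring(0, 14), 'yyyyMMddHHmmss', $culture)
}

# Made-up sample value in the yyyymmddHHMMSS.mmmmmm+UUU layout
$sample = '20090207083015.500000+000'
$boot = ConvertFrom-DmtfString $sample
```

In the real script there is no need for this; calling ConvertToDateTime on the WMI object is the simpler route.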

Assuming that our server was started since the start of our analysis period, we need to look at the logs. We can read the System event log looking for event IDs 6005 and 6006 as shown. We only want events since the start of our period.

The event logs are written in chronological order and are returned in that order with the newest entry first. I have yet to see an instance of pulling information from the logs when this wasn’t the case.
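If you would rather not rely on that ordering, one option is to sort explicitly before indexing into the results. A minimal sketch using made-up events follows; New-TestEvent is a hypothetical helper used only to build sample data, not something from the original script.

```powershell
# Hypothetical helper that builds a stand-in for an event log entry
function New-TestEvent($id, $time) {
    $e = "" | Select-Object EventId, TimeGenerated
    $e.EventId = $id
    $e.TimeGenerated = $time
    $e
}

# Made-up events, deliberately out of order
$sampleEvents = @(
    (New-TestEvent 6006 ([datetime]'2009-02-01 09:00')),
    (New-TestEvent 6005 ([datetime]'2009-02-03 08:00')),
    (New-TestEvent 6005 ([datetime]'2009-02-01 09:10'))
)

# Newest first, which is the order the uptime script expects
$sorted = @($sampleEvents | Sort-Object TimeGenerated -Descending)
```

In the script itself the same effect should come from piping the filtered events through Sort-Object TimeGenerated -Descending before the final Select.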

The first (newest) event should be a 6005, event log service started. Subtracting its time from the current time gives a timespan for the uptime since the last restart. If the first event is not a 6005 then we have a problem that needs to be investigated, so the script stops.

A check is made to see if the last event is a shutdown – in which case we need to calculate the uptime from the start of the period to shutdown and add it to the total.

The 6005\6006 events should be paired after this with a 6006 (shutdown) followed by a 6005 (startup) – remember we are working backwards in time.  Assuming we find our pairs of events as expected we calculate the timespan between the events and add it to our total uptime.  If the pairings don’t match up then we have an issue to be investigated so the event information is written to screen to give a starting point for analysis.
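To make the pairing easier to check by hand, here is the same logic sketched as a function and run against made-up events. The function and helper names (Get-UptimeFromEvents, New-SampleEvent) and all the dates are assumptions for illustration; the list must be newest first, as described above.

```powershell
# Hypothetical helper that builds a stand-in for an event log entry
function New-SampleEvent($id, $time) {
    $e = "" | Select-Object EventId, TimeGenerated
    $e.EventId = $id
    $e.TimeGenerated = $time
    $e
}

# Sketch of the pairing logic. $log must be ordered newest first.
function Get-UptimeFromEvents($log, [datetime]$start, [datetime]$now) {
    $log = @($log)
    if (($log.Count -eq 0) -or ($log[0].EventId -ne 6005)) {
        throw "Newest event is not a 6005 startup"
    }
    # Uptime since the most recent startup
    $uptime = $now - $log[0].TimeGenerated

    # Oldest event is a shutdown: the machine was up from the start of the period
    $last = $log[$log.Count - 1]
    if ($last.EventId -eq 6006) { $uptime += ($last.TimeGenerated - $start) }

    # Walk the 6006/6005 pairs in between, working backwards in time
    for ($i = 1; $i -le $log.Count - 2; $i += 2) {
        if (($log[$i].EventId -ne 6006) -or ($log[$i + 1].EventId -ne 6005)) {
            throw "Unexpected event sequence at position $i"
        }
        $uptime += ($log[$i].TimeGenerated - $log[$i + 1].TimeGenerated)
    }
    $uptime
}

# Made-up 48 hour period with two outages (30 minutes and 1 hour)
$periodStart = [datetime]'2009-02-01 00:00'
$periodEnd   = [datetime]'2009-02-03 00:00'
$sample = @(
    (New-SampleEvent 6005 ([datetime]'2009-02-02 12:00')),
    (New-SampleEvent 6006 ([datetime]'2009-02-02 11:00')),
    (New-SampleEvent 6005 ([datetime]'2009-02-01 06:30')),
    (New-SampleEvent 6006 ([datetime]'2009-02-01 06:00'))
)
$result = Get-UptimeFromEvents $sample $periodStart $periodEnd
```

With these sample values the three uptime stretches are 12, 28.5 and 6 hours, so $result.TotalHours comes to 46.5 out of the 48 hours in the period.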

We then calculate the total timespan of our period and the percentage uptime.  Finally we print out our results.
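As a quick check of the arithmetic, percentage uptime is total uptime divided by the length of the period, times 100. The sketch below uses made-up figures, and Get-UptimePercent is a hypothetical helper name rather than anything from the script.

```powershell
# Hypothetical helper: percentage of the period the machine was up
function Get-UptimePercent([timespan]$uptime, [timespan]$period) {
    ($uptime.TotalHours / $period.TotalHours) * 100
}

# Made-up figures: 712.5 hours up out of a 30 day (720 hour) period
$up      = [timespan]::FromHours(712.5)
$period  = [timespan]::FromDays(30)
$percent = Get-UptimePercent $up $period
```

Here $percent rounds to 98.96, leaving roughly 1.04% downtime.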

I think this is easier to follow, and it seems to work correctly in my test environment. Remember that, as written, this runs against the local machine. It can be made to work on remote machines: Get-WmiObject accepts a -ComputerName parameter, as does Get-EventLog in PowerShell v2. If you are using v1 then you could read the remote logs using WMI instead.
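For the v1 route, a rough sketch of reading the same events from a remote machine through the Win32_NTLogEvent WMI class might look like this. The function name and its parameters are illustrative assumptions, and I have not tried to tune the query; it simply returns objects shaped like the ones the script above works with.

```powershell
# Sketch only: the function name and parameters are illustrative.
# Reads the 6005/6006 events from a remote System log via WMI.
function Get-RemoteBootEvents([string]$computer, [datetime]$start) {
    $filter = "Logfile='System' AND (EventCode=6005 OR EventCode=6006)"
    Get-WmiObject -Class Win32_NTLogEvent -ComputerName $computer -Filter $filter |
        Select-Object @{Name='EventId'; Expression={$_.EventCode}},
                      @{Name='TimeGenerated'; Expression={$_.ConvertToDateTime($_.TimeGenerated)}} |
        Where-Object { $_.TimeGenerated -ge $start } |
        Sort-Object TimeGenerated -Descending
}

# Example call (server name is a placeholder):
# $events = @(Get-RemoteBootEvents 'SERVER01' $start)
```

The boot time check can be pointed at the same machine by adding -ComputerName to the Get-WmiObject call for Win32_OperatingSystem.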

 
