30 Mar 2007

On Dell's PERC 5/i

Author: q | Filed under: Frustrations

So in a previous post,
I railed about my surprise at finding out that the Dell PERC 5/i
controller has no audible alarm and why that's a concern for me. Well,
here we are, nearly two weeks later, and after much going round and
round with Dell on the issue, I have more information, and it's not
necessarily good.

My specific initial issue is that, besides not
having an audible alarm, Dell's Server Management software
(OpenManage) doesn't have a way to send notifications about problems with
the RAID controller, either the controller itself or the failure of an
element attached to the controller. After my initial support call with
Dell about the issue, they indicated that the IT Assistant software
should run on the Windows 64-bit box, and that it would send notifications
when an issue is detected. I've since found out that no, the latest
version of IT Assistant that's posted on the Dell web site will NOT, in
fact, run on the 64-bit Windows platform. Of course, everything about
IT Assistant tells you that it really, truly, should be run on another
box, but for this specific instance, that's not going to be possible.

In
digging further into this, however, I've uncovered a couple of other
issues that concern me. Given that IT Assistant seems to be the
preferred way to actively monitor the RAID, I thought I'd try to
install it on the SBS box I have at the office. No dice. IT Assistant will
install onto a server platform, but not SBS. It's a hard block. So for
all my SBS servers with PERC 5/i cards, I have nothing from Dell that I
can run on the server box that will monitor the health
of the RAID controller and notify me when there's a problem. With my
other RAID controllers, I can fall back on the audio alert at the very
least for notification of a problem, but I don't have that option here.

Last
Thursday, at my SBS user group meeting, I mentioned my frustrations
with the situation in a side conversation, and one of the folks I was
talking with was as taken aback as I was when I first figured this out.
He just put a number of servers with PERC 5/i cards in them out in
production and was also unaware that the controller had no audible
alarm mechanism. I've mentioned this to a couple of other folks as
well, with pretty much the same reaction.

This week, I started
putting together specs for a couple of new servers for a couple of
projects and, knowing the challenges of the PERC 5/i, decided to look
at other controller options for these boxes. Unfortunately, I've found
that, currently, the only RAID controller that Dell provides that
supports RAID 5 is, surprise surprise, the PERC 5/i. There are other
controllers, but those only support RAID 0 or 1. And I don't yet have
confirmation of whether those controllers have audible alarms on them.
So, even if I were able to “settle for” a RAID 1 solution (and to be
fair, on one box it's not unrealistic), I still think I'd be in the
same situation.

I've been working with a couple of folks at Dell
on trying to find a reasonable resolution to this problem. Of course,
there's always the recommendation that I can run IT Assistant on a
separate workstation to monitor the array card in the server and send
notifications back to me if/when there's a problem. But that's not
necessarily a realistic solution at some sites. It means installing a
piece of software on a workstation that has to be running all the time
and may or may not interfere with what the user of the workstation is
trying to do. I simply can't afford to stick a dedicated box at each of
my client sites to do this monitoring, nor can I ask them to dedicate a
workstation to do this themselves. It looks like I'm going to have to
go third party for a solution, and while that's probably less costly
than dedicating a workstation to monitor the array, it's still an
added expense that I really don't think I should have to incur in order
to be proactive with my clients.

I honestly believe the folks
I've spoken with at Dell understand my plight. While they have not
committed to anything, there have been discussions about changes to
engineering on future controllers to ensure an audible alarm among
other possibilities. Based on a series of messages that floated around
this afternoon, I know the issue has been escalated internally, but
still have no clear direction on where to go.

At the end of the
day, two weeks after I first placed the call regarding the failed array
and lack of notification of the failure, I still have a box that I will
have to manually monitor for RAID health. I'm hoping for a better
solution, and I expect that I'll just have to be patient.

I sure hope that data cable doesn't pop off the drive connector again, tho…

24 Responses to “On Dell's PERC 5/i”

  1. Eli Says:

    Thank you!
    I'm intending to buy such a controller for my Dell 690P workstation, and I will check for this problem.
    Regards
    zilkael@013.net

  2. Electrosonics Says:

    Right on with your comments on the PERC 5/i! As of 7/6/07 I too have come to a dead end. No answers from Dell on monitoring the health of the RAID array. In my prototype system (4 drive array) from Dell, I was able to see events in the event log when the RAID array had a drive failure. With health monitor, I was able to create a rule to email me when that event occurred. But in the production system (3 drive array with hot spare, wiped, new array, reload), I have yet to be able to reproduce my initial success. When I have a drive failure, the hot spare automatically kicks in, but no event log message is reported, hence no notification of a drive failure. This is pure BS. Notification upon a drive failure is so easy to implement at the driver level. Someone dropped the ball here. I find it frustrating that Dell product development cannot do better.

  3. looplocal Says:

    Along with the lack of audible alarm, if you purchased a Dell PERC 5/i RAID controller card (not necessarily a cheap investment) with a Precision workstation, be aware that it will come WITHOUT a card battery backup unit (BBU).

    This was surprising as you would think it would have been a fairly trivial cost addition relative to the price of the $599 card. I spent hours trying to order it as a spare part from Dell, but with no luck.

  4. Stephen B. Says:

    I've written a cron job that I've set to execute every minute. With each run, it blasts every open tty with a message from Dell's OMSA software if it finds a drive that has failed or is about to fail, and it sends mail to the root account (or its alias) if a drive's rebuilding. It's meant to be insanely annoying to demonstrate just how important it is that it get fixed. I run a small lab, so it's important to me that anyone using the system would be able to know that something's up. The rebuilding part may be omitted, but I like to check up on the status of any rebuilding drive. I'm sure there are tons of tweaks to it that a more experienced programmer could make (like sending messages to SMS servers, sending non-local e-mails), but I've tried to keep it simple for now. I've posted it below, in hopes that it can help people out:
    ~~~

    #! /bin/sh
    ###################################################
    # Checks RAID array to ensure that drives have not
    # gone critical and are not predicted to fail. If
    # something is amiss, hammer out a notice to
    # ensure that the faulty drive is replaced before
    # a second failure. While it's repairing, mail a
    # notice to root informing root of rebuilding
    # progress.
    #
    # Depends on Dell's OMSA being installed; tested
    # on PERC 5/i HW RAID card on Dell PowerEdge 2900
    # running Debian etch.
    #
    # Meant to be run in root's crontab.
    ###################################################

    # Assigns script variables
    # Edit these values to correspond to your hardware
    # and software configuration

    CONTROLLER=0
    MAIL_ROOT="root"

    ###################################################
    #DO NOT EDIT BELOW THIS LINE!
    ###################################################

    # Locates omreport and the other tools on this system
    OMREPORT=$(which omreport)
    GREP=$(which grep)
    WALL=$(which wall)
    MAIL=$(which mail)

    # Checks status of RAID array
    # First, checks for critical failure and sends messages to all ttys if failure has occurred
    RAID_CRITICAL=$($OMREPORT storage pdisk controller=$CONTROLLER | $GREP Critical)
    if [ -n "$RAID_CRITICAL" ]; then
        # Pipe the report straight to wall so its line breaks are preserved
        $OMREPORT storage pdisk controller=$CONTROLLER | $GREP -B 3 -A 2 Critical | $WALL
        exit 1
    fi

    # Next, checks for predicted failure and sends messages to all ttys if failure is predicted
    RAID_FAILURE=$($OMREPORT storage pdisk controller=$CONTROLLER | $GREP "Failure Predicted : Yes")
    if [ -n "$RAID_FAILURE" ]; then
        $OMREPORT storage pdisk controller=$CONTROLLER | $GREP -B 4 "Failure Predicted : Yes" | $WALL
        exit 2
    fi

    # Finally, checks for rebuilding of array and sends messages to root if rebuilding
    RAID_REBUILDING=$($OMREPORT storage pdisk controller=$CONTROLLER | $GREP Rebuilding)
    if [ -n "$RAID_REBUILDING" ]; then
        $OMREPORT storage pdisk controller=$CONTROLLER | $GREP -B 3 -A 2 Rebuilding | $MAIL -s "RAID array rebuilding" "$MAIL_ROOT"
        exit 3
    fi
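
    The three checks above can be tried out without a failing drive by feeding text to a small function. What follows is only a minimal sketch: classify_pdisk_report is a made-up name (it is not part of OMSA), and it assumes nothing beyond the three strings the script already greps for.

    ```shell
    #!/bin/sh
    # classify_pdisk_report (hypothetical helper): reads a disk report on
    # stdin and prints one word describing the worst condition found.
    # The checks run in the same order as in the script above.
    classify_pdisk_report() {
        report=$(cat)
        if printf '%s\n' "$report" | grep -q "Critical"; then
            echo critical
        elif printf '%s\n' "$report" | grep -q "Failure Predicted : Yes"; then
            echo predicted
        elif printf '%s\n' "$report" | grep -q "Rebuilding"; then
            echo rebuilding
        else
            echo ok
        fi
    }
    ```

    On a real box it would presumably be fed the same command the script uses, as in: omreport storage pdisk controller=0 | classify_pdisk_report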

  5. Doug Wassmer Says:

    I just purchased a PERC 5/i daughter card for my PowerEdge 1900 running Windows 2003 Server. I ran across your comments when I was searching for instructions on how to install the daughter card. I think that it fits into slot 4 on the motherboard, but I have no idea where to place the battery that came with it. The battery does not attach to the daughter card. I couldn't find good instructions at the Dell site or in the documentation that came with the 1900.

  6. James Says:

    You can use the LSI software, as the PERC 5/i is really a Dell OEM version of the LSI MegaRAID SAS 8408E, with a few minor differences.

    MegaRAID Storage Manager can email alerts to a set email address if something goes wrong.

    http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8408e/index.html

    And, Looplocal, you can pick a BBU off eBay for sometimes as little as $15, so all is not lost! 🙂 – I bought a whole kit, battery, cable and holder for $30.

  7. infinity005 Says:

    I stumbled on this while looking for answers to performance problems with the PERC 5/i. Before going live, I've run iozone on it, and write performance with 7 drives in RAID 5 is atrocious! I haven't yet figured out why, and the card will have to get returned if I don't, because an array with 4 disks is 3 times faster doing writes than a 7 disk array.

  8. B.Grujevsky Says:

    Yes, the LSI software works fine. Peculiar that Dell didn't take that “part” of the LSI controller into their software.

  9. Wes Says:

    James has got the best info. Go to the LSI webpage and download the MegaRAID Storage Manager; it is the exact same software, but has email notifications enabled.

    For some reason Dell stripped out the SNMP section of the LSI software, which is the part that does the notifications. Great job Dell!

  10. LoopLocal Says:

    Thanks for that info. I managed to find the kit as suggested. Once I had the part number Dell was able to provide via the Spare Parts department.

    I installed the BBU, but how does one know it is recognized? My Perc 5 BIOS enabled configuration does not give any info other than the warning after checking write back even without bbu option. Any idea on how to know if a Perc 5 bbu is installed and identified by the bios correctly? Will the Perc 5 BIOS config always give the Write Back/BBU warning even if a BBU is installed?

    Thanks for any info.

    loop

  11. LoopLocal Says:

    Please forgive my BBU question. I installed the MegaRaid as suggested by James and that provides an abundance of info (unlike the BIOS config) including the BBU identification.

    BBU Present? YES !

  12. LoopLocal Says:

    At the LSI site, there seems to be a more aggressive driver and firmware availability compared to Dell.

    Any thoughts on whether it is advisable to go with LSI's driver and firmware releases instead of Dell's outdated ones? There seem to be quite a few bug fixes and many Vista enhancements/fixes.

  13. Adam Cybulski Says:

    I see that you posted this about a year ago, and I was wondering if any progress has been made on this issue. I am running into the exact same problem: we deployed numerous Dell servers to our clients only to find that they have no alert system. While some of the clients have workstations, others do not.

  14. Andy Says:

    Loop – can you post the part number for the BBU? – is it the G3399?

  15. Jerodh Says:

    Great blog article. Very helpful. I'm in the process of putting together a PowerEdge 1900 with PERC 5/i running CentOS 5.1 64-bit for VMware Server 1.0.5. Hope to run Windows and Novell virtual machines on it.

    I'm going to try out an altered version of the script that Stephen B. wrote. If I can't get that to work then I'll use the LSI MegaRAID Storage Manager. My concern with running the LSI product is that I'm using Dell's megaraid_sas driver version 3.16 and LSI's latest driver is 3.13.

  16. jerod h Says:

    I couldn't get the script above to work (my bash didn't like the pipe command in the variables), so I created this one using perl.

    #!/usr/bin/perl

    # A simple perl program to send alerts if problem found with physical disks
    # by using OMSA 5.2 for Dell PowerEdge
    # I do not provide any guarantee that this script will work. Use at own risk.

    #Written: May 2008
    #By: Jerod H

    $controller="0";
    # Single quotes so that perl does not try to interpolate @yourdomain
    $emailaddress='jerod@yourdomain.com';
    $omreport="/usr/bin/omreport";
    $mail="/bin/mail";
    $servername="yourservername";

    # run omreport command and put into olist

    open(LS, "$omreport storage pdisk controller=$controller |")
        or die "Cannot run $omreport: $!";
    while (<LS>) {
        chomp;
        push @olist, $_;
    }
    close(LS);

    # Go through each line in olist to look for "Critical", "Rebuilding", or
    # "Failure Predicted : YES"

    $email=0;
    $subject="";

    foreach $line (@olist) {
        if ($line =~ /Critical$/i) {
            $email=1;
            $subject="$servername Hard Drive Critical";
        }
        if ($line =~ /Rebuilding$/i) {
            $email=1;
            $subject="$servername Hard Drive Rebuilding";
        }
        if ($line =~ /^Failure\sPredicted(\s)+:\sYes$/i) {
            $email=1;
            $subject="$servername Hard Drive Predicted to Fail";
        }
    }

    #If something was found email will = 1 so send email
    if ($email==1) {
        system("$omreport storage pdisk controller=$controller | $mail -s \"$subject\" $emailaddress");
    }

  17. Tyler Says:

    I think Dell should place the alarm on the controller. Those admins who do not want the alarm can disable it like you could before yanking it from the card.

  18. Mårten Says:

    Hi!
    I'm going for a 2900. Is the PERC 6/i any better?

  19. eriq Says:

    PERC 6 series has the same problems. No audible alarm. I've continued to try to raise the issue with Dell but with little response, other than the technicians agreeing that the audible alarm should at least be an option.
