Driver DevCon and FAST_MUTEXes and ERESOURCES, oh my!

Hello world. As you may know, I live in Kansas, so I’m allowed to use a title like the one this post has. Deal with it. If you don’t get the reference, you’re really better off.

Bad news – I just got word today that the Driver DevCon conference (redundant?) has been pushed off until Spring 2005. Bad news, yo! That was the only conference I was actually looking forward to this year. Sigh…. too bad, I’ll just have to wait another six months. This conference was supposed to be the first few days of November, but that is kind of an uncomfortable time anyway, as the US General Election is that Tuesday. Still, as one of my favorite bloggers would say… NO BUZZ!

So, how many of you have ever actually used a fast mutex in your kmode code? I’m sure most people know what they are if they have written any kmode code (if for no other reason than searching the Ex functions in the DDK index window always seems to land on the Fast Mutex stuff first). Well, I was going through some old vids of Microsoft driver development conferences the other day, and David Goebel (one of the almost-original NT guys) was talking about when and why to use them.

He made a point that I’d heard before but had never really taken to heart: you should use the least impactful synchronization primitive that you can get away with in any given situation. We all know and love the KSPIN_LOCK, and I’m sure most device-type people get away with using only these locks. However, sometimes using a spinlock is (to quote my grandfather) like killing an ant with a sledgehammer. There are other, friendlier locks you can use, for your own sake and for the sake of the rest of the system.

To review, the way a spinlock works is as follows: a memory region (the KSPIN_LOCK object you allocated and initialized) is “spun” on by the CPU using an atomic test-and-set operation until it finds that it now owns the lock. You could imagine pseudocode like this:


VOID AcquireLock(LONG *lock)
{
        while(InterlockedExchange(lock, 1));
}

VOID ReleaseLock(LONG *lock)
{
        *lock = 0;
}

(Five points to the first person to point out the weakness in this code). In addition to spinning on the processor, the kernel raises the IRQL to DISPATCH_LEVEL. In fact, on a uniprocessor kernel/hal, *all* that is done during KeAcquireSpinLock() is raising the IRQL to DISPATCH_LEVEL. If you don’t see why this is, keep thinking about the meaning of IRQL. Maybe I’ll post about it another day.
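
For contrast, here’s roughly what real spinlock usage looks like in a driver. This is just a sketch; the lock would normally live in your device extension (in nonpaged memory), and the names here are made up:

KSPIN_LOCK MyLock;      /* nonpaged memory, e.g. in your device extension */
KIRQL      oldIrql;

KeInitializeSpinLock(&MyLock);          /* once, at init time */

KeAcquireSpinLock(&MyLock, &oldIrql);   /* raises to DISPATCH_LEVEL, then spins if contended */
/* touch the shared data -- briefly! */
KeReleaseSpinLock(&MyLock, oldIrql);    /* releases the lock and restores the old IRQL */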

Anyway, the point here is that grabbing (or trying to grab) a spinlock is not a nice thing to do to a computer. You might be ticking along just fine at PASSIVE_LEVEL (or maybe APC_LEVEL, or even IRQL 0.5 (ask me later)), and you decide you need a lock. Now, the fair thing to do would be to get a lock at low IRQL, so that when your quantum expires, you will get context-switched and someone else can do some work. If you’re at DISPATCH_LEVEL, it’ll never happen – you *own* the CPU (interrupts aside) until you drop your lock. How rude! But it’s actually even worse than that: while you’re at DISPATCH_LEVEL, you’re not just blocking context-switching – you’re actually blocking all DPCs on that chip. This is even worse, as the “standard model” (to borrow Walter Oney’s phrase) for interrupt processing involves queuing a DPC to do the real work. That’s right – you’re keeping other device drivers from doing their jobs. If you do this often enough or long enough, users will notice, and they’ll associate the degraded performance with the installation of your driver (“it was fine until I installed my WhizBang 3000!”).
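
To illustrate that standard model, here’s a bare-bones sketch of an ISR deferring its real work to a DPC. The routine names are made up, and I’m assuming the device object was registered as the ISR context and that IoInitializeDpcRequest() was called at init time:

BOOLEAN MyIsr(PKINTERRUPT Interrupt, PVOID ServiceContext)
{
        PDEVICE_OBJECT deviceObject = (PDEVICE_OBJECT)ServiceContext;

        /* quiet the hardware, capture just enough state, and get out */
        IoRequestDpc(deviceObject, deviceObject->CurrentIrp, NULL);
        return TRUE;
}

VOID MyDpcForIsr(PKDPC Dpc, PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
        /* the real work happens here, at DISPATCH_LEVEL, outside the ISR */
}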

So what’s a driver writer to do? To re-state David’s general principle, you should use the lightest lock that will still synchronize you with threads at the highest IRQL you can be called at. If you are synchronizing DPCs against each other, you’re going to need spinlocks, because DPCs cannot block, they can only spin. That has the effect of requiring every locker to spin, whether or not they could have blocked. This is also the situation when synchronizing with your ISR via KeSynchronizeExecution() (there’s a sketch of that case below). However, if you are a software-only driver of some sort and you know you won’t ever be trying to grab a lock at DISPATCH_LEVEL, you don’t need spinlocks at all. In these cases, you have lots of other choices, but my two favorites are FAST_MUTEXes and ERESOURCEs.
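
Here’s that ISR-synchronization case in sketch form. MY_DEVICE_EXTENSION and its fields are made up; the point is just that the routine you hand to KeSynchronizeExecution() runs while the interrupt spinlock is held, so the ISR can’t race with it:

BOOLEAN TouchSharedState(PVOID Context)
{
        PMY_DEVICE_EXTENSION devExt = (PMY_DEVICE_EXTENSION)Context;

        devExt->SharedWithIsr++;        /* safe: the ISR cannot run while we're in here */
        return TRUE;
}

/* later, from code running below the device IRQL: */
KeSynchronizeExecution(devExt->InterruptObject, TouchSharedState, devExt);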

Fast Mutexes are manipulated via the ExInitializeFastMutex/ExAcquireFastMutex/ExReleaseFastMutex functions. They are very fast and very small. Also, since they aren’t dispatcher objects, the kernel doesn’t have to grab the dispatcher lock to service you. This is a Good Thing(TM), as the dispatcher lock is right up there with the cancel lock in terms of contention. Fast Mutexes do only what you think they should, as minimally as possible, and as efficiently as possible.
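
The usage pattern is about as simple as it gets. A rough sketch (the mutex needs to live in nonpaged memory, e.g. your device extension):

FAST_MUTEX MyFastMutex;

ExInitializeFastMutex(&MyFastMutex);    /* once, at init time */

ExAcquireFastMutex(&MyFastMutex);       /* raises to APC_LEVEL and takes the mutex */
/* do the work that needs protecting */
ExReleaseFastMutex(&MyFastMutex);       /* drops the mutex and lowers the IRQL back */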

ERESOURCEs are sometimes known as reader/writer locks or shared/exclusive locks. If you can use them, you should, as they will generally have the effect of reducing contention and increasing concurrency in your driver. ERESOURCEs are suitable whenever you have operations that only need a lock to keep stuff from getting changed out from under them, as opposed to needing to actually make changes themselves. The Windows filesystem drivers make extensive use of ERESOURCEs.
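
The pattern looks something like this (a sketch; the ERESOURCE itself must come from nonpaged pool, and note the critical-region bracketing, since ERESOURCEs have to be acquired with normal kernel APCs disabled):

ERESOURCE MyResource;                   /* nonpaged pool */

ExInitializeResourceLite(&MyResource);  /* once, at init time */

/* reader path -- any number of these can hold the resource at once */
KeEnterCriticalRegion();
ExAcquireResourceSharedLite(&MyResource, TRUE);     /* TRUE = block until we get it */
/* read the shared state */
ExReleaseResourceLite(&MyResource);
KeLeaveCriticalRegion();

/* writer path -- exclusive, no readers allowed in */
KeEnterCriticalRegion();
ExAcquireResourceExclusiveLite(&MyResource, TRUE);
/* modify the shared state */
ExReleaseResourceLite(&MyResource);
KeLeaveCriticalRegion();

ExDeleteResourceLite(&MyResource);      /* at unload time */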

Someone recently asked on NTDEV about dropping the IRQL after you grab a lock. I’ll refer you there for the details (I believe it was Doron Holan who posted the answer), but to make a long story short, never EVER do this. You can basically only call KeLowerIrql() if you previously called KeRaiseIrql(), and even then you can only lower the IRQL back to where it was when you called KeRaiseIrql(), never any lower.
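
In other words, the only legal pattern is the balanced one, something like:

KIRQL oldIrql;

KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);  /* remember where we were */
/* do the (brief!) high-IRQL work */
KeLowerIrql(oldIrql);                   /* restore exactly where we started */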

A couple of other points about FAST_MUTEXes. First, they are *not* dispatcher objects. That means you cannot call KeWaitFor***() on them, and in particular, you cannot wait on them as part of a multiple-object wait. If that’s what you want to do, all of the objects have to be dispatcher objects. Also, unless you are very sure you know what you’re doing, don’t try to use ExAcquireFastMutexUnsafe(). Another point to remember is that acquiring a fast mutex raises the IRQL to APC_LEVEL, which isn’t bad most of the time, but it does block out all APCs, so there are some things the thread you’re running in won’t be able to do until you drop the mutex again. Finally, fast mutexes are not reentrant. You will deadlock if you try to acquire a fast mutex that you already own. Again, if recursive acquisition is what you need, go look at regular dispatcher MUTEXes.
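
For that recursive case, a quick sketch with a regular kernel mutex (a true dispatcher object, so it has to live in nonpaged memory and you wait on it with KeWaitForSingleObject()):

KMUTEX MyMutex;

KeInitializeMutex(&MyMutex, 0);         /* once, at init time */

KeWaitForSingleObject(&MyMutex, Executive, KernelMode, FALSE, NULL);    /* acquire */
KeWaitForSingleObject(&MyMutex, Executive, KernelMode, FALSE, NULL);    /* recursive acquire by the same thread is fine */
/* ... */
KeReleaseMutex(&MyMutex, FALSE);        /* one release per acquire */
KeReleaseMutex(&MyMutex, FALSE);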

All of this basically just boils down to being a good citizen and obeying the Law Of IRQL Minimization at all times. Don’t lock if you don’t need to, don’t lock for longer than you need to, and use the right tool for the job.

2 Replies to “Driver DevCon and FAST_MUTEXes and ERESOURCES, oh my!”

  1. no, ReleaseLock is ok as is.

    But AcquireLock() can be "improved":

    VOID AcquireLock(LONG *lock)
    {
            while (InterlockedExchange(lock, 1)) {
                    while (*lock != 0);
            }
    }

    This way it will spin mostly in the inner while() loop, which avoids the "coherent bus traffic" incurred by InterlockedExchange().

    It can be improved even further by assuming that lock contention is low and making the fast path inline.
