Better Than Chocolate: Cancel-Safe Queues

One of the earliest things I remember discovering about the difficulties of programming in kernel mode (right after “What the @!$ is with all of these UNICODE_STRINGs?!”) is how easy it is to get yourself into race conditions.

All of the standard practices of multi-threaded programming apply when writing a driver, but there’s another kicker: you actually have to *care* how many CPUs are in your system. Well, in particular, you have to care whether there’s more than one. As I said yesterday, you can officially assume from now on that every stupid Pentium 4-based computer from the local CompUSA is a dual-processor box (Hyper-Threading makes each physical chip look like two processors), so you have to take this seriously. Add to that the subtleties of dealing with spontaneous IRQL raises, interrupts, running in arbitrary thread contexts, and so on, and life gets interesting.

One of the most common race conditions is the IRP cancellation race. It’s also one of the trickiest to deal with, even if you generally know what you’re doing. Cancellation has changed over the years from the original design, partly due to changes in the devices themselves and partly due to OS optimization. The original mechanism the OS provided for managing the cancel race was based on StartIo routines, and in fact, the latest DDKs still recommend using a StartIo routine for IRP queue management. It certainly works for what it was designed to do, but it’s not optimal for a number of reasons. Software-only drivers (“virtual” drivers of various sorts, filesystems, etc.) frequently find that the StartIo model is insufficient. Besides, the cancel lock is one of the hottest locks in the system, so staying as far away from it as possible is always a good idea. Walter Oney has a good description of IRP queuing and cancellation in his WDM book (*Programming the Microsoft Windows Driver Model*), where he details other reasons he doesn’t typically use StartIo-based queuing.

With that said, rolling your own IRP-queuing logic is very difficult. The races are subtle, and unless you’ve made a lot of these mistakes before, you’re highly likely to get it wrong, no matter how sure you are that you got it right. Trust me, I know. 🙂 Fortunately, Microsoft has provided a reusable queuing framework called Cancel-Safe Queues (CSQ). It is implemented in the kernel on Windows XP and later, and is available as a static library for earlier OSes. With the advent of the CSQ, there is no reason to ever write custom IRP-queuing logic again.

CSQ is easy to use and has the distinct advantage of being massively reused, so it’s far more likely to be bug-free than anything you’d write from scratch. Tomorrow I’ll talk about the race conditions in more detail, and later I’ll provide an example of how to use CSQ in your driver.

