Some IOCTL Code Definition Tips

We’ve all* defined our own IOCTL codes in the past. They’re a primary way to enable user-mode -> driver and driver -> driver communication. Although they were probably originally envisioned in the context of storage-related drivers, Microsoft supports the use of an IOCTL dispatch in most kinds of drivers. NDIS, for example, supports NdisMRegisterDevice for the explicit purpose of providing access to the IOCTL framework in network drivers.

IOCTL codes are defined using the CTL_CODE() macro, which is part of ntddk.h in kernel-mode and winioctl.h in user-mode. The first parameter takes a Device Type, which can either be one of the Microsoft-defined device types (see the DDK headers for a list) or a custom device type code.

There are a couple of things to keep in mind here. The first is that the Device Type parameter must match the device type that is passed into IoCreateDevice(). Also, if you define a custom device type, it should be 0x8000 or above. The bottom 15 bits are what actually represent the device type, and the 16th bit is known as the Common bit. The DDK requires that the Common bit be set on all custom Device Types. Another way of saying this is that all custom device type codes must be between 0x8000 and 0xFFFF. Similarly, the function code is required to be between 0x800 and 0xFFF, because the top bit of that 12-bit field (“Custom” in this case) is required to be set for non-Microsoft-defined function codes. Playing by the rules will make your driver as compatible as possible with all releases of the OS, present and future.
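To make the ranges concrete, here is a minimal sketch in plain C that builds without the DDK. MY_CTL_CODE restates the bit layout of CTL_CODE(); the device type 0x8001, the function code 0x800, and every name in the block are made up for illustration. In a real driver you would include the DDK header and use CTL_CODE() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as CTL_CODE(), restated so the sketch builds without the
 * DDK headers: DeviceType[31:16] | Access[15:14] | Function[13:2] | Method[1:0] */
#define MY_CTL_CODE(type, function, method, access) \
    (((uint32_t)(type) << 16) | ((uint32_t)(access) << 14) | \
     ((uint32_t)(function) << 2) | (uint32_t)(method))

/* Numeric values match the DDK definitions; the MY_ prefix is ours. */
#define MY_METHOD_BUFFERED  0u
#define MY_FILE_ANY_ACCESS  0u
#define MY_FILE_READ_ACCESS 1u

/* Hypothetical custom device type and function code, both in the vendor ranges. */
#define FILE_DEVICE_EXAMPLE       0x8001u
#define EXAMPLE_FUNCTION_GET_INFO 0x800u

/* Works out to 0x80016000. */
#define IOCTL_EXAMPLE_GET_INFO \
    MY_CTL_CODE(FILE_DEVICE_EXAMPLE, EXAMPLE_FUNCTION_GET_INFO, \
                MY_METHOD_BUFFERED, MY_FILE_READ_ACCESS)

/* Field extractors for an existing code. */
static inline uint32_t device_type_of(uint32_t code) { return (code >> 16) & 0xFFFFu; }
static inline uint32_t function_of(uint32_t code)    { return (code >> 2) & 0xFFFu; }
static inline uint32_t method_of(uint32_t code)      { return code & 3u; }

/* Returns 1 when both the device type and the function code sit in the
 * ranges reserved for non-Microsoft codes, 0 otherwise. */
static inline int is_vendor_ioctl(uint32_t code)
{
    return device_type_of(code) >= 0x8000u && function_of(code) >= 0x800u;
}

/* Builds a code at run time and refuses out-of-range fields in debug builds. */
static inline uint32_t make_vendor_ioctl(uint32_t type, uint32_t function,
                                         uint32_t method, uint32_t access)
{
    assert(type >= 0x8000u && type <= 0xFFFFu);
    assert(function >= 0x800u && function <= 0xFFFu);
    assert(method <= 3u && access <= 3u);
    return MY_CTL_CODE(type, function, method, access);
}
```

The extractors are handy in a debugger too: given a mystery IOCTL value, shifting out the fields tells you right away whether it is one of yours.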

Method is one of METHOD_BUFFERED, METHOD_IN_DIRECT, METHOD_OUT_DIRECT, or METHOD_NEITHER. METHOD_BUFFERED is the most common transfer method, and is generally the safest and easiest to use. This method double-buffers your data by copying it from the supplied user-mode buffer into a newly-created kernel-mode buffer, and then passing that new buffer to your driver instead of the original one. If you’re transferring less than one page of data (4K on x86), and especially if you’re doing it infrequently, this is the way to go. If you’re transferring larger amounts of data, one of the DIRECT methods may make sense. This is particularly true if you’re going to wind up DMAing your data to or from a device, but it is also true if you just want to avoid the double-buffer in general. I’m not going to discuss METHOD_NEITHER at the moment, other than to say that you shouldn’t use it. I’ll get into more detail about why another day.

The final knob to turn is the RequiredAccess parameter. I admit that I really didn’t understand what this parameter was for until quite a while after I wrote my first driver. It turns out that it is a method for enforcing some small but nontrivial amount of access control on who can call your IOCTL. This specifies the kind of access the user must have to the device, as specified in the CreateFile() call, in order for the IO manager to let the IRP through. FILE_ANY_ACCESS means that they can send the IRP with virtually any access at all, as long as they have an open file handle to the driver. FILE_READ_ACCESS and FILE_WRITE_ACCESS loosely correlate to the ability to read and write data to and from the device.

Most driver writers just set this to FILE_ANY_ACCESS and forget about it. This is, of course, exactly the wrong thing to do. A much better strategy is to specify the most restrictive access possible (FILE_READ_ACCESS|FILE_WRITE_ACCESS — yes, you can OR them together) whenever possible, and only remove bits when necessary (“necessary” depends on the kind of driver you’re writing). This parameter is particularly important in IOCTLs where you’re actually reading and writing data — why would you allow a user to read data from an IOCTL if you wouldn’t allow the same user to read data using ReadFile()? — but it should probably be applied carefully to all IOCTLs.
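Here is a rough model of that check, in plain C so it runs anywhere. It is a simplification under my own assumptions, not the I/O manager's actual code: the handle's granted access is reduced to the same two read/write bits, and the two IOCTL values are hypothetical codes for device type 0x8001.

```c
#include <assert.h>
#include <stdint.h>

/* Numeric values match the DDK definitions; the MODEL_ prefix is ours. */
#define MODEL_FILE_ANY_ACCESS   0u
#define MODEL_FILE_READ_ACCESS  1u
#define MODEL_FILE_WRITE_ACCESS 2u

/* Hypothetical codes for device type 0x8001, METHOD_BUFFERED:
 * function 0x800 with FILE_ANY_ACCESS, and
 * function 0x801 with FILE_READ_ACCESS | FILE_WRITE_ACCESS. */
#define IOCTL_EXAMPLE_GET_VERSION 0x80012000u
#define IOCTL_EXAMPLE_SET_CONFIG  0x8001E004u

/* RequiredAccess lives in bits 15:14 of an IOCTL code. */
static inline uint32_t required_access_of(uint32_t ioctl_code)
{
    return (ioctl_code >> 14) & 3u;
}

/* Model of the gate: every access bit the IOCTL requires must have been
 * granted when the handle was opened. Returns 1 if the IRP would be let
 * through, 0 if it would be refused. */
static inline int ioctl_allowed(uint32_t ioctl_code, uint32_t handle_access)
{
    uint32_t required = required_access_of(ioctl_code);

    assert(handle_access <= 3u);
    return (required & handle_access) == required;
}
```

In this model a handle opened for read only can still send IOCTL_EXAMPLE_GET_VERSION, but IOCTL_EXAMPLE_SET_CONFIG is refused until the caller opens the device for both read and write.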

Finally, it might be obvious, but try to name your IOCTL codes something obvious. My office has a standard that goes IOCTL_<DRIVER>_<FUNCTION>. In other words, you might have IOCTL_POSVPN_SET_INFO to configure our VPN driver. This goes with the standard rants about variable naming, and is generally an important thing if you want someone else to be able to work on your code.

OK, I expect everyone to run out and tighten up their use of CTL_CODE(). When you’re done with that, go listen to Fred Jones, Part 2, by Ben Folds. It’ll make you a Better Person.

Happy hacking!

* OK, “All” might be a bit of an exaggeration. 🙂

Inside the NT Insider

I just got the latest edition of the NT Insider from OSR in the mail today. They really out-did themselves with this issue. It’s 52 pages of my favorite topic: testing! There are lots of articles about debugging, testing, and so on. They’re well-written, as usual, and form a terrific resource for budding driver developers and seasoned pros alike.

If you haven’t done so already, go over to www.osronline.com and register for a subscription. You won’t be sorry.

Two Quickies

If you’ve spent much time looking at Microsoft sample code or reading through the DDK headers, you’ve probably noticed a macro called PAGED_CODE(). The DDK defines it as:

#if DBG
#define PAGED_CODE() \
    { if (KeGetCurrentIrql() > APC_LEVEL) { \
         KdPrint(( "EX: Pageable code called at IRQL %d\n", KeGetCurrentIrql() )); \
         ASSERT(FALSE); \
       } \
    }
#else
#define PAGED_CODE() NOP_FUNCTION;
#endif

As you can see, this macro makes sure your code is being called at <= APC_LEVEL, which is a requirement for things like referencing pageable memory. It's a good idea to put this at the top of all of your functions that require <= APC_LEVEL. Because it compiles out in free builds, there's no real harm in using it. You'll be a better person for it.

Another nice thing to do is to mark your segments as pageable or as init if possible. Most of the Microsoft samples do this. It’s accomplished with a couple of pragma directives:

#pragma alloc_text (INIT, DriverEntry)
#pragma alloc_text (PAGE, AddDevice)

The idea here is to use the first pragma on any functions (like DriverEntry) that will never be called again after they run the first time. The OS can then discard them completely and not waste any more resources on them. The second pragma marks functions as pageable, meaning that the code can be paged out to disk if needed, to make room for other processes to execute. Note that this cannot be done for any code that might run at DISPATCH_LEVEL or higher, and as such is nicely mated with the PAGED_CODE() macro.
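If you want to convince yourself of what the check does without booting a test box, here is a user-mode model. Everything in it is a stand-in: model_current_irql plays the part of KeGetCurrentIrql(), model_add_device is a hypothetical pageable routine, and the macro counts violations where the real one prints and asserts.

```c
#include <assert.h>

/* IRQL values as the DDK headers define them; the MODEL_ prefix is ours. */
#define MODEL_PASSIVE_LEVEL  0
#define MODEL_APC_LEVEL      1
#define MODEL_DISPATCH_LEVEL 2

/* Hypothetical stand-ins for kernel state. */
static int model_current_irql = MODEL_PASSIVE_LEVEL;
static int model_paged_code_violations = 0;

/* Same test as PAGED_CODE(), but it bumps a counter instead of asserting so
 * that the outcome can be inspected afterwards. The do/while(0) wrapper makes
 * the macro behave like a single statement. */
#define MODEL_PAGED_CODE() \
    do { \
        if (model_current_irql > MODEL_APC_LEVEL) { \
            model_paged_code_violations++; \
        } \
    } while (0)

/* Hypothetical pageable routine. Pretends to be entered at the given IRQL
 * and returns the running total of violations recorded so far. */
static inline int model_add_device(int irql)
{
    assert(irql >= MODEL_PASSIVE_LEVEL);
    model_current_irql = irql;

    MODEL_PAGED_CODE();

    return model_paged_code_violations;
}
```

Entering at PASSIVE_LEVEL or APC_LEVEL leaves the counter alone; entering at DISPATCH_LEVEL trips it, which is exactly the case where the real macro would stop you in the debugger.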

The Joy of NUMA

Dr. HardwareBlog has an article up about Non-Uniform Memory Architectures on his blog. It looks like the first in a series. This has interesting ramifications for all developers, including kernel-mode development. Keep an eye on it.

Driver Developer’s Toolbox, Part 2: WinDBG

It’s another back-to-basics day. Lots of driver developers post questions on various forums that basically boil down to “how do I debug my driver?”. Let me see if I can clear things up a bit.

If you want to be a serious driver developer, you will need a debugger. I know you don’t write bugs 🙂 but you might not understand something about the way the OS is behaving underneath you, so there is still a need to get good with a debugger. Furthermore, you’ll never understand how a computer works in general, or how Windows works in particular, until you’ve done some time staring at a debugging console. You need a debugger if you plan on using the checked build, running driver verifier, disabling system file protection, or looking at the output from your DbgPrint/KdPrint statements. And, of course, you need a debugger because there’s no other way: there is no printf() debugging or MessageBox() debugging in the kernel.

Taxonomy of Debuggers

There are really only two debuggers that are in common use in the driver development community. One is a product called SoftICE, from Compuware (formerly NuMega). It’s a good product, and above all, it’s a phenomenal hack, considering the way they’re able to slip the debugger between Windows and the bare iron. Because of this approach, SoftICE works on the actual computer you’re trying to debug (as opposed to WinDBG). One other benefit of SoftICE is that it runs on Windows 9x. If you’re a Poor Unfortunate Soul stuck with supporting one of those fine versions of Windows, SoftICE may be a good idea: the other debugger (wdeb386.exe) is *absolutely horrible*. Really. Unusable. Promise.

SoftICE does have a couple of drawbacks, though. First, it’s expensive. It only comes bundled with other products that you may not want or need. In fact, for the price of SoftICE, you can probably buy either VMWare (below) or a second computer to use as a debugging target. The bigger issue to me, however, is that it’s not the standard. Most kernel-mode programmers work with WinDBG. Questions about your driver that have debugger involvement will almost always be answered in the context of WinDBG. If you ever call Microsoft Product Support Services for help, WinDBG will be the assumption. That said, SoftICE really is a great product. If you want to go down that path, knock yourself out.

In general, though, it’s probably best to go to WHDC and download the Debugging Tools for Windows (also known as WinDBG). WinDBG (pronounced either “win-d-b-g” or “wind-bag”, depending on who you ask) is an incredibly powerful, feature-rich debugger for kernel-mode (and user-mode!) applications. In addition to the standard debugger intrinsics, Microsoft has shipped dozens of debugger extensions that do higher-level things like look for locks, dump the contents of IRPs in a way that makes sense, and so on.

Setting Up Your Debugger

Getting kernel debugging set up is a little bit of a pain. There are two basic ways to go about it. The first way is to use two computers, connected to each other by either a standard null-modem cable or by a FireWire cable. I’ve heard nothing but complaints about FireWire from Gary Little, a frequent contributor to NTDEV and the public Microsoft newsgroups, and I have never used it myself, so I generally recommend the good old-fashioned RS-232 connection. One thing worth noting is that it has to be a plain-jane RS-232 port on the target machine; USB serial ports and the like won’t work, as they depend on the kernel loading lots of drivers to get them to work.

In addition to a physical connection, you have to modify the debug target computer’s boot.ini file. Generally, I copy the existing boot entry, remove the “/FASTDETECT” switch from the copy, and add “/SOS /DEBUG /DEBUGPORT=COM1 /BAUDRATE=115200”. Speed is everything, so you want to use the highest baud rate that your box supports. Once you make the modifications, your computer will boot to a boot menu and prompt you as to whether or not you want to launch the debugging mode. Choose the “[Debugger Enabled]” option to make the kernel look for a kernel debugger.
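For reference, a boot.ini set up this way might look roughly like the fragment below. Treat it as a sketch: the ARC path, the timeout, and the description strings are assumptions that will differ on your machine, and the switches are written with the `=` separators the boot.ini documentation uses.

```ini
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS

[operating systems]
; Normal entry, left as it was (path and description are assumptions).
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect
; Debug entry: same OS, with kernel debugging over COM1 at 115200 baud.
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional [Debugger Enabled]" /SOS /DEBUG /DEBUGPORT=COM1 /BAUDRATE=115200
```

Keeping the original entry next to the debug entry means you can still boot normally when no debugger is attached.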

If you don’t have a second computer handy, there is a new feature introduced with Windows XP that may be of some use. It adds the ability to do a limited form of local kernel debugging. This is similar to the livekd tool that was shipped with Inside Windows 2000, and is useful as a learning tool. It cannot be used to do any sort of invasive kernel debugging, though, so it’s mostly inappropriate for kernel development.

Another way to avoid getting a second debugging target computer is to use VMWare or Microsoft VirtualPC. I use VMWare every day to develop and test drivers. It is amazingly good for this sort of development. You can do remote kernel debugging over a named pipe from host to guest, and you can set up restore points, so you never have to worry about crashing and burning on a test computer. No more 10-minute re-ghosting procedures – restoring a crashed vm takes 10 seconds on my test computer. Note, however, that this is not totally sufficient for kernel development, as VMWare doesn’t emulate a multi-processor VM yet, and it can’t do 64-bit CPUs. All things considered, I’d never be caught trying to do driver development without VMWare. I’ve never used VirtualPC, but I hear it’s similar. VPC has the advantage of being included in MSDN subscriptions.

Working With The Debugger
On your debugging host computer, start WinDBG, choose Kernel Debug from the menu, and enter the appropriate communication parameters. Once you hit OK, the debugger will check the target computer, and you’re ready to go. Hit “Ctrl+Break” to break into the target. What you do from here is best learned by reading the debugger’s help file; type .hh at the kd> prompt for more.

If your computer crashes while a debugger is attached, it won’t bluescreen. Instead, it will break into the debugger and give you a chance to figure out what is going on. One of the most useful commands to type at the kd> prompt on a crashed computer is “!analyze -v”. It will invoke an analysis extension in the debugger that is extremely good at figuring out what is wrong with the crashed computer. If you ever post a question on a public forum about a crashing driver, please be sure to include output from this command.

In order to get !analyze -v to work correctly, you must be running the correct symbols. Fortunately, Microsoft has fixed the Hell of Symbols in recent years. In current debuggers, you can use a symbol path that points to an Internet server from which the correct symbols are automatically downloaded. Not 100% of the OS symbols are found on the symbol server yet, so I also use an old-fashioned symbol directory for things such as service pack symbols, checked build symbols, etc. My symbol path winds up looking like:

srv*x:\symbols\symserv*http://msdl.microsoft.com/download/symbols;x:\symbols\2ksp4chk;...

Once your symbols are set up correctly, issuing a .reload command from the kd> prompt will load symbols to match the running binaries. The stack (‘kb’ at the kd> prompt) should now look more reasonable, depending on the kind of crash you have.

Wrap-Up
I only have one complaint about WinDBG: the Hell of Docking. Microsoft recently re-worked the user interface (I think this happened with 6.3), and I can’t get the damned thing to lay out the way I want it to any more. If anyone from Microsoft is reading, *please*, free us from the Hell of Docking! Let my people go!

I hope you’ve found this little tutorial useful. Debugging is an art form that takes a lifetime to master, and a nontrivial amount of learning just to become basically functional. The help file is good, and there are other resources on the Internet (particularly on microsoft.com). Also, don’t hesitate to post any questions to the WINDBG mailing list hosted by www.osronline.com, or to one of the public Microsoft newsgroups dealing with debugging. The OSR list is monitored by several people who know exactly what they’re doing, and the Microsoft groups get a lot of Microsoft employee participation from people that have above-average amounts of clue.

Happy debugging!

Cool Hack Of The Week

Here’s an e-mail from Jamey Kirby, posted to NTDEV, on how to keep Explorer from showing a drive. I love it!

Use numbers for the drive letters. You can access the drive via
3:\yadayadayada\yadayadayada. You can even park a CMD prompt on it, but
explorer and associated APIS will not enumerate them.

Jamey

Jamey gets a Gold Star(TM) for pointing this out.

(Stack) Size Matters

Somehow, I managed to catch a cold this weekend, and can already feel the NyQuil starting to kick in. If this post seems, well, a little druk — that’s why. The good news is that I’m having fun writing it!

Today I think I’ll talk a bit about the kernel-mode stack. There are a few interesting issues in play here for kernel-mode developers.

The kernel stack is small. It is usually 3 pages or so, which means 12K on x86. Believe it or not, this is bigger than it used to be – it was 8K in NT4 and previous. My guess is that Microsoft increased the kernel stack size on Windows 2000 because of the deeper layering of drivers brought about by WDM (more on that in a sec). If you use more than 12K of stack, you’ll hit a guard page, which probably means an instant double-fault bluescreen. The reason that this is a double-fault is that the CPU tries to build an exception record on the stack for transfer to the exception handler, and when it tries to push the record on to the overflowed stack, it faults again. The only thing the kernel can do at that point is die a painful death. It’s worth noting that almost every time I’ve had a double-fault bluescreen, it’s been a stack fault.

Why is the stack so small? Well, this was an early design decision made by the kernel team. First off, kernel stacks are generally not pageable. That means that you’re gobbling up 12K of physical memory for each thread running in the system. You have to keep that stack sitting around forever, even if the thread is in user mode, until it exits. With the hundreds of threads in systems today, that memory starts to add up fast. Additionally, because kernel code can execute at raised IRQL, kernel stacks cannot take page faults on access. That means that the traditional way of automatically growing the stack cannot be implemented in the kernel. Because the OS has to just pick an amount of memory for the stack at runtime, and because stack memory is a scarce resource, 12k was the compromise the kernel team landed on.

This means a few things: first, you need to be conservative with local variables. No more 64K arrays on the stack, for one thing. Be aware of the fact that you probably exist within a driver stack, and the drivers above and below you would be grateful for some stack space that they could use for themselves, thank you very much. In addition, you shouldn’t ever use recursion in your driver, unless You Know What You Are Doing. Most recursive algorithms can be expressed in iterative implementations without sacrificing too much. I know your search only goes log(n) levels deep, but don’t make me stress about whether or not it’s ever going to exhaust the available stack. Finally, avoid architectures with lots of deeply-nested functions. This isn’t an excuse to practice bad design, but it’s an encouragement to keep things relatively flatter than perhaps you otherwise would have.
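As an illustration of the recursion point, here is the log(n) search written as a loop. It is ordinary portable C and not taken from any driver; find_sorted is just a name I picked. The whole thing lives in one small frame no matter how large the array is.

```c
#include <assert.h>
#include <stddef.h>

/* Iterative binary search over an array sorted in ascending order.
 * Returns the index of key, or -1 if key is not present. Stack use is a
 * single fixed-size frame regardless of count. */
static inline long find_sorted(const int *items, size_t count, int key)
{
    size_t low = 0;
    size_t high = count;            /* search the half-open range [low, high) */

    assert(items != NULL || count == 0);

    while (low < high) {
        size_t mid = low + (high - low) / 2;   /* avoids overflow of low + high */

        if (items[mid] == key) {
            return (long)mid;
        }
        if (items[mid] < key) {
            low = mid + 1;          /* key can only be in the upper half */
        } else {
            high = mid;             /* key can only be in the lower half */
        }
    }
    return -1;
}
```

The recursive version would have worked too, but with the loop nobody has to do the arithmetic on worst-case depth times frame size.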

Going back to the layering thing for a second, this is an area with which filter drivers often have a lot of trouble. There have been versions of popular antivirus scanners that are implemented as filter drivers, that simply cannot be installed on systems with any other filesystem filter drivers. They just use up too much stack, so stack faults are common. Don’t be a Bad Filter – be conservative of stack space, and remember that users will associate any bluescreens with your driver if it’s the last driver they installed.

One final note: there are a couple of system-supplied functions that allow you to use the stack more carefully. IoGetStackLimits() will let you check on the lower and upper bounds of the stack; IoGetInitialStack() will give you back the base address of the thread’s stack; and IoGetRemainingStackSize() can be called to find out how many bytes of stack are left. These functions should be used whenever a design contemplates recursion, whenever you are passed an address on the kernel stack, or in general, whenever you’re trying to hunt down a stack overflow bug.
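The arithmetic behind that kind of check is simple enough to model in user mode. The sketch below is not the kernel API: model_stack_limits, model_remaining_stack, and model_can_recurse are hypothetical names, and it assumes a stack that grows downward toward its low limit, as it does on x86. In a driver the real numbers would come from the routines named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one thread's stack. Usable addresses run from
 * low_limit up to high_limit, and the stack grows down toward low_limit. */
typedef struct {
    uintptr_t low_limit;
    uintptr_t high_limit;
} model_stack_limits;

/* Bytes left between the current stack pointer and the low limit. */
static inline size_t model_remaining_stack(const model_stack_limits *limits,
                                           uintptr_t stack_pointer)
{
    assert(stack_pointer >= limits->low_limit);
    assert(stack_pointer <= limits->high_limit);
    return (size_t)(stack_pointer - limits->low_limit);
}

/* Returns 1 if one more call needing frame_bytes still leaves reserve_bytes
 * untouched for the drivers layered below us, 0 otherwise. */
static inline int model_can_recurse(const model_stack_limits *limits,
                                    uintptr_t stack_pointer,
                                    size_t frame_bytes,
                                    size_t reserve_bytes)
{
    return model_remaining_stack(limits, stack_pointer) >=
           frame_bytes + reserve_bytes;
}
```

The reserve is the important design choice: the question is never just “does my next frame fit”, it is “does it fit and still leave room for everyone I am about to call”.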

URL Me, Baby!

Good news: I have finally managed to procure a real URL for this blog: www.kernelmustard.com, active as of this morning. I’m very excited – I was getting tired of spelling out “msmvps.com” for people. You’d be surprised how easy it is to screw that up. Anyway, tell your friends, tell your colleagues, tell your co-workers – kernelmustard.com is ready for prime time!

Raymond Chen had a really interesting article on The Old New Thing today about alignment on 64-bit platforms. Worth a read if you’re not used to that sort of thing. I actually posted a little preview of a marshalling discussion as well, which is relevant to kernel-mode people.

Continuing vaguely along the theme from yesterday, I thought I’d plug a book that I refer to often when I have to work on Actual Hardware:

Developing Windows NT Device Drivers: A Programmer's Handbook
by Edward N. Dekker, Joseph M. Newcomer 
Addison-Wesley Publishing Co.; ISBN: 0201695901 

This is an oldie but a goodie. It’s written for Windows NT 4.0, but it’s still quite relevant in a couple of ways. In particular, it has some of the best hardware interfacing material I’ve ever encountered. Some of it is old now (e.g. the Hal calls, the DMA calls, etc.), but the concepts are there, and that’s what’s important. Also, Ed Dekker is a walking repository of Bad Hardware Stories, some of which come through in the book.

It’s not current at all when it comes to things like PnP and power management, but the rest of the concepts are still just about right-on. When you read it in the right light, it’s a valuable resource. It’s hardbound and printed on nice paper, with a great index – all features that make it actually usable for a practicing programmer.

Anyway, next time you feel the urge to expand your brain, put on some Ben Folds music, grab Developing Windows NT Device Drivers, and reminisce about the time you got your new Pentium Pro server with 48MB RAM and installed NT4 for the first time. 🙂

Driver Developer’s Toolbox, Part 1: The DDK

There are a lot of basic questions asked in the various driver development
forums that basically reduce to “how should I go about developing a driver?”.
Every so often, I’ll post a quick discussion of one of the tools I use every
day to do driver development. Try though they might, Microsoft doesn’t exactly
hammer you over the head with information like this – a lot of this know-how
really just comes from doing the job.

At the risk of starting too basic, when you sit down to build a driver, the
first tool you will need is the Windows Driver Development Kit (DDK). The DDK has
changed a lot over the years, but the last couple of releases have finally
started to settle down from a consistency standpoint. The current released DDK
is the Windows Server 2003 DDK, which represents build 3790 of the OS. It also
includes full development environments for Windows XP and Windows 2000. The
DDK is (almost) free from Microsoft – go to WHDC for more information on how to
get it.

Installation of the DDK takes forever. I’ll never understand why these things
take sooooo long to uncompress and copy files, but perhaps I’m just a fundamentally
impatient person. Yep, that’s it. 🙂 At any rate, do yourself a favor and install
absolutely everything in the DDK. The whole mess – all of the build environments,
all of the tools, all of the samples, all of the docs. There are a couple of reasons
for this: first, you need to have the older build environments if you’re planning on
releasing drivers for the older platforms. The “wnet” build environment isn’t always
backward-compatible with the “w2k” environment, so you need both. Unless, of course,
you are one of the lucky few that don’t have to support older OSes. I’m jealous.

The other reason is that the sample code is by far the best documentation in the whole
kit. You might find yourself wondering about the semantics of a particular API in a
particular situation, and it really helps to be able to grep through all of the samples for
usage examples. I recently was in a situation where I needed to get a file handle back
from a file object, so I looked at all of the samples I could find for examples of
ObOpenObjectByPointer(). There were *none* in any of the files, prompting me to
redesign the driver to not need that API. I’m better off now, in that the overall architecture
wound up being much cleaner.

The DDK includes the latest authoritative documentation from Microsoft on how
to build drivers. It includes both “design guides” and reference material.
The design guides tend to read like technical documentation, so I’d still
recommend one of a few third-party driver development books. The reference
sections provide lots of detail about every public function provided for use by
driver writers. Most of this documentation (all?) is also available online at
OSR’s website www.osronline.com, which
also has tons of other resources for driver writers. For those who didn’t
know, OSR is the company that hosts the NTDEV and NTFSD mailing lists.
Wherever you get it from, I’d recommend becoming very familiar with the DDK
documentation.

Also included with the DDK is an entire build environment. All of the headers,
libraries, compiler tools, and support infrastructure needed to build your
driver are included. In particular, you don’t need Visual Studio to compile
your drivers. In fact, using Visual Studio directly is not supported by
Microsoft for driver development. Mark Roddy (www.hollistech.com) has a script called
DDKBUILD that can be used to integrate the DDK with Visual Studio, but unless
you’re really glued to its editor (and believe me, there are better ones), there’s really no reason to use it.

Finally, don’t miss the array of testing and troubleshooting tools present in
the DDK. There are so many of them, and some of them are so important to doing
the job correctly, that I’ll post specifically about them another time.

Innies And Outies

Lots of people lately have been trying to do kernel mode file I/O, and running
into their share of problems in the process. I was just involved in a virtual
disk project that relied heavily on file I/O in the driver, so I thought I’d
post a quick tutorial while it’s fresh in my mind.

First things first, try not to do this. Someone from Microsoft made the point
on NTDEV that you really don’t want to have to do file I/O from kernel mode if
you can avoid it. There are security issues to consider (e.g. opening files
that the user wouldn’t have had access to), and besides that, it’s just
tricky.

With that said, if kmode file I/O is what you need to do, there are two basic
ways to do it: IRP-based and function-based. The easier of the two methods is
the function-based method, which employs the use of the Zw APIs for file manipulation.
Typically, files are opened with ZwCreateFile, read and written with ZwReadFile and
ZwWriteFile, perhaps queried with ZwQueryInformationFile, and closed with ZwClose. This
method of file manipulation is geared toward using handles, so the standard warnings
about kernel-mode handle use apply. If you’re running on a newer OS, specifying
OBJ_KERNEL_HANDLE in your OBJECT_ATTRIBUTES is always a good idea, as it makes the handle visible in all
contexts, while at the same time making it useless from user mode. Other than some
basic API differences (i.e. OBJECT_ATTRIBUTES structures, UNICODE_STRING strings, etc.),
this should feel quite a lot like Win32 access.

The one big caveat with function-based file I/O is that it cannot be done from any
IRQL > PASSIVE_LEVEL. This, in particular, includes APC_LEVEL. If you happen to be
sitting below a filesystem driver, for example, you may find yourself called back at
APC_LEVEL, and it is incorrect to use any of the Zw* file manipulation functions at that
IRQL. The reason has to do with I/O completion, which I’ll get into another day. The
right thing to do here is to post the IO to a worker thread and wait for it to complete.

By the way – if you are using PAGED_CODE() to assert IRQL at the tops of your functions –
which you should be doing, by the way – remember that this will still pass even if you
are called back at APC_LEVEL, so you will have to either do an explicit IRQL check, or
better yet, just post all I/O off to a thread.
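The gap between the two checks is easy to see in a small user-mode model. None of this is kernel code: the names are hypothetical, and an IRQL is just an integer handed in by the caller.

```c
#include <assert.h>

/* IRQL values as the DDK headers define them; the MODEL_ prefix is ours. */
#define MODEL_PASSIVE_LEVEL  0
#define MODEL_APC_LEVEL      1
#define MODEL_DISPATCH_LEVEL 2

typedef enum {
    MODEL_IO_INLINE,          /* safe to call the Zw file routines right here */
    MODEL_IO_POST_TO_WORKER   /* hand the request to a thread at PASSIVE_LEVEL */
} model_io_plan;

/* The condition PAGED_CODE() tests: anything up to APC_LEVEL passes. */
static inline int model_paged_code_ok(int irql)
{
    return irql <= MODEL_APC_LEVEL;
}

/* The stricter condition the Zw file routines need: PASSIVE_LEVEL only. */
static inline model_io_plan model_plan_file_io(int irql)
{
    assert(irql >= MODEL_PASSIVE_LEVEL);
    return (irql == MODEL_PASSIVE_LEVEL) ? MODEL_IO_INLINE
                                         : MODEL_IO_POST_TO_WORKER;
}
```

At APC_LEVEL the first check says everything is fine while the second says to post the work, which is why PAGED_CODE() alone won’t catch this case.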

There is some debate as to whether you should simply use system work items or create a
dedicated worker thread. If you go the latter route, remember that there is a nontrivial
cost in setting up a new thread, and you have to be careful about how you kill it off –
you don’t just want to terminate it, because it won’t be cleaned up properly. Instead,
you should have an event that you set when you want the thread to exit.

The other method for doing file I/O is simply to build and send IRPs down to the filesystem
drivers themselves. This is less documented but not difficult to do. Instead of handles,
here you’ll need the device object of the FSD and the file object representing the opened
file. In general, the idea is that you call IoBuildAsynchronousFsdRequest() with appropriate
parameters, and then attach the file object to the next stack location. If you don’t do
that latter step, you’ll see very odd crashes in the FSD. I hope to have an example of
this method posted within the week; check back if you’re curious.

With either method, there are serious deadlock issues. Without going into detail, if you
believe you will re-enter the FSD (as in the case of a virtual disk driver backed by
a file), your read (and write in particular) I/O needs to be noncached, or you’ll get
into a difficult race with the cache manager.