Business Expenses

So I was over at a friend’s house the other day, and it happened that he had a WinXP SP1 box that was crashing on boot. He’s a budding driver developer, and wanted help actually figuring out why the driver was crashing, as opposed to just removing it and going on about his business.

So, I pulled my laptop out of my bag and noticed that… sigh… it lacks a 9-pin serial port. His computer lacks a FireWire port. Stuck. So, I got in the car and drove to the local MicroCenter to pick up a USB-to-RS232 thingy and a null modem cable. The total for these bits of archaic technology? $65.00! Yikes!

The other wrinkle is the fact that my laptop is an Apple PowerBook, so I run my debugger, build environment, etc., inside VirtualPC. This set-up works fine, most of the time, but I admit I was worried about the idea of getting WinDbg to work right with this set-up.

Well, as it turns out, my worries were well-founded. There’s a bug somewhere causing the VPC to not get interrupts from the serial port (I gather). Whatever the case, I could send data, but I could not receive it. Microsoft is halfway between releases of VPC, so I’m hoping this is fixed in the upcoming 7.0 release. Meanwhile, back to the other laptop, with an actual serial port, for now.

Total time spent: 3 hours.

[Now playing: Super D EP by Ben Folds.]

Who’s Asking: Solution

Well, after all of this chasing of security issues around the internals of the OS, I guess it’s finally time to reveal a way to handle this problem.

The key here is recognizing that the security context information must be valid somehow during the create path. No matter what else happens above, the create IRP that is passed into the target driver (a filesystem driver, for example) must have valid security information – how else would the filesystem driver know if the requestor has permission to, say, open the file?

Digging into the IO_STACK_LOCATION a bit, you’ll find the SecurityContext member of the Create parameters (Parameters.Create.SecurityContext) in the big old Parameters union. This is where we find the authoritative security context information for the request. Programming an access check routine based on this information is still a little tricky, and requires a good understanding of the security model of the OS, but this is where to start.
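To make the shape of that lookup concrete, here is a minimal user-mode sketch. Everything prefixed MOCK_ is a made-up stand-in of mine, cut down to the handful of fields that matter; the field names follow the WDK structures as I understand them, but check the real headers before leaning on any of it. The choice it makes (impersonation token if there is one, otherwise the primary token) is the same choice I believe SeQuerySubjectContextToken makes in the real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock types: simplified stand-ins, NOT the real WDK definitions. */
typedef struct { const char *user; } MOCK_TOKEN;   /* hypothetical token */

typedef struct {
    MOCK_TOKEN *ClientToken;   /* non-NULL when the thread was impersonating */
    MOCK_TOKEN *PrimaryToken;  /* token of the requesting process */
} MOCK_SUBJECT_CONTEXT;

typedef struct { MOCK_SUBJECT_CONTEXT SubjectSecurityContext; } MOCK_ACCESS_STATE;
typedef struct { MOCK_ACCESS_STATE *AccessState; } MOCK_IO_SECURITY_CONTEXT;

typedef struct {
    struct {
        struct { MOCK_IO_SECURITY_CONTEXT *SecurityContext; } Create;
    } Parameters;
} MOCK_IO_STACK_LOCATION;

/* Returns the token an access check should look at, or NULL if the
 * stack location carries no security context. */
static const MOCK_TOKEN *effective_token(const MOCK_IO_STACK_LOCATION *sl)
{
    const MOCK_IO_SECURITY_CONTEXT *sc;
    const MOCK_SUBJECT_CONTEXT *subject;

    if (sl == NULL)
        return NULL;
    sc = sl->Parameters.Create.SecurityContext;
    if (sc == NULL || sc->AccessState == NULL)
        return NULL;
    subject = &sc->AccessState->SubjectSecurityContext;
    /* An impersonating thread wins over the process that owns it. */
    return subject->ClientToken ? subject->ClientToken : subject->PrimaryToken;
}

/* Returns 1 when both lookups pick the expected token, 0 otherwise. */
static int demo_create_path(void)
{
    MOCK_TOKEN alice, service;
    MOCK_ACCESS_STATE state;
    MOCK_IO_SECURITY_CONTEXT sc;
    MOCK_IO_STACK_LOCATION sl;
    const MOCK_TOKEN *plain, *impersonating;

    alice.user = "alice";
    service.user = "service-account";
    state.SubjectSecurityContext.ClientToken = NULL;
    state.SubjectSecurityContext.PrimaryToken = &service;
    sc.AccessState = &state;
    sl.Parameters.Create.SecurityContext = &sc;

    plain = effective_token(&sl);            /* no impersonation */
    state.SubjectSecurityContext.ClientToken = &alice;
    impersonating = effective_token(&sl);    /* thread impersonates alice */

    assert(plain != NULL && impersonating != NULL);
    return strcmp(plain->user, "service-account") == 0 &&
           strcmp(impersonating->user, "alice") == 0;
}
```

The point of the model is that nothing here asks which thread or process happens to be running: the answer travels with the request itself.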

I’d like to thank Ken Johnson, another Positive Networks coder, for lending a hand on this series of articles. This series has been fun for a few reasons – it’s interesting getting to the bottom of the issues created by the asynchronous processing model of the OS, and it’s an example of the principle that you should never be too sure that you know the right answer to a problem, regardless of how well your solution seems to work.

Happy hacking!

[now playing: Elephant, by the White Stripes]

What’s In A Name?

Quick note: it appears to this reporter that Microsoft has indeed standardized on x64 as the name of the AMD64-related 64-bit CPU architecture. There were tons of suggestions (AMD64, x86-64, x64, AA-64, …). The pinko liberal left-wing anti-corporate WTO-picketing hippie in me suspects that Microsoft bowed to Intel’s pressure to change the architecture name (otherwise it would have resulted in using the AMD64 build of Windows to run on Intel’s x64 systems), but the level-headed nerdy-geek computer scientist abstractaholic in me thinks that Microsoft has just finally eliminated vendor names from platform definitions in a long-overdue nomenclature simplification.

Whatever your take, this news could not possibly be more insignificant, unless you happen to feel bad for (or be) AMD.

Who’s Asking: The Problem With IoGetRequestorProcess()

Last time I discussed how IoGetRequestorProcess() can get you where you want to go in some cases, if where you want to go involves finding out “who” originated an I/O request. See previous articles in this series if you’re new to the discussion.

IoGetRequestorProcess works by examining the thread information encoded in the IRP. This information is typically set by the IO manager to reflect the thread in which the IRP was created. The problem, of course, is that not all IRPs will have the right info here:

  • No Process – It’s entirely possible for IRPs to be created without any process information at all. Anyone who constructs his own IRPs from scratch will have the option of not setting thread context information on the IRP. I’m not sure if this is legal, but it’s certainly possible.
  • System Process – Many IRPs created by drivers will be in the context of the system process. This is usually the case when a request is posted to a worker thread or a system thread. As I mentioned before, this is not an uncommon case in the presence of filter drivers, or really from within any device stack at all.
  • Wrong/Arbitrary Process – The most insidious case, of course, is when you find yourself with valid-looking but wrong data. This will clearly be the case if a driver above yours in the stack is called back in an arbitrary process context and decides to create an IRP. This can happen during the rundown of a DPC routine, for example.
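
The three cases above can be sketched with a toy model. This is user-mode C with invented MOCK_ types, and it assumes (my reading of the IRP structure, not something to take on faith) that the lookup boils down to "follow the thread pointer stored in the IRP to its owning process, or give up if there is no thread".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock types: hypothetical stand-ins, not the real kernel structures. */
typedef struct { const char *name; } MOCK_PROCESS;
typedef struct { MOCK_PROCESS *process; } MOCK_THREAD;
typedef struct { MOCK_THREAD *thread; } MOCK_IRP;  /* models the thread field in the IRP */

/* Model of the lookup: the process owning the IRP's thread, or NULL. */
static MOCK_PROCESS *mock_get_requestor_process(const MOCK_IRP *irp)
{
    assert(irp != NULL);
    if (irp->thread == NULL)
        return NULL;
    return irp->thread->process;
}

static const char *requestor_name(const MOCK_IRP *irp)
{
    MOCK_PROCESS *p = mock_get_requestor_process(irp);
    return p ? p->name : "(none)";
}

/* Builds one well-behaved IRP plus the three problem cases and counts
 * how many of them name the real originator, winword.exe. */
static int demo_requestor_cases(void)
{
    static MOCK_PROCESS word = { "winword.exe" };
    static MOCK_PROCESS sys = { "System" };
    static MOCK_PROCESS other = { "unrelated.exe" };
    static MOCK_THREAD t_word = { &word };
    static MOCK_THREAD t_sys = { &sys };
    static MOCK_THREAD t_other = { &other };
    MOCK_IRP irps[4];
    int i, correct = 0;

    irps[0].thread = &t_word;   /* built in the caller's own thread */
    irps[1].thread = NULL;      /* No Process: built from scratch, no thread set */
    irps[2].thread = &t_sys;    /* System Process: built on a worker thread */
    irps[3].thread = &t_other;  /* Wrong/Arbitrary Process: built in arbitrary context */

    for (i = 0; i < 4; i++) {
        if (strcmp(requestor_name(&irps[i]), "winword.exe") == 0)
            correct++;
    }
    return correct;
}
```

Only the first IRP gives the right answer. The last case is the nasty one, because the result looks perfectly valid.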

So you see, we still seem to be high and dry here. Whatever is a driver to do?

I love cliff-hangers 🙂

Who’s Asking, Part 2

I’m freshly caffeinated after an evening at Barnes & Noble, which, despite its inclusion of a Starbucks, lacks a hotspot. Who knew. At any rate, I apparently have nothing better to do on a Friday night, so I thought I’d blog. How pathetic. 🙂

To understand why the previous method of finding out “who” is the submitter of a given IRP fails, you really have to understand one of the most fundamental architecture decisions DaveC & company made about the kernel. As we discussed last time, it’s not uncommon at all to post an IRP off to a worker thread. Now, think about the paradigm shift at work here: in all other situations involving communication between functions (and even between libraries), the call stack is typically built using the architecture’s built-in stack mechanisms – esp/ebp on IA-32, for example. In fact, we C programmers have almost forgotten about the fact that there can be other glue to hold together functional programs besides the stack.

There are problems with the traditional call stack, though. Microsoft ran into some of these problems in the context of supporting the legacy Windows 9x architecture, which I heard influenced the decision to switch to a fundamentally new way of managing function-to-function communication: I/O Request Packets. Before I thought about this shift in software architecture, I was pretty unclear as to why Microsoft named the parameter-holders in IRPs IO_STACK_LOCATIONs.

But think about it: think about the IRP and its associated IO_STACK_LOCATIONS as, primarily, a replacement for the good old fashioned call stack. What characterizes a call stack? Arguments? Yep, they’re part of the IRP – look at the IO_STACK_LOCATION in the DDK. Return addresses? Yep – see completion routines, for example. Local variable storage? Well, sort of – there are a few general-purpose scratch areas in the IRP structure (four PVOIDs in a DriverContext array, although you can’t actually use all of them all of the time, and only for the time that you own an IRP – beware CSQ interactions, for one thing).

Clearly manipulating the goo inside of an IO_STACK_LOCATION is more of a pain than just using the language-intrinsic stack manipulation functions that utilize esp and local stack memory. So, what do you gain for your trouble? Well, two principal things, and lots of ancillary benefits. Firstly, you get a practically unlimited stack size. Remember that the Windows kernel only provides 12K of stack to kernel-mode threads. There are a couple of reasons for this, but we’ll discuss them another time. Suffice it to say that there is a theoretical maximum limit on how deep call stacks can go, and that limit becomes a practical consideration when you start thinking about filter drivers. It’s sometimes very difficult to get your filesystem filter driver to play nice with Norton Antivirus, for example. Microsoft even sponsors “PlugFests” on a regular basis, where all of the filesystem folks get together from all over the world and test compatibility with each other’s drivers.

Using IRPs, on the other hand, allows the creator to specify enough IO_STACK_LOCATIONs to get all the way down the driver stack, guaranteed. This simply wouldn’t be possible without an arbitrarily-definable stack size. The 12K stack is suddenly much less of an issue.
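
Here is a rough user-mode model of the IRP-as-call-stack idea, in case it helps to see it in code. All of the names are invented for illustration and the bookkeeping is far simpler than what the real I/O manager does. What it tries to show is that the number of "frames" is picked when the packet is allocated, each frame carries arguments and a callback (the stand-in for a return address), and completion unwinds the frames in reverse.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock types: a toy model of an IRP, not the real structure. */
typedef struct MOCK_IRP MOCK_IRP;
typedef void (*MOCK_COMPLETION)(MOCK_IRP *irp, void *context);

typedef struct {
    int             argument;    /* "arguments" for the driver at this level */
    MOCK_COMPLETION completion;  /* "return address": called on the way back up */
    void           *context;
} MOCK_STACK_LOCATION;

struct MOCK_IRP {
    int status;                  /* the "return value" */
    int stack_count;             /* fixed when the IRP is allocated */
    int current;                 /* next free slot, counting down to 0 */
    MOCK_STACK_LOCATION stack[]; /* one slot per driver in the stack */
};

/* The creator decides how deep the "call stack" may go. */
static MOCK_IRP *mock_allocate_irp(int stack_count)
{
    MOCK_IRP *irp = calloc(1, sizeof *irp +
                              (size_t)stack_count * sizeof irp->stack[0]);
    if (irp != NULL) {
        irp->stack_count = stack_count;
        irp->current = stack_count;
    }
    return irp;
}

/* Push one frame for the next driver down. Returns 0, or -1 if full. */
static int mock_push(MOCK_IRP *irp, int argument,
                     MOCK_COMPLETION completion, void *context)
{
    assert(irp != NULL);
    if (irp->current == 0)
        return -1;               /* creator did not reserve enough slots */
    irp->current--;
    irp->stack[irp->current].argument = argument;
    irp->stack[irp->current].completion = completion;
    irp->stack[irp->current].context = context;
    return 0;
}

/* Unwind from the lowest frame back to the top, calling each callback. */
static void mock_complete(MOCK_IRP *irp, int status)
{
    irp->status = status;
    while (irp->current < irp->stack_count) {
        MOCK_STACK_LOCATION *sl = &irp->stack[irp->current++];
        if (sl->completion != NULL)
            sl->completion(irp, sl->context);
    }
}

static void count_unwind(MOCK_IRP *irp, void *context)
{
    (void)irp;
    (*(int *)context)++;
}

/* Sends one request through `depth` levels; returns how many callbacks
 * ran on the way back up, or -1 if something went wrong. */
static int demo_irp_stack(int depth)
{
    int i, unwound = 0, overflow_rejected;
    MOCK_IRP *irp = mock_allocate_irp(depth);

    if (irp == NULL)
        return -1;
    for (i = 0; i < depth; i++) {
        if (mock_push(irp, i, count_unwind, &unwound) != 0) {
            free(irp);
            return -1;
        }
    }
    overflow_rejected = (mock_push(irp, 99, NULL, NULL) == -1);
    mock_complete(irp, 0);
    free(irp);
    return overflow_rejected ? unwound : -1;
}
```

Calling demo_irp_stack(50) costs the calling thread no more native stack than demo_irp_stack(3), because the depth lives in heap memory attached to the packet. In the real kernel the creator gets the right depth from the StackSize of the target device object.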

But the more important architectural effect of using IRPs is that it allows for the creation of a fundamentally asynchronous operating system kernel. To get true asynchronicity, you need a well-defined way to get a return value back from the function you called, and you cannot depend on the (traditional) call stack to do it, otherwise you wouldn’t be asynchronous. Think about this for a second if it doesn’t make sense. Now, it’s quite possible to code up workarounds using pass-by-reference variables in user-mode code. However, this doesn’t work as well in an asynchronous kernel, because you have no idea what process context you’re going to be in at any given time (see the previous article in this series for an example of the ramifications of this fact). To make this work at all, you have to have system-global memory from the paged or (more likely) nonpaged pool. Instead of the laissez-faire world of each developer building his own elaborate return value management system (which would have ended up looking a lot like a part of an IRP anyway), Microsoft standardized on a single method for management of return values.

Other considerations of dealing with an asynchronous system also exist. One in particular is the architecture drivers would likely have in the absence of completion routines: you would have an entire chain of waits, which could of course only be done at < DISPATCH_LEVEL. They would be inefficient because they would forcibly break the asynchronicity of the OS, and even more seriously, they would dramatically increase contention for the dispatcher lock, which is one of the hottest locks in the whole OS. Contrast this scenario with the IRP completion scenario as it exists today. I'll have to write more about how completion works later, because this is now officially waaaaay too long of an article.

Okay, so getting back to the point. . . 🙂 Because you have no idea what process context you’re in at any given time, you have to try something else to discern the sender of an IRP. Fortunately, we don’t have to look long (if you know what you’re looking for!) to find a knight in shining armor to rescue us from our plight. I really shouldn’t drink this much caffeine this late at night. Deep inside the IFS Kit lie two functions, IoGetRequestorProcess() and IoGetRequestorProcessId(). They both take a PIRP and return a PEPROCESS and a ULONG, respectively. They look inside the IRP and, as if by magic (not really; stare at the IRP structure for 30 seconds and it becomes obvious how), they return the process associated with an IRP. It’s once again just a hop, skip, and a jump from there to the SID of the “who” that sent the request.

Problem solved! Problem solved? Problem solved. . .

Problem not solved. . .

To be continued next time I am over-caffeinated!

[now playing: Deceiver, Chris Thile’s newest release. Some say he’s the most talented musician musicing today. Read all about it.]

Wag Learns vi

One of my long-time programmers, Steve Wagner (who has to go by “Wag” because I’m already Steve), has finally decided to stop taking the constant ridicule from his fellow coders and learn to use the h/j/k/l keys in vi to get around. Arrow keys are for wimps.

Who’s Asking: Option 1

The first option for determining the user responsible for originating a given IRP is perhaps the most obvious, if you happen to be a top-level driver. Top-level drivers are simply those drivers that are at the top of a device stack, and that are directly called by system calls (through the kernel). Filesystems can be top-level drivers, whereas network card drivers never are.

Why is this distinction relevant? Well, there are a couple of features that allow you to make some assumptions if you know you’re going to be top-level (sometimes). The first such assumption is that you will be in the context of the caller of the system call. Given that, you could simply get the process via IoGetCurrentProcess(), and from there, figure out who owns it using some IFSKit-supplied magic.

There are two problems with this design. First, the owner of the thread that’s making the request might not be the owner of the process at all. The thread could be impersonating another user. So, you have to query the thread before the process – not too difficult. However, the real flaw here is that we might not be in the original thread context at all.

Although most commonly seen in filesystems and their related filter drivers, technically any filter driver can “post” an IRP off to a system worker thread or to a driver-created thread designed to process IRPs at a later time. The reasons for doing this are many and varied, but the point is that once these requests are posted off to another thread, you’re not in the same thread context any more, let alone in the same process context. Therefore, your results will be wrong.
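
A tiny simulation of why posting breaks the "just ask who is current" approach. This is a sketch with invented names: a global string stands in for whatever the current-process query would report, and a one-slot queue stands in for the worker thread machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for "which process context am I running in right now?" */
static const char *g_current_process = "System";

typedef struct {
    const char *seen_by_dispatch;  /* context observed when the request arrived */
    const char *seen_by_worker;    /* context observed when it was processed */
} MOCK_REQUEST;

static MOCK_REQUEST *g_pending = NULL;  /* one-slot "work queue" */

/* Runs in the caller's context, then posts the request for later. */
static void mock_dispatch(MOCK_REQUEST *req)
{
    assert(req != NULL);
    req->seen_by_dispatch = g_current_process;
    g_pending = req;
}

/* Runs later, on a worker thread owned by the system process. */
static void mock_worker_run(void)
{
    if (g_pending != NULL) {
        g_pending->seen_by_worker = g_current_process;
        g_pending = NULL;
    }
}

/* Returns 1 if the dispatch path saw the caller and the worker did not. */
static int demo_posted_context(void)
{
    MOCK_REQUEST req = { NULL, NULL };

    g_current_process = "notepad.exe";  /* system call arrives from notepad */
    mock_dispatch(&req);
    g_current_process = "System";       /* scheduler moves on to the worker */
    mock_worker_run();

    return strcmp(req.seen_by_dispatch, "notepad.exe") == 0 &&
           strcmp(req.seen_by_worker, "System") == 0;
}
```

Any driver layered below the one that posted the request only ever sees the second answer.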

How common this scenario is can be hard to say. It would be trivial to write a driver that just attaches itself to any given device object in the system and then passes through all of its IRPs from a worker thread. Filesystem filter drivers seem to do this sort of thing often. Does this mean that you can *never* assume you’re going to be a “top-level” driver, because any idiot could legally layer on top of your device object? Things that make you go “Hmmm….”

Next up: a refinement, and another problem.

The Good Old Days

Okay, last cross-post (for a while), I promise. Larry Osterman has another great article up at his blog about a really impressive old hack he did with the MS-DOS redirector. Not a lot of currently topical content, but a great read nevertheless. I remember hacking around in PSPs; does anyone else? 🙂

Who’s Asking?

A question that comes up on the lists every so often (and, ironically, in one of my projects at the moment) is how to know *who* is the submitter of an IRP. “Who” is variously defined as “which thread”, “which process”, “which session”, or “which user”, depending on the specific need of the driver writer.

There are three ways to answer the “which user” question that I’m going to talk about in the next few posts. Here’s your chance to beat me to the punch by predicting what they’re going to be. When I post, I’ll describe the relative strengths and weaknesses of each technique, and talk a little about when it would be appropriate for use.