Innies And Outies

Lots of people lately have been trying to do kernel mode file I/O, and running
into their share of problems in the process. I was just involved in a virtual
disk project that relied heavily on file I/O in the driver, so I thought I’d
post a quick tutorial while it’s fresh in my mind.

First things first, try not to do this. Someone from Microsoft made the point
on NTDEV that you really don’t want to have to do file I/O from kernel mode if
you can avoid it. There are security issues to consider (e.g. opening files
that the user wouldn’t have had access to), and besides that, it’s just
tricky.

With that said, if kmode file I/O is what you need to do, there are two basic
ways to do it: IRP-based and function-based. The easier of the two methods is
the function-based method, which employs the use of the Zw APIs for file manipulation.
Typically, files are opened with ZwCreateFile, read written with ZwReadFile and
ZwWriteFile, perhaps queried with ZwQueryInformationFile, and closed with ZwClose. This
method of file manipulation is geared toward using handles, so the standard warnings
about kernel-mode handle use apply. If you’re running on a newer OS, specifying
OBJ_KERNEL_HANDLE in your OBJECT_ATTRIBUTES is always a good idea, as it makes the handle visible in all
contexts, while at the same time making it useless from user mode. Other than some
basic API differences (i.e. OBJECT_ATTRIBUTES structures, UNICODE_STRING strings, etc.),
this should feel quite a lot like Win32 access.

The one big caveat with function-based file I/O is that it cannot be done from any
IRQL > PASSIVE_LEVEL. This, in particular, includes APC_LEVEL. If you happen to be
sitting below a filesystem driver, for example, you may find yourself called back at
APC_LEVEL, and it is incorrect to use any of the Zw* file manipulation functions at that
IRQL. The reason has to do with I/O completion, which I’ll get into another day. The
right thing to do here is to post the IO to a worker thread and wait for it to complete.

By the way – if you are using PAGED_CODE() to assert IRQL at the tops of your functions –
which you should be doing, by the way – remember that this will still pass even if you
are called back at APC_LEVEL, so you will have to either do an explicit IRQL check, or
better yet, just post all I/O off to a thread.

There is some debate as to whether you should simply use system work items or create a
dedicated worker thread. If you go the latter route, remember that there is a nontrivial
cost in setting up a new thread, and you have to be careful about how you kill it off –
you don’t just want to terminate it, because it won’t be cleaned up properly. Instead,
you should have an event that you set when you want the thread to exit.

The other method for doing file i/o is simply to build and send IRPs down to the filesystem
drivers themselves. This is less documented but not difficult to do. Instead of handles,
here you’ll need the device object of the FSD and the file object representing the opened
file. In general, the idea is that you call IoBuildAsynchronousFsdRequest() with appropriate
parameters, and then attach the file object to the next stack location. If you don’t do
that latter step, you’ll see very odd crashes in the FSD. I hope to have an example of
this method posted within the week; check back if you’re curious.

With either method, there are serious deadlock issues. Without going into detail, if you
believe you will re-enter the FSD (as in the case of a virtual disk driver backed by a
a file), your read (and write in particular) I/O needs to be noncached, or you’ll get
into a difficult race with the cache manager.

3 Replies to “Innies And Outies”

  1. I don’t have any extra insight on why the change was made, I just know that it was made and that it can definitely cause problems. It was a real head scratcher when code that was definitely running at PASSIVE_LEVEL was hanging because the special kernel APC for I/O completion was blocked…

    -scott

    OSR

Leave a Reply

Your email address will not be published. Required fields are marked *