Hmm, that’s not a very Politically Correct name for a page, now is it?
My favorite new bug that I found (or rather, had pointed out to me) at DevCon has to do with “dummy pages”. Landy Wang, the guy who seems to own the memory manager these days, pulled together a group of people after his most excellent talk about the future directions of MM. I wish I could give out details, but they’re most certainly NDA material. For me, it was easily the best presentation of the event. Anyway, after the presentation, Landy described a very interesting bug that has just been discovered at Microsoft concerning these dummy pages, and he gave me explicit permission to talk about it. As if I’d refuse. 🙂
A little background: when MM wants to bring in a page off of disk due to a page fault, it issues an I/O request and brings it in. No big deal there. But if you have a row of four pages like this (x = paged out, v = valid):
MM will try to bring in both missing pages at once, because it is more efficient to issue one big I/O request than lots of little ones. Still not complicated. The fun happens when you have this:
If the first missing page takes a fault, MM would like to bring in the third one with it. BUT: the middle page is valid, and if it is dirty, reading over it from disk would destroy data. Rather than take the performance hit of splitting the read into separate I/O requests, MM substitutes a dummy page for the valid (dirty) page in the middle. The dummy page is a sacrificial page of memory whose only purpose is to receive data from a large MM inpage request. It is then returned to the bit bucket, and the net result is that pages 1 and 3 are updated, while page 2 remains untouched. Neat trick, eh?
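To make the substitution concrete, here is a small user-mode sketch of the idea in plain C. It is not how MM is actually written: `clustered_read`, `inpage_cluster`, `MAX_CLUSTER` and the `valid` flags are all names I made up for illustration, and the "disk" is just a byte array. The point is only that one contiguous read can be scattered across a list of target pages, with a throwaway page standing in for any page that must not be overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   4096
#define MAX_CLUSTER 8      /* hypothetical cap on pages per request */

/* Hypothetical stand-in for one big inpage I/O: copies n_pages consecutive
 * pages of "disk" data into whatever page each targets[] entry points to. */
static void clustered_read(const unsigned char *disk,
                           unsigned char **targets, size_t n_pages)
{
    for (size_t i = 0; i < n_pages; i++)
        memcpy(targets[i], disk + i * PAGE_SIZE, PAGE_SIZE);
}

/* Reads n_pages in a single request. Any page already marked valid is
 * swapped for a shared sacrificial page, so its contents are never touched. */
static void inpage_cluster(const unsigned char *disk, unsigned char **pages,
                           const int *valid, size_t n_pages)
{
    static unsigned char dummy_page[PAGE_SIZE];   /* the "bit bucket" */
    unsigned char *targets[MAX_CLUSTER];

    assert(n_pages <= MAX_CLUSTER);
    for (size_t i = 0; i < n_pages; i++)
        targets[i] = valid[i] ? dummy_page : pages[i];

    clustered_read(disk, targets, n_pages);
    /* dummy_page now holds leftover data that nobody should rely on. */
}
```

Calling `inpage_cluster` with `valid = {0, 1, 0}` fills pages 1 and 3 from the "disk" and leaves page 2 alone. Note that the dummy page in this sketch is shared across calls, which is exactly why its contents cannot be trusted.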
There is a big problem here for drivers that are in the paging path: the data in a dummy page could become invalid at any point, changed out from under them by another read (or conceivably something else entirely). This means that drivers that depend on a consistent copy of the data (crypto drivers, compression drivers, etc.) must double-buffer the read.
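Double-buffering here just means: copy the bytes once into memory that only you own, then do all of your work against that copy. Below is a minimal sketch in portable C rather than real driver code. The function name `decode_read`, the XOR "decryption" and the byte-sum checksum are stand-ins I invented, and in an actual driver the snapshot would presumably live in memory the driver allocated for itself rather than in a caller-supplied array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-pass transform over freshly read data.
 * io_buf   : the buffer the read landed in (may change under us)
 * snapshot : private buffer of at least len bytes, owned by the caller
 * out      : receives the decoded bytes
 * Returns a byte-sum checksum of the data that was actually decoded. */
static uint32_t decode_read(const unsigned char *io_buf, size_t len,
                            unsigned char *snapshot, unsigned char *out,
                            unsigned char key)
{
    uint32_t sum = 0;

    assert(snapshot != NULL && snapshot != io_buf);

    /* Touch the I/O buffer exactly once. */
    memcpy(snapshot, io_buf, len);

    /* Pass 1: checksum, computed over the private copy. */
    for (size_t i = 0; i < len; i++)
        sum += snapshot[i];

    /* Pass 2: decode, also from the private copy, so both passes see
     * the same bytes even if io_buf is scribbled on in between. */
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)(snapshot[i] ^ key);

    return sum;
}
```

The design choice is the single `memcpy` up front. If pass 1 and pass 2 each read `io_buf` directly, a scribble between them would produce a checksum that does not match the decoded output.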
The fun part about this bug is that it has apparently been in the code since Windows XP, five-ish years ago. Microsoft seems to have just discovered this problem in the last two months, though, because Longhorn has changed an aspect of how dummy pages are used. In Longhorn, there is a very high probability that your dummy page will get scribbled on while you think you have control of it.
The moral of the story is that you have to treat read data in a similar way to write data – it can change out from under you at any time, so if you need a consistent view of the bits, you had better make a copy. Incidentally, this whole dummy page mechanism works on the write path as well, but that doesn’t matter as much since most people expect this sort of weirdness on the write path. Or if not, they should. 🙂
So: go out and fix your drivers, and now. They are probably crashing on most current Microsoft OSes, and they will almost certainly crash in Longhorn.