Well, how frustrating is this… as far as I can tell, .Text completely ate my last post, except for a couple of sentences at the top. I’ve switched to writing these posts off-line (like a real blogger!), so hopefully that’s the last time this will happen to me.
Some co-workers and I were discussing a change you may have noticed in recent
Windows kernel binaries. Disassembling a kernel function shows an
odd-looking instruction at the top:
lkd> u NtCreateFile
nt!NtCreateFile:
80570d48 8bff mov edi,edi
80570d4a 55 push ebp
80570d4b 8bec mov ebp,esp
...
Notice the ‘mov edi,edi’ at the top. It looks like a creative no-op,
and in fact, it is. This seemingly useless instruction enables a
new capability for modern kernels: hot patching. Hot patching addresses the
availability requirements of modern servers (and workstations, for
that matter) by letting certain hotfixes patch the live kernel and redirect
calls from the existing function to a replacement (or
potentially a filter function, etc). Administrators need something like this so
they can apply hotfixes in a timely manner, rather than having to wait for
a maintenance window. The world is a better place when servers are patched!
Technically, this works by replacing those two MOV bytes at the top of a
function with a jump instruction. Now, some of you may have noticed that, on
x86, two bytes will only get you a short jump, which reaches at most 128 bytes
backward or 127 bytes forward from the end of the jump instruction. Clearly this
isn’t going to do it on its own: you’d have to write the new
function into memory that is probably occupied by another function, which is no
good. However, a little extra searching reveals this:
lkd> u NtCreateFile-0x10
nt!NtOpenFile+0x55:
80570d38 0f841afdffff je nt!FsRtlCurrentBatchOplock+0xf7 (80570a58)
80570d3e e9cd440600 jmp nt!FsRtlRegisterUncProvider+0x18b (805d5210)
80570d43 90 nop
80570d44 90 nop
80570d45 90 nop
80570d46 90 nop
80570d47 90 nop
nt!NtCreateFile:
80570d48 8bff mov edi,edi
...
Well, just what we needed: five extra bytes, exactly enough for a long jump (a five-byte
jump with a 32-bit displacement, relative within the current code segment). Because those
bytes sit immediately before the MOV, they’re in range for our short jump.
The hotfix can patch them on the fly with a relative long jump to the replacement function, and the short jump that replaces the MOV lands on them.
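The byte arithmetic behind the two jumps can be sketched in a few lines of Python. This is only an illustration of the x86 encodings (EB rel8 and E9 rel32), not Microsoft's patching code; the helper names and the replacement address are made up, and only NtCreateFile's address comes from the listing above.

```python
import struct

SHORT_JMP_LEN = 2   # EB rel8
NEAR_JMP_LEN = 5    # E9 rel32
PAD_LEN = 5         # NOP bytes in front of the function


def encode_short_jmp(at, target):
    """Encode a short jump placed at address `at` (EB rel8)."""
    rel = target - (at + SHORT_JMP_LEN)  # displacement counts from the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of short-jump range")
    return b"\xeb" + struct.pack("<b", rel)


def encode_near_jmp(at, target):
    """Encode a five-byte relative jump placed at `at` (E9 rel32), 32-bit addresses."""
    rel = (target - (at + NEAR_JMP_LEN)) & 0xFFFFFFFF  # wrap to 32 bits
    return b"\xe9" + struct.pack("<I", rel)


def build_hotpatch(entry, replacement):
    """Return (pad_bytes, entry_bytes) for a function whose first byte is at `entry`."""
    pad = entry - PAD_LEN
    return encode_near_jmp(pad, replacement), encode_short_jmp(entry, pad)


# 0x80570D48 is NtCreateFile in the listing; 0xF7A01000 is a hypothetical replacement.
pad_bytes, entry_bytes = build_hotpatch(0x80570D48, 0xF7A01000)
print(pad_bytes.hex(), entry_bytes.hex())
```

The two bytes that replace the MOV always come out as eb f9, a jump of -7 from the end of the short jump, which is exactly the start of the five-byte pad. As a sanity check on the encoder, feeding it the existing jmp at 80570d3e from the listing reproduces the bytes e9cd440600.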
This design is interesting for several reasons. First, Microsoft chose to use the
“creative no-op”, rather than two real NOP instructions, even though they then
used five real NOPs for the long jump. The reason is easy: the MOV is executed every
time the function is called while it is not hooked, whereas the five NOPs sit in front
of the entry point and are never executed at all. Because you could be talking about a measurable performance impact on high-traffic code paths,
those two bytes should execute as fast as possible, and
with minimal side effects. The MOV is just what the doctor ordered: as a single instruction,
it should execute faster than two consecutive NOP instructions.
Why didn’t Microsoft just put the five NOPs in as the first bytes of the
function (rather than using the short jump)? Again, the answer is performance:
they have optimized for the common case (the unhooked function), rather than
the uncommon case. This costs two extra bytes per function statically, but the
unhooked path executes one instruction instead of three, which saves you 2/3 of
the instructions (assuming you reserved your five bytes with
MOV / MOV / NOP).
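To make that trade-off concrete, here is a small back-of-the-envelope sketch. The layout names and the table are just my own bookkeeping of the numbers discussed above (bytes reserved per function, and instructions executed when the function is not hooked); it counts instructions only and says nothing about actual cycle costs.

```python
from fractions import Fraction

# layout name -> (bytes reserved per function, instructions executed when unhooked)
LAYOUTS = {
    "mov edi,edi + 5 NOPs in front": (2 + 5, 1),
    "MOV / MOV / NOP at entry": (5, 3),
    "five NOPs at entry": (5, 5),
}


def compare(chosen, alternative):
    """Return (extra bytes, fraction of no-op instructions saved) for `chosen`."""
    c_bytes, c_instr = LAYOUTS[chosen]
    a_bytes, a_instr = LAYOUTS[alternative]
    return c_bytes - a_bytes, 1 - Fraction(c_instr, a_instr)


extra, saved = compare("mov edi,edi + 5 NOPs in front", "MOV / MOV / NOP at entry")
print(f"{extra} extra bytes, {saved} of the no-op instructions saved")
```

Against five NOPs at the entry point, the same comparison gives two extra bytes for 4/5 of the instructions saved.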
One other question that came up was why Microsoft didn’t just use Detours, or
something much like it. Detours relies on disassembly to perform runtime hooks
like these, and it winds up overwriting a number of bytes at the beginning of the
function to make this work. Disassembling and relocating arbitrary function
prologues at runtime is difficult, at best; the implemented solution
is much more robust. The cost is an extra seven bytes per function, but
it’s worth it, IMO.
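Part of that robustness is that a patcher only has to recognize a fixed byte pattern instead of disassembling anything. A minimal sketch of such a check follows; the function name is hypothetical, I'm assuming the pad is always 0x90 bytes as in the listing above, and I have no idea whether the real hot patch code validates anything this way.

```python
MOV_EDI_EDI = bytes.fromhex("8bff")   # the two-byte no-op at the entry point
NOP_PAD = bytes.fromhex("90") * 5     # assumed pad: five NOPs just before it


def is_hotpatchable(image, entry_offset):
    """True if the bytes around `entry_offset` match the pad + mov edi,edi layout."""
    if entry_offset < len(NOP_PAD):
        return False  # no room for a pad before the entry point
    pad = image[entry_offset - len(NOP_PAD):entry_offset]
    head = image[entry_offset:entry_offset + len(MOV_EDI_EDI)]
    return pad == NOP_PAD and head == MOV_EDI_EDI


# Bytes copied from the NtCreateFile-0x10 listing above, starting at 80570d38.
image = bytes.fromhex("0f841afdffff" "e9cd440600" "9090909090" "8bff" "55" "8bec")
print(is_hotpatchable(image, 0x10))   # offset of NtCreateFile within this snippet
```

No instruction decoding is needed: two fixed-size comparisons decide whether the function has the layout, which is the kind of simplicity a disassembly-based hook can't offer.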
This is an interesting, and instructive, bit of low-level software engineering, where
space, performance, and robustness must be balanced.