Hot Patching

Well, how frustrating is this… as far as I can tell, .Text completely ate my last post, except for a couple of sentences at the top. I’ve switched to writing these posts off-line (like a real blogger!), so hopefully that’s the last time this will happen to me.

Some co-workers and I were discussing a change you may have noticed in recent
Windows kernel binaries. Disassembling into a kernel function shows an
odd-looking instruction at the top:

   lkd> u NtCreateFile
   nt!NtCreateFile:
   80570d48 8bff             mov     edi,edi
   80570d4a 55               push    ebp
   80570d4b 8bec             mov     ebp,esp
   ...

Notice the ‘mov edi,edi’ at the top – that would seem to be a creative no-op,
and in fact, it is. This seemingly useless instruction is designed to enable a
new capability for modern kernels: hot patching. Hot patching is designed to
address the availability requirements of modern servers (and workstations, for
that matter), by enabling certain hotfixes to patch the live kernel and redirect
function calls from the existing function into a replacement (or
potentially a filter function, etc). Administrators need something like this to
enable them to apply hotfixes in a timely manner, rather than having to wait for
a maintenance window. The world is a better place when servers are patched!

Technically, this works by replacing those two extra MOV bytes at the top of a
function with a jump instruction. Now, some of you may have noticed that, on
x86, two bytes will only get you a short jump, which reaches at most 128 bytes
back or 127 bytes forward. Clearly this isn’t going to do it – you’d have to write the new
function into memory that is probably occupied by another function, which is no
good. However, a little extra searching reveals this:

   lkd> u NtCreateFile-0x10
   nt!NtOpenFile+0x55:
   80570d38 0f841afdffff     je      nt!FsRtlCurrentBatchOplock+0xf7 (80570a58)
   80570d3e e9cd440600       jmp     nt!FsRtlRegisterUncProvider+0x18b (805d5210)
   80570d43 90               nop
   80570d44 90               nop
   80570d45 90               nop
   80570d46 90               nop
   80570d47 90               nop
   nt!NtCreateFile:
   80570d48 8bff             mov     edi,edi
   ...

Well, just what we needed – five extra bytes in which to place a long jump (a jump with
a 32-bit relative displacement, within the current code segment). Because those five bytes
sit immediately before the MOV, they’re in range of our short jump. The hotfix writes a
relative long jump to the replacement function into those NOP bytes, and replaces the MOV
with a short jump back to them.
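
To make the byte arithmetic concrete, here is a minimal Python sketch of the two encodings and of a patch applied to a copy of the bytes from the listing above. It is only a simulation on a bytearray, not how the kernel does it: the helper names (encode_jmp_rel8, encode_jmp_rel32, apply_hot_patch) and the replacement address 0x80600000 are made up for illustration, and writing the long jump before the short jump is my assumption about a sensible order.

```python
import struct

def encode_jmp_rel8(src: int, dst: int) -> bytes:
    """Encode a 2-byte short jump (EB rel8) placed at src that lands on dst."""
    rel = dst - (src + 2)  # displacement counts from the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target is out of short-jump range")
    return b"\xeb" + struct.pack("<b", rel)

def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode a 5-byte jump (E9 rel32) placed at src that lands on dst."""
    rel = (dst - (src + 5)) & 0xFFFFFFFF  # wrap to 32 bits
    return b"\xe9" + struct.pack("<I", rel)

def apply_hot_patch(image: bytearray, base: int, func: int, replacement: int) -> None:
    """Redirect func to replacement inside image, which is loaded at base.

    Hypothetical helper: it only edits a bytearray and ignores everything a
    real patcher has to worry about (page protection, other CPUs, and so on).
    """
    pad = func - 5    # address of the five NOPs in front of the function
    off = pad - base  # offset of those NOPs inside the image
    if bytes(image[off:off + 7]) != b"\x90" * 5 + b"\x8b\xff":
        raise ValueError("no hot-patch prologue at this address")
    # Assumed order: fill the padding first, since nothing executes it
    # until the short jump is in place.
    image[off:off + 5] = encode_jmp_rel32(pad, replacement)
    image[off + 5:off + 7] = encode_jmp_rel8(func, pad)

# Bytes from the listing: five NOPs, then mov edi,edi / push ebp / mov ebp,esp.
base = 0x80570D43
image = bytearray(b"\x90" * 5 + b"\x8b\xff\x55\x8b\xec")
apply_hot_patch(image, base, func=0x80570D48, replacement=0x80600000)
print(image.hex())
```

As a sanity check on the encoder, feeding it the existing jump from the listing (at 80570d3e, targeting 805d5210) reproduces the bytes e9cd440600 that the debugger shows. The short jump at NtCreateFile comes out as EB F9, that is, seven bytes back from the end of the two-byte instruction.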

This design is interesting for several reasons. First, Microsoft chose to use the
“creative no-op”, rather than two real NOP instructions, even though they then
used five real NOPs for the long jump. The reason is easy – the MOV is called every
time the function is called, whether it is hooked or not (or, more precisely, when it
is not hooked). Because that could add up to a measurable performance impact on
high-traffic code paths, those two bytes should get executed as fast as possible, and
with minimal side-effects. The MOV is just what the doctor ordered – as a single instruction,
it should execute faster than two consecutive NOP instructions.

Why didn’t Microsoft just put the five NOPs in as the first bytes of the
function (rather than using the short jump)? Again, the question is performance
– they have optimized for the common case (the unhooked function), rather than
the uncommon case. This costs two extra bytes per function statically, but they
save you 2/3 of the instructions (assuming you reserved your five bytes with
MOV / MOV / NOP).

One other question that came up was why Microsoft didn’t just use Detours, or
something much like it. Detours relies on disassembly to perform runtime hooks
like these, and it winds up overwriting a number of bytes at the beginning of the
function to make this work. This is difficult, at best – the implemented solution
is much more robust. The cost is an extra seven bytes per function, but
it’s worth it, IMO.

This is an interesting, and instructive, bit of low-level software engineering, where
space, performance, and robustness must be balanced.

8 Replies to “Hot Patching”

  1. To try out this functionality yourself, simply call NtSetSystemInformation with the Information Class 69 (SystemApplyHotPatch).

    You’ll need a large undocumented structure, and a handle to a file containing a special hot patch PE section with special hot patch data. Once you have that, not only can you patch the data, but you can also install Rtl Debug Hooks which will hook everything you need and notify you.

    Best regards,

    Alex Ionescu

  2. >>The reason is easy – the MOV is called every time the

    >>function is called, whether it is hooked or not (or,

    >>more precisely, when it its not hooked).

    Actually it’s about applying patches safely – if there were two NOPs the EIP in some thread at the moment when patch is being applied could point to second NOP which would result in execution of second byte of jmps as an instruction. But when there’s a 2-byte instruction, you can always replace it with another 2-byte instruction, cause EIP points either at it or at next instruction (it’s also SMP-safe, cause CPU will re-read instruction when it detects a write to any of the instruction bytes while)

  3. This reminds me of the Geary incident. I’m paraphrasing extensively from “Undocumented Windows” (ISBN 0-201-60834-0). Michael Geary worked on Adobe Type Manager which hooked into Windows 3.0. Amongst other hacks required for this, he had to hook *part of* CreateDC. That function needs to load a driver and use GetProcAddress to find the various entry points. Geary patched that code to point to routines that he wrote as part of ATM. This clearly makes the ATM code terribly sensitive to the implementation of CreateDC, so Geary actually searched the code for the right byte pattern to patch rather than hard-coding offsets or anything. In Windows 3.1, Microsoft re-implemented CreateDC to read “JMP elsewhere; CALL LoadLibrary; CALL GetProcAddress; …” which means that Geary’s code is certain to find the right code to patch almost immediately. In other words, Win3.1 was deliberately designed to be safely hacked by Adobe Type Manager.

    Returning to modern times, it seems entirely possible that MS have made this particular routine “safely hackable” because they know that there are (anti-virus?) products out there using dubious hacks to intercept NtCreateFile and they want to make it as easy as they can for these products. (“Well, if you really must, then here’s how to do it safely.”)

  5. why not just replace the first 5 bytes of the function with the jump and do whatever was being done in those 5 bytes in that section to which the jump is done?

    This doesn’t incur any performance hit. You just need to be extra careful when replacing the 5 bytes, that you are actually not executing in that area.
