Memory Barriers, Part 2

So my question du jour is, “Is anyone still not using Firefox?” I have been getting sick in recent months of friends and family calling me and complaining about spyware, pop-ups, viruses, and so on. Amazingly enough, simply installing Firefox has dropped my personal support call volume to near-zero. I’ve also been using Firefox exclusively for months, except for accessing certain MS sites that require IE, and have been thrilled. YMMV, of course, but the newly-released 1.0 preview release runs amazingly well on both Linux and Windows. It’s actually more stable on my amd64 than either the 32-bit or 64-bit versions of IE.

OK, so yesterday, I posted an extra-credit assignment. Nobody tried it, so I’m going to elaborate on it a bit. If you haven’t read yesterday’s post yet, scroll down and do so before taking a crack at this one.

int a = 0;
int b = 1;

void f()                        /* runs on one processor */
{
        for(;;)
        {
                ASSERT(a < b);
        }
}

void g()                        /* runs on the other processor */
{
        for(;;)
        {
                b++;
                a++;
        }
}

This code is similar to code I once saw a Microsoft person scribble on a whiteboard, and I thought it was a really interesting way to frame the memory barrier problem. Say you create both threads f() and g() on a dual-proc computer and then just walk away and let it run. Will the ASSERT ever fire? According to the MS guy, the answer is “yes”, and the reason is that the a++ can be committed to RAM before the b++, making a == b.
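
If you want to play with this outside the kernel, here is a rough user-mode approximation. Everything below is my own sketch, not the original whiteboard code: it assumes POSIX threads, and it uses volatile only to stop the compiler from caching the counters in registers; volatile does nothing about the processors reordering the stores, which is the whole point of the exercise.

/*
 * User-mode sketch of the example above (assumption: pthreads, plain C).
 * The shared counters are deliberately racy; the question is whether the
 * observer thread can ever see a >= b.
 *
 * Build with something like: gcc -O2 -pthread barriers.c
 */
#include <assert.h>
#include <pthread.h>

volatile int a = 0;
volatile int b = 1;

void *f(void *unused)                   /* observer thread */
{
        (void)unused;
        for (;;)
                assert(a < b);          /* fires (and aborts) if the writes are seen out of order */
        return NULL;
}

void *g(void *unused)                   /* writer thread */
{
        (void)unused;
        for (;;) {
                b++;
                /* no barrier here yet; see below */
                a++;
        }
        return NULL;
}

int main(void)
{
        pthread_t tf, tg;

        pthread_create(&tf, NULL, f, NULL);
        pthread_create(&tg, NULL, g, NULL);

        /* neither thread exits; the process only ends if the assert aborts it */
        pthread_join(tf, NULL);
        pthread_join(tg, NULL);
        return 0;
}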

Consider the values of a and b after a few revolutions. There are a couple of different scenarios:

      case 1                  case 2
    (expected)          (reordered writes)
      a | b                    a | b
      -----                    -----
      0 | 1                    0 | 1   (initial)
      0 | 2                    1 | 1   (after b++; case 2 commits the a++ write first, so a == b)
      1 | 2                    1 | 2   (after a++)

There are other sequences too, for example a++, b++, b++, a++ and b++, a++, a++, b++.

There are a couple of interesting things to think about here. The first is that this happens in a loop. That effectively gives you two places to put a memory barrier: between b++ and a++, like so:

void g()
{
        for(;;)
        {
                b++;
                KeMemoryBarrier();      /* make the new b visible before a changes */
                a++;
        }
}

or after a++, at the bottom of the loop (that is, between this iteration’s a++ and the next iteration’s b++):

void g()
{
        for(;;)
        {
                b++;
                a++;
                KeMemoryBarrier();      /* both increments from this pass are visible before the next b++ */
        }
}

Notice that there really are two distinct places the barrier can go, and the two placements are not interchangeable: as the sequences above suggest, each one constrains a different pair of adjacent writes.
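
KeMemoryBarrier() is a kernel-mode macro, so the user-mode sketch above needs its own fence. One option (again my assumption, not something from the original post) is the C11 fence from <stdatomic.h>; strictly speaking the standard defines it in terms of atomic operations, so with plain volatile counters this leans on how mainstream compilers actually implement it, namely a compiler barrier plus a CPU fence instruction. On Windows, the user-mode MemoryBarrier() macro from winnt.h plays the same role. The writer loop with the first placement would look roughly like:

#include <stdatomic.h>

extern volatile int a, b;       /* the shared counters from the sketch above */

void *g(void *unused)
{
        (void)unused;
        for (;;) {
                b++;
                /* full fence: the new b is visible before the new a */
                atomic_thread_fence(memory_order_seq_cst);
                a++;
        }
        return NULL;
}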

So, over the weekend, here are three more things to ponder:

  1. What impact does the fact that a++ is actually a read/modify/write operation have on this? Is the effect architecture-specific? (See the sketch after this list.)
  2. Are the reordering issues different between on-chip reordering and compiler-generated reordering? Is this also architecture-specific? (think 64-bit computing here)
  3. What would the tables look like under the various possible sequences with and without the barriers in either or both places?
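
For question 1, it may help to see the increments spelled out. This is a purely hypothetical expansion (roughly what a load/store machine does under the covers, not what any particular compiler emits), and only the final store of each counter is what the other processor can observe:

int a = 0;
int b = 1;

void g(void)
{
        for (;;) {
                int tb = b;     /* read               */
                tb = tb + 1;    /* update a register  */
                b = tb;         /* write back         */

                int ta = a;     /* read               */
                ta = ta + 1;    /* update a register  */
                a = ta;         /* write back         */
        }
}

Whether x86’s single inc instruction changes the picture is part of the exercise.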

Have a good weekend!

3 Replies to “Memory Barriers, Part 2”

  1. Are you sure that works? Wouldn’t you need the memory barrier before the *test*?

    Consider either of the cases where you placed a memory barrier; in both of them, there is a possibility of two writes happening before a memory barrier (either a++, then the start of the loop and b++; or the start of the loop, b++, and then a++). Now, consider that those two writes happen and get reordered on CPU#1, then CPU#2 runs the test/ASSERT, and only then does CPU#1 run the memory barrier. I think CPU#2 will see the reordered writes here.

    I think putting a memory barrier after each write, OR a memory barrier before the test would work.

  2. The test is ostensibly happening on a different chip, where a memory barrier would have no effect. You’re right about the fact that you can get two writes before the barrier, but you can only get two, and the location of the barrier matters.

    In re-reading this article, it seems that I forgot to comment on which of the two scenarios fixes the problem in the example. The truth is that only the first scenario (i.e. KeMemoryBarrier() between b++ and a++) does. The reason is that at the time of the very first memory barrier execution, b is guaranteed to be 2 ahead of a, so it doesn’t matter in which order the next two increments happen. As long as there’s a barrier after every two writes, b is guaranteed to be either 1 or 2 ahead of a.
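
    For concreteness, one possible sequence with the barrier between b++ and a++, as another processor might observe it:

       a | b
       -----
       0 | 1   (initial)
       0 | 2   (b++, then the barrier)
       1 | 2   (a++)
       1 | 3   (the next b++; this b++ and the previous a++ can swap,
                giving 0 | 3 and then 1 | 3 instead, but either way
                a never catches up to b)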

  3. I tried to trip the assert for an hour and it never did.
    This is on a Core Duo (Yonah) T2300, which has two cores.

    So how do I know I even need this?
    And on what kind of hardware might this start failing?
