Memory Barriers Wrap-up

Hello blogosphere! I hope everyone had a great time this weekend puzzling through the mysteries of memory barriers. Personally, I spent the weekend coding and reading about relativity (a recent post by Raymond Chen got me re-re-re-re-re-started on physics again).

In addition to the above-mentioned nonsense, I got some time to drag out the Intel manuals to see what they had to say about x86 memory barriers. For the curious, the details can be found in section 7.3 of the 3rd volume of the Intel Pentium 4 manuals.

The situation is slightly different between the {i486, P5} and P6+ (Pentium Pro, Pentium II, Xeon, etc.) processors. The first group of chips enforces relatively strong program ordering of reads and writes at all times, with one exception: read misses are allowed to go ahead of write hits. In other words, if a program writes to memory location 1 and then reads from memory location 2, the read is allowed to hit the system bus before the write. This is because the execution stream inside the processor is usually totally blocked waiting for reads, whereas writes can be “queued” to the cache somewhat more asynchronously in the core without blocking program flow.
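
The store-then-load reordering described above is the one case even these older x86 parts allow. A minimal sketch of the classic pattern, written in Java to match the code in the comments below (the class name is made up, and the fields are deliberately not volatile; whether the surprising outcome is actually observed on a given run depends on the hardware and the JIT):

```java
// Each thread stores to its own flag, then loads the other's. Without a
// full barrier between the store and the load, the hardware may let the
// load go ahead of the store, so BOTH threads can read 0.
public class StoreLoad {
    static int x = 0, y = 0;   // deliberately NOT volatile
    static int r1 = -1, r2 = -1;

    public static void main(String[] args) {
        x = 0; y = 0;          // reset so the demo can be run repeatedly
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) { }
        // Any of (0,0), (0,1), (1,0), (1,1) is a legal result here;
        // (0,0) is the one that surprises people.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```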

The P6-based processors present a slightly different story, adding support for out-of-order writes of long string data and speculative read support. In order to control these features of the processor, Intel has supplied a few instructions to enforce memory ordering. There are three explicit fence instructions – LFENCE, SFENCE, and MFENCE.

  • LFENCE – Load fence – all load operations issued before the LFENCE must complete before any load issued after it
  • SFENCE – Store fence – all store operations issued before the SFENCE must become globally visible before any store issued after it
  • MFENCE – Memory fence – all load and store operations issued before the MFENCE must complete before any load or store issued after it
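
For readers who end up in Java-land (where the code later in the comments lives), modern JVMs expose roughly analogous fences as static methods on java.lang.invoke.VarHandle (Java 9+). This is a sketch of the release/acquire publication idiom, not a claim about which x86 instruction the JIT emits for each call:

```java
import java.lang.invoke.VarHandle;

// Sketch: publish a value behind a flag using explicit fences, loosely
// mirroring a store fence before the publish and a load fence after
// observing the flag. Assumes Java 9+.
public class FenceDemo {
    static int data = 0;
    static boolean ready = false;

    static void publish() {
        data = 42;
        VarHandle.releaseFence();  // prior stores may not pass this point
        ready = true;
    }

    static int consume() {
        if (ready) {
            VarHandle.acquireFence();  // later loads may not move above this point
            return data;
        }
        return -1;  // flag not yet observed
    }

    public static void main(String[] args) {
        publish();
        System.out.println(consume());  // single-threaded here, so prints 42
    }
}
```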

These instructions are in addition to the “synchronizing” instructions, such as interlocked memory operations and the CPUID instruction. The latter cause a total pipeline flush, leading to less-efficient utilization of the CPU. It should be noted that the DDK defines KeMemoryBarrier() using an interlocked store operation, so KeMemoryBarrier() suffers from this performance issue.

This story changes on other architectures, as I’ve said before, so the best practice is still to code defensively and use memory barriers where you need them. However, it doesn’t look like you’re likely to run into these situations in x86-land.

7 Replies to “Memory Barriers Wrap-up”

  1. I always manage to forget something… sigh…

    The same issues show up in usermode code, by the way. I didn’t mean to imply that this was kernel-mode-only. In fact, the SDK has MemoryBarrier() for the same reason. For those of you who invest your time in the .Nyet silliness, there’s even Thread.MemoryBarrier for you.

    Anyone know how to do this in a JVM? 🙂

    The SDK documentation idiotically says that you only have to use memory barriers "if you know your code will be running on multiprocessor architectures using weak-ordering CPUs". Hmm… Let me see… that seems to include all hyperthreaded P4s, doesn’t it?

    Even more hmmm… I have no idea if the two virtual CPUs that are exposed by an HT P4 have the same memory barrier issues. Theoretically Intel could keep the entire physical chip self-consistent (in fact, there are statements to that effect in the manuals, but they’re in a different context).

  2. Java doesn’t really get that close to the underlying architecture; as far as I know, there’s no way to be as specific as saying ‘do all of this before anything else’. You would have to do something along the lines of a mutex or spinlock via the synchronized keyword. In Java, synchronized can be added to any method’s declaration to make it ‘mutexed’ across each instance of the object. For instance, suppose you make an object with two methods, each synchronized, and create one instance of it. Any thread can then run only one of those methods at a time. Additional instances of the object are only restricted by THEIR instance, meaning that two objects could each run method 1 at the exact same time, but one object cannot run both of its methods at the same time.
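
    A minimal sketch of that per-instance locking (the class and method names are made up for illustration):

    ```java
    // Both synchronized methods lock the same monitor: `this`. Two threads
    // cannot be inside increment() and read() on the SAME instance at the
    // same time, but two different Counter instances lock independently.
    public class Counter {
        private int value = 0;

        public synchronized void increment() {
            value++;           // protected by this instance's monitor
        }

        public synchronized int read() {
            return value;      // same monitor, so a consistent view
        }
    }
    ```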

    This is, in my opinion, handy, yet limiting. Luckily, Java allows you to use the synchronized keyword in another, more useful way. You can basically treat any Object (I say Object because Object is the base class for every object in Java) as a mutex/spinlock. If you want to use specific locks, just do synchronized (someObj) { code; } around any code that you want to be locked on someObj. Only one synchronized (someObj) { code; } block can run at a time for the instance of someObj in that scope. If you want a process-wide lock, declare a static Object (like at the top of a .java file) and do synchronized blocks on it. This is the equivalent of mutexes in the Java world.
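
    A sketch of the static-lock idiom described above (the class and the lock object's name are arbitrary):

    ```java
    // One shared lock object guards every synchronized (LOCK) block in the
    // process, regardless of which instance or thread runs the code.
    public class Registry {
        private static final Object LOCK = new Object();
        private static int entries = 0;

        public static void register() {
            synchronized (LOCK) {    // process-wide critical section
                entries++;
            }
        }

        public static int count() {
            synchronized (LOCK) {
                return entries;
            }
        }
    }
    ```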

    Another handy feature of Java’s threading setup is that any object you use or define, as long as it’s a real object like Integer and not a language primitive like int, can be used to synchronize, notify, and wait on. The Object base class has methods called wait, notify, and notifyAll (one version of wait also takes a timeout). notify wakes up one thread that is waiting; notifyAll wakes them all up. Very handy indeed.
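
    A small sketch of that wait/notify handoff (the class is hypothetical; note that wait and notify must be called while holding the object's monitor, and the while loop guards against spurious wakeups):

    ```java
    // One thread blocks in take() on the shared lock until another thread
    // deposits a value via put() and calls notifyAll().
    public class Handoff {
        private final Object lock = new Object();
        private Integer value = null;

        public void put(int v) {
            synchronized (lock) {
                value = v;
                lock.notifyAll();   // wake any thread blocked in take()
            }
        }

        public int take() {
            synchronized (lock) {
                while (value == null) {
                    try {
                        lock.wait();   // releases the monitor while waiting
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted", e);
                    }
                }
                return value;
            }
        }
    }
    ```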

  3. Breaker 19, There’s a big pile up on i-90: 933, 1053

    Yes, that’s right… read the code and see why. If you uncomment the synchronized blocks, it works just fine. Code follows:

    public class Test
    {
        public static boolean keepTruckn = true;

        static public class T1 extends Thread
        {
            int goods[] = null;

            public T1( int goods[] )
            {
                this.goods = goods;
            }

            public void run()
            {
                while ( keepTruckn )
                {
                    //synchronized ( goods )
                    //{
                        goods[0]++;
                        goods[1]++;
                    //}
                }
            }
        }

        static public class T2 extends Thread
        {
            int goods[] = null;

            public T2( int goods[] )
            {
                this.goods = goods;
            }

            public void run()
            {
                while ( true )
                {
                    //synchronized ( goods )
                    //{
                        if ( (goods[0] != goods[1]) && keepTruckn )
                            break;
                    //}
                }
                System.err.println("Breaker 19, There’s a big pile up on i-90: " + goods[0] + ", " + goods[1] );
                keepTruckn = false;
            }
        }

        public static void main( String args[] )
        {
            int goods[] = new int[2];
            goods[0] = goods[1] = 0;

            T1 t1 = new T1( goods );
            T2 t2 = new T2( goods );

            t1.start();
            t2.start();

            try
            {
                t2.join();
                t1.join();
            }
            catch ( Exception e ) { }
        }
    }
