Why, you ask, is a title such as this present in a Windows kernel-mode blog?
Our good friends at AnandTech have published an interesting article on G5 vs. Opteron vs. Xeon performance. It has one or two major flaws, such as changing two variables at once and still attempting to draw a conclusion, but it is a fascinating read anyway. The chip architecture material is great, although you can find similarly technical articles at Ars Technica (which I read daily). The really good stuff starts when they do what amounts to an OS comparison between Darwin (the UNIX under Mac OS X) and Linux. They point out a really serious performance problem with multi-threaded applications, and another one with kernel locking.
Locking is really hard. Even seasoned programmers don’t always get it right, and the collective wisdom keeps changing and improving as time goes on. NT’s internal locking architecture has evolved steadily over the years, just as Linux’s has, and (to a lesser degree) Mac OS X’s. The first iterations of Windows had several very hot locks that were practically mandatory in some significant kernel paths, such as I/O (the cancel lock) and dispatching (the dispatcher lock). The current performance of Windows is the result of a systematic effort to profile and improve. Beyond a certain point, it doesn’t pay to guess where redesign and recoding will yield improved performance. Raymond Chen has written often about this, and has a recent series on optimization that’s worth a read.
The take-home is to profile your code. There are lots of tools available, some expensive and some free, to help you out here. You can even instrument your own code, though that has its own performance-related consequences. Regardless, if you write any nontrivial production code, you owe it to yourself to profile.
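If you go the instrument-it-yourself route, the simplest form is a pair of timestamps around the region you care about. What follows is only a minimal user-mode sketch in portable C, not anything taken from a real profiler: the names region_timer, region_begin, region_end, region_report and busy_work are all made up for illustration, and it uses the standard clock() call, whose resolution is coarse and whose meaning (CPU time vs. elapsed time) varies by platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical accumulator for one instrumented code region. */
typedef struct {
    const char *name;    /* label printed in the report */
    uint64_t    calls;   /* number of completed begin/end pairs */
    double      total_s; /* accumulated clock() time, in seconds */
    clock_t     started; /* clock() value captured by region_begin */
} region_timer;

/* Record the starting timestamp for the region. */
static void region_begin(region_timer *t)
{
    assert(t != NULL);
    t->started = clock();
}

/* Add the time elapsed since region_begin to the running total. */
static void region_end(region_timer *t)
{
    assert(t != NULL);
    clock_t now = clock();
    t->total_s += (double)(now - t->started) / CLOCKS_PER_SEC;
    t->calls++;
}

/* Print the call count and accumulated time for the region. */
static void region_report(const region_timer *t)
{
    assert(t != NULL);
    printf("%s: %llu calls, %.6f s total\n",
           t->name, (unsigned long long)t->calls, t->total_s);
}

/* Stand-in workload (an assumption for the example): sums 0..n-1. */
static unsigned long busy_work(unsigned long n)
{
    volatile unsigned long sum = 0; /* volatile keeps the loop from being optimized away */
    for (unsigned long i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}
```

To use it, initialize a region_timer with a name and zeroed fields, call region_begin before the code under test (busy_work here) and region_end after it, then call region_report once at the end. Note that the two clock() calls on every pass are exactly the kind of performance-related consequence mentioned above: in a hot path, the instrumentation itself can distort what you are trying to measure.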