How Do I Become A Programmer?

Someone asked me the other day how to become a programmer. He mentioned college CS programs, DeVry (a three-year bachelor’s degree school), a local “programming boot camp”, etc. This is an interesting question, as I’m getting ready to post a few programming jobs for my development team, so I thought about what I told the HR guys to put in the posting.

The most important quality I look for in hiring programmers (apart from technical knowledge/skill) is the ability to make order out of chaos. I don’t know precisely how to formulate this in words, but it’s that characteristic by which people can take an amorphous set of desires (e.g. “I’d like to encrypt files on disk”) and turn them into a functional product, with the 10,000 considerations that requires. Now, in reality, product definitions will be more precise than this, but no matter how well you define a task, the creativity of the programmer will still be called upon at some point, and nothing is more irritating to me as a manager than programmers getting blocked for lack of personal problem-solving horsepower. Notice that this is not something that can be learned at school (or at least I never had any classes that helped me here, in ~250 hours of college). You just have to live life and solve life’s problems.

Related to this, I like to know that the candidate really loves programming. One good way for me to know this, which incidentally is also a fantastic way to learn to program, is to see involvement in some sort of open-source or recreational coding. The upsides to this are huge – I know that the person loves coding as a first principle, I can (often) see actual output from the person, I can get a feel for how fast the person can work, I can learn about task commitment and creative problem solving, etc. The list goes on.

There are also, of course, the basic qualifications that one looks for in a programmer, but to me, they are implied by the above two. I’d be curious to know what others think about what makes a good development team member.

Randomness

Pathetic, I know, but after three days, I’m still catching up from my vacation. C’est la vie, I guess.

One topic that comes up regularly at my office, and that appears to have come up on NTDEV while I was out of town, is randomness. Really wrapping your brain around the idea of randomness requires some interesting philosophical thought, as pointed out by NTDEV reader Vasili Galchin, who posted some interesting material here:

To learn more about randomness, check this out.

A related topic that comes up a lot in crypto is cryptographic entropy. Basically, entropy (conceptually borrowed from thermodynamics, if you’re curious) leads to randomness. Learn more here, here, and here.

An important concept to realize here is that it’s very difficult to create entropy, and it’s impossible to do so algorithmically. Picture any program, machine, etc. that executes deterministically: no matter how random its output looks, it will produce exactly the same output the next time it runs. You have to sample entropy from the world around you, from a nondeterministic source, in order to have something really useful in crypto.
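To make the determinism point concrete, here’s a small Python sketch (mine, not from any particular crypto library): a seeded pseudorandom generator produces the same “random” stream on every run, so it carries no entropy beyond its seed.

```python
import random

# Two generators with the same seed are the same deterministic machine:
# however random the output looks, it is fully determined by the seed.
gen_a = random.Random(42)
gen_b = random.Random(42)

stream_a = [gen_a.randrange(256) for _ in range(8)]
stream_b = [gen_b.randrange(256) for _ in range(8)]

assert stream_a == stream_b  # no new entropy was created algorithmically
```

This is exactly why operating systems harvest entropy from nondeterministic sources (interrupt timings, device noise) rather than trying to manufacture it in software.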

The misconception that entropy can be created algorithmically leads to a number of interesting busts on the part of programmers. One common scenario goes something like this: “I need a random key for a 160-bit crypto algorithm I’m using. I’ll just take the 5-10 characters typed in by the user and SHA-1 them into a random key of the proper length.”

Now, what has happened here? Does the user suddenly have a 160-bit random secret key? No, he has a mechanical transformation of a 40- to 80-bit key. Half as secure, right? Wrong: 1/(2^80) as secure – remember your exponential math. The attacker doesn’t search the 2^160 possible hash outputs; he searches the 2^80 possible inputs.
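The arithmetic is easy to check in Python (an illustrative sketch; the passphrase string is made up):

```python
import hashlib

# Hashing a weak input doesn't strengthen it: SHA-1 always emits 160 bits,
# but the attacker enumerates the *inputs*, not the outputs.
key = hashlib.sha1(b"hunter12").digest()  # hypothetical user passphrase
assert len(key) * 8 == 160                # looks like a 160-bit key...

# ...but if the passphrase only carried ~80 bits of entropy, the effective
# search space is 2**80 -- a factor of 2**80 smaller than 2**160:
assert (2 ** 160) // (2 ** 80) == 2 ** 80
```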

But it’s actually even worse than that. In his 10 characters, the user probably only chose from the letter and number keys, plus their shifted forms. That gets you 2*(26+10) = 72 possibilities out of 256 per byte, so effectively you only have about 6 bits of entropy per byte, because you can make very reliable predictions about the remaining two bits’ worth of information in each byte. You can, of course, continue to make intelligent guesses about the keys that people tend to pick (English words, few capital letters, few numbers, etc.), all of which costs you more and more entropy.
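Running the numbers (a quick Python sketch):

```python
import math

# Roughly 72 plausible characters per keystroke: letters, digits, and shifts.
alphabet = 2 * (26 + 10)
bits_per_char = math.log2(alphabet)   # about 6.17 bits, vs. 8 for a random byte

# Ten such characters fall far short of a 160-bit key:
total_bits = 10 * bits_per_char
print(round(bits_per_char, 2), round(total_bits, 1))  # 6.17 61.7
```

And that 61.7 bits is still optimistic, since it assumes every one of the 72 characters is equally likely — real passphrases are far more predictable.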

Another way to lose entropy is through mechanical transformation. If it’s true that you cannot create entropy with an algorithm, it stands to reason that some transformations will lose entropy. For example, a simple substitution that maps any word lexically greater than “moose” to the number 1, and every other word to 0, loses all but one bit of the entropy in the input. In fact, the very best crypto algorithms in the world are designed not to lose any of the entropy in the input stream, but it takes a lot of work to make sure that doesn’t happen. You’re unlikely to get it right on your own without an intense amount of research and peer review.
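The “moose” substitution in code (a toy sketch of my own, using Python’s lexicographic string comparison):

```python
# Map any word lexically greater than "moose" to 1, everything else to 0.
# However much entropy the input word carried, the output holds at most one bit.
def moose_bit(word: str) -> int:
    return 1 if word > "moose" else 0

bits = {moose_bit(w) for w in ["aardvark", "zebra", "mouse", "kernel", "moot"]}
assert bits == {0, 1}  # the entire input space collapses into two values
```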

The point here is that sometimes you have less entropy than would appear at first glance. In order to have a secure system, you need unguessable keys. That requires lots of entropy. Keep this in mind next time you’re working on a crypto-related system.

OK, exercises for the readers at home:

  1. How big of a Diffie-Hellman exchange is needed in order to properly secure AES-192?
  2. How many typed characters do you really need to get from a user for an AES-192 key?

[hint: for #1, check old IETF IPsec working group logs.]

Eurotrip

I’m back from vacation, after having visited Switzerland, Liechtenstein, Austria, Germany, Luxembourg, and France. I’m still really screwed up from jet lag, and have only mostly dug through my 10,000-ish e-mails, but it was all worth it, as the trip (and the two weeks off!) were fantastic.

It looks like ArsTechnica changed its layout while I was off the air. The new site looks nice. Check it out if you haven’t before.

Guarded Mutexes

I’ve been in an ongoing conversation with some folks about the new guarded mutexes that are being used in Windows Server 2003 and higher. They’re a little bit different from previous synchronization mechanisms, so they deserve their own consideration.

Guarded mutexes were introduced into the memory manager of Windows Server 2003 as a performance optimization over FAST_MUTEXes and other synchronization primitives. As you know, in order to acquire a fast mutex, typically the IRQL must be raised to APC_LEVEL, or else the driver must be in a critical region. Otherwise, the potential for deadlock exists due to APCs. The two mechanisms of disabling APCs differ in one respect: raising to APC_LEVEL disables special kernel APCs as well, whereas a critical region disables only normal APCs.

However, as OSR’s Tony Mason points out, raising the IRQL to APC_LEVEL requires an expensive access to the APIC on a multiprocessor computer, leading to a performance hit. In order to address this, Microsoft has introduced guarded mutexes, which merely disable all forms of APCs without adjusting the IRQL.

The following are some guarded mutex functions exported by my 3790 XP-64 kernel:


lkd> x nt!*guarded*
fffff800`00843bb0 nt!KiWaitForGuardedMutexGate
fffff800`00855970 nt!KeAcquireGuardedMutexUnsafe
fffff800`008266f0 nt!KeEnterGuardedRegion
fffff800`0081ddc0 nt!KeInitializeGuardedMutex
fffff800`00855940 nt!KeReleaseGuardedMutexUnsafe
fffff800`00935e80 nt!WmipTraceGuardedMutex
fffff800`0081dd60 nt!KeAcquireGuardedMutex
fffff800`008267a0 nt!KeTryToAcquireGuardedMutex
fffff800`0081dcb0 nt!KeReleaseGuardedMutex
fffff800`00826710 nt!KeLeaveGuardedRegion

…and the structure they operate on:


lkd> dt nt!_kguarded_mutex
nt!_KGUARDED_MUTEX
   +0x000 Count            : Int4B
   +0x008 Owner            : Ptr64 _KTHREAD
   +0x010 Contention       : Uint4B
   +0x018 Gate             : _KGATE
   +0x030 KernelApcDisable : Int2B
   +0x032 SpecialApcDisable : Int2B
   +0x030 CombinedApcDisable : Uint4B

A little imagination will let you guess approximately how to use these new routines, although they don’t show up in my DDK or IFSKit.

There are a couple of interesting programming considerations that guarded mutexes raise. The first is that it is now no longer sufficient to test the IRQL for APC_LEVEL if you want to see whether special kernel APCs are enabled. The main impact I’ve seen so far is that it’s not safe to call Zw* file management routines or send IRPs to filesystems just because the IRQL is PASSIVE_LEVEL. You must also make sure that special kernel APCs are not disabled.

The even better part is that, as of today, there is absolutely no way to make that test. There is a field in the KTHREAD structure that controls whether or not special kernel APCs are enabled:


lkd> dt nt!_kthread
   ....
   +0x0bc KernelApcDisable : Int2B
   +0x0be SpecialApcDisable : Int2B
   +0x0bc CombinedApcDisable : Uint4B
   ....

…but there is no exported API to test that field. Testing it directly is a bad idea, of course, so what’s a programmer to do?

The simple answer, as far as I’m concerned, is to just post all file I/O off to a worker thread if there’s any chance at all that you could be getting called from the memory manager. Someday there will be a new API (according to Neal Christiansen from Microsoft, someday == Srv03 SP1), but until then, better safe than sorry.