Randomness

Pathetic, I know, but after three days, I’m still catching up from my vacation. C’est la vie, I guess.

One topic that comes up regularly at my office, and appears to have come up on NTDEV while I was out of town, was randomness. Really wrapping your brain around the idea of randomness requires some interesting philosophical thought, as pointed out by NTDEV reader Vasili Galchin, who posted some interesting stuff here:

A related topic that comes up a lot in crypto is cryptographic entropy. Basically, entropy (conceptually borrowed from thermodynamics, if you’re curious) leads to randomness. Learn more here, here, and here.

An important concept to realize here is that it’s very difficult to create entropy, and it’s impossible to do so algorithmically. Picture any program, machine, etc., that executes deterministically – no matter how random its output, it’ll produce the same output next time. You have to sample entropy from the world around you, from a nondeterministic source, in order to have something really useful in crypto.

The misconception that entropy can be created algorithmically leads to a number of interesting busts on the part of programmers. One common scenario goes something like this: “I need a random key for a 160-bit crypto algorithm I’m using. I’ll just take the 5-10 characters typed in by the user and SHA-1 them into a random key of the proper length.”

Now, what has happened here? Does the user suddenly have a 160-bit random secret key? No, he has a mechanical transformation of a 40- to 80-bit key. Half as secure, right? Wrong! 1/(2^80) as secure – remember your exponential math.

But it’s actually even worse than that. In the user’s 10 characters, he probably only chose characters from the keyboard, and probably only the letter and number keys, plus their shifts. That gets you 2*(26+10) = 72 possibilities, out of 256 per byte. That means that effectively, you probably only have 6 bits per byte of entropy, because you can make very reliable predictions about two bits’ worth of information in each byte. You can, of course, continue to make intelligent guesses about the keys that people will tend to pick (english words, few capital letters, few numbers, etc), all of which eventually costs you more and more entropy.

Another way to lose entropy is thorugh mechanical transformation. If it’s true that you cannot create entropy by using an algorithm, it stands to reason that you probably lose entropy in some transformations. For example, a simple substitution mapping of any word lexically greater than “moose” with the number 1, and the rest with the number 0, loses all but one bit of the entropy in the input. In fact, the very best crypto algorithms in the world are designed to not lose any of the entropy in the input stream, but it takes a lot of work to make sure that doesn’t happen. You’re unlikely to get it right on your own without an intense amount of research and peer review.

The point here is that sometimes you have less entropy than would appear at first glance. In order to have a secure system, you need unguessable keys. That requires lots of entropy. Keep this in mind next time you’re working on a crypto-related system.

OK, excercises for the readers at home:

1. How big of a Diffe-Hellman exchange is needed in order to properly secure AES-192?
2. How many typed characters do you really need to get from a user for a AES-192 key?

[hint: for #1, check old IETF ipsec working group logs.]