Last time, I wrote about how Iâd decided to write a padding oracle exploit tool from scratch, as part of a CTF, and so that I could learn a thing or two. I promised Iâd tell you how I made it faster⊠but first, a question.
One question Iâve had from colleagues is âwhy didnât you just run PadBuster?â
Itâs a great question, and in general, you should always think first about whether thereâs an existing tool that will get the job done quickly and easily.
Having said that, it took me longer to install PadBuster and the various language components it required than it did to open Visual Studio and write the couple of hundred lines of C# that I used to solve this challenge.
So, from a time perspective, at least, I saved time by doing it myself â and this came as something of a surprise to me.
The time it used up was my normally non-productive time, while Iâm riding the bus into Seattle with spotty-to-zero network connectivity (thereâs none on the bus, and my T-Mobile hot-spot is useful, but neither fast nor reliable down the I-5 corridor). This is time I generally use to tweet, or to listen to the BBC.
I just plain found it interesting to take what I thought I knew about padding oracles, and demonstrate that I had it solidly in my head.
Thatâs a benefit that really canât be effectively priced.
Plus, I learned a few things doing it myself:
- Parallelisation in C# is easier than it used to be.
- Thereâs not much getting around string conversions in trying to speed up the construction of a base64-encoded URL, but then again, when executing against a crypto back-end, thatâs not your bottleneck.
- Comments and blank lines are still important, especially if youâre going to explain the code to someone else.
The other thing that comes with writing your own code is that itâs easier to adjust it for performance â you know where the bottlenecks might lie, and you can dive in and change them without as much of a worry that youâre going to kill the function of the code. Because you know at a slightly more intuitive level how it all works.
You can obviously achieve that intuitive level over time with other peopleâs code, but I wasnât really going to enjoy that.
Looking at some of the chat comments directed at the PadBuster author, itâs clear that other people have tried to suggest optimisations to him, but he believes them not to be possible.
Specifically, he doesnât see that itâs possible to use guesses as to the plaintextâs likely contents to figure out what values should be in the ciphertext. You just plug the values 0..255 into the N-1 ciphertext block until your padding error from the N block goes away, and then that value can be XORed with the padding value to get the intermediate value from the N block. Then the intermediate value gets XORed with the original ciphertext value from the N-1 block to give the original plaintext.
Letâs see how that works in the case of the last block â where weâre expecting to see some padding anyway. Letâs say our block size is 4. Hereâs what two of our ciphertext blocks might look like:
CN-1 | CN | ||||||
0xbe | 0x48 | 0x45 | 0x30 | 0x71 | 0x4f | 0xcc | 0x63 |
Pretty random, right? Yeah, those are actually random numbers, but theyâll work to illustrate how we work here.
We iterate through values of CN-1[3] from 0..255, until we get a response that indicates no padding errors.
0x30 comes back without any padding errors. Thatâs convenient. So, weâve sent âbe484530714fcc63â, and we know now that weâve got a padding byte correct. Buuut that isnât the only right padding byte, because this is the last block, which also has a valid padding byte.
In fact, we can see that 0x30 matches the original value of the CN-1 blockâs last byte, so thatâs not terribly useful. Our padding count has a good chance of not being 1, and weâre trying to find the value that will set it to 1.
Keep iterating, and we get 0x32, giving us a request that doesnât contain a padding exception. Two values. Which one made our padding byte 0x1, so we can use it to determine the intermediate value?
The only way we get two matches will be because the real plaintext ends in a padding count that isnât 0x1. One of those values corresponds to 0x1, the other corresponds to the padding count, which could be 0x2..0x4. [Because weâre using four byte blocks as an example â a real-life example might have a 16-byte block size, so the padding count could be up to 0x10]
The clue is in the original plaintext â 0x30 MUST be the value that corresponds to the original padding count, so 0x32 MUST correspond to 0x1.
[If the original padding count was 0x1, we would only find one value that matched, and that would be the original value in CN-1]
That means the Intermediate value is 0x32 XOR 0x1 = 0x33 â which means the plaintext value is 0x3 â thereâs three bytes of padding at the end of this block.
We can actually write down the values of the last three plaintext and intermediate blocks now:
CN-1 | CN | ||||||
0xbe | 0x48 | 0x45 | 0x30 | 0x71 | 0x4f | 0xcc | 0x63 |
IN | |||||||
?? | 0x4b | 0x46 | 0x33 | ||||
CâN-1 | PN | ||||||
?? | 0x4f | 0x42 | 0x37 | ?? | 0x3 | 0x3 | 0x3 |
Wow â thatâs easy! Howâd we do that? Really simple. We know the last padding must be three bytes of 0x3, so we write those down. Then the intermediate bytes must be the XOR of 0x3 with the value in the CN-1 block.
[I chose in the code, instead of just âwriting downâ the values for each of those bytes, to check each one as I did so, to make sure that things were working. This adds one round-trip for each byte of padding, which is a relatively low cost, compared to the rest of the process.]
Now, if we want to detect the next byte, we want to change the last three bytes of CN-1, so theyâll set the PN values to 0x4, and then iterate through the target byte until we get a lack of padding errors.
So, each new value of the last few bytes of CN-1 will be Câ[i] = C[i] XOR 0x3 XOR 0x4 â taking the value in the original, XORing it with the original plaintext, and then with the desired plaintext to get a new value for the ciphertext.
Iâve put those values of CâN-1 in the table above.
This trick doesnât just stop with the padding bytes, though. Iâm going to guess this is a JSON object, so itâs going to end with a â}â character (close-brace), which is 0x7d.
So, Câ = C XOR 0x7d XOR 0x4 = 0xbe XOR 0x7d XOR 0x4 = 0xc7.
Letâs try that â we now send âc74f4237â â no padding error!
A successful guess for the last four bytes of PN. Now we can fill in more of the table:
CN-1 | CN | ||||||
0xbe | 0x48 | 0x45 | 0x30 | 0x71 | 0x4f | 0xcc | 0x63 |
IN | |||||||
0xba | 0x4b | 0x46 | 0x33 | ||||
CâN-1 | PN | ||||||
0xc7 | 0x4f | 0x42 | 0x37 | 0x7d | 0x3 | 0x3 | 0x3 |
Awesome.
That does require me making the right guess, surely, though?
Yes, but itâs amazing how easy it is to either make completely correct guesses, or just pick a set of values that are more likely to be good guesses, and start by trying those, failing back to the âplod through the rest of the bytesâ approach when you need to.
Iâve coded an English-language frequency table into my padding oracle code, because that was appropriate for the challenge I was working on.
This code is available for you to review and use at https://github.com/alunmj/PaddingOracle/blob/master/PadOracle/Program.cs
You can imagine all kinds of ways to improve your guesses â when proceeding backward through a JSON object, for instance, a â}â character will be at the end; itâll be preceded by white space, double quotes, or brackets/braces, or maybe numerics. A 0x0a character will be preceded by a 0x0d (mostly), etc.
The other big performance improvement I made was to parallelise the search. You can work on one block entirely independently from another.
I chose to let the Parallel.For() function from C# decide exactly how it was going to split up work between different blocks, and the result is a whole lot faster. There are some wrinkles to manage when parallelising an algorithm, but Iâm not going to get into that here. This is not a programming blog, really!
I figured Iâd put that in big letters, because itâs worth calling out â the parallelisation alone obviously multiplies your performance by the number of cores youâve got (or the number of cores the web server has, if itâs underpowered), and the predictive work on the text does the rest. Obviously, the predictive approach only works if you can separate between âlikelyâ and âunlikelyâ characters â if the plaintext consists of random binary data, youâre not going to get much of a benefit. But most data is formatted, and/or is related to English/Latin text.
I havenât published the code for this part yet, but you can use this same breach to encrypt data without knowing the key.
This is really fun and simple once you get all the previous stuff. Here goes.
Encrypting a block requires the generation of two ciphertext blocks from one plaintext block. What the second block is, actually doesnât matter. We can literally set it to random data, or (which is important) specific data of our choosing.
The first block of the pair, acting like an IV, we can set to 0. Thereâs a reason for this which weâll come to in a minute.
With these two initial blocks, we run the decrypter. This will give us a âplaintextâ block as output. Remember how the intermediate block is the plaintext block XORed with the first of the pair of blocks? Well, because we set that first block to all zeroes, that means the plaintext block IS the same as the intermediate block. And that intermediate block was generated by decrypting the second block of the pair. In order for that decryption to result in the plaintext we want instead, we can simply take the intermediate block, XOR it with the plaintext block we want, and then put that into the first ciphertext block. [Weâre actually XORing this with the first ciphertext block, but thatâs a straight copy in this case, because the first ciphertext block is zeroes.]
Do the same thing for each of the rest of the blocks.
Sadly, thereâs no parallelising this approach, and the guessing doesnât help you either. You have to start with CN (randomly generated) and CN-1 (deduced with the approach above), then when youâve established what CN-1 is, you can use the same approach to get CN-2, and so on back to the IV (C0). So this process is just plain slow. But it allows you to encrypt an arbitrary set of data.
Leave a Reply