In case you missed it, on May 30th, a root certificate expired.
This made a lot of applications very unreliable, and has been widely regarded as a bad move.
Well, alright - what was regarded as a bad move is that applications should become unreliable under the specific circumstances involved here.
When you connect to a server (web site or application) over SSL/TLS, the server has to send your client (browser or application) its Certificate.
In modern code, this Certificate is used by the client to trace back to a signing authority that is trusted by the client or its operating system.
Some servers like to help this process out, by sending a chain along with the Certificate, for a couple of reasons: the client may not have all the intermediate certificates needed to link the server's Certificate to a trusted root, or the client may not yet trust the root that signed the Certificate, but may trust an older root that vouches for it.
This second situation is what we're interested in here. A new root appears, new certificates are issued, and old clients refuse to honour them because they don't have the new root in their trust store.
This is fixed with "cross-signing", which allows an older, trusted root to sign the new untrusted root, so that the older client sees a chain that includes the older root at the top, and is therefore trusted.
Older root certificates expire. It takes 20 years, but at the end of May it finally happened to this one root certificate, "AddTrust External CA Root".
When that happens, a client that builds its own certificate chain, and uses that to trace back to a trusted root, is happy, because it sees only certificates that it trusts.
A client who takes the certificate chain as supplied by the server, without building its own, will see that the chain ends in an expired certificate, and refuse to connect, because the entire chain cannot be trusted.
The two links I provided earlier are well worth a read if you're interested in solving this problem, and really, I've got nothing to add to how this issue occurred, why it's a problem, how to address it at your server, or any of those fun things.
What I do offer is a tool for .NET (Windows, Linux, Mac, etc.) that lets you compare the certificate chain as presented by the server against the certificate chain built by a client. It will report if a certificate in either chain has expired. It's written in C#, and built with Visual Studio, and takes one parameter - the site to which it will connect on port 443 to query for the certificate and chain.
It's not a very smart tool, and it makes a few assumptions (though it's relatively easy to fix if those assumptions turn out to be false).
But it has source code, and it runs on Windows, Linux and (presumably - I haven't tested) Mac.
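If you just want the shape of the idea without opening the attachment, here's a rough C# sketch - not the attached tool itself, and the names here are mine: connect with an SslStream, look at the chain handed to the validation callback (which is assembled using the certificates the server sent), then build a chain locally from the leaf and report expiry in both.

using System;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Cryptography.X509Certificates;

class ChainCheck
{
    static void Main(string[] args)
    {
        string host = args[0];
        using (var client = new TcpClient(host, 443))
        using (var ssl = new SslStream(client.GetStream(), false,
            (sender, cert, chain, errors) =>
            {
                // This chain was assembled using what the server supplied.
                foreach (var el in chain.ChainElements)
                    Report("server", el.Certificate);
                // Now build our own chain from the leaf certificate.
                var local = new X509Chain();
                local.Build(new X509Certificate2(cert));
                foreach (var el in local.ChainElements)
                    Report("client", el.Certificate);
                return true; // accept regardless; we only want to inspect
            }))
        {
            ssl.AuthenticateAsClient(host);
        }
    }

    static void Report(string source, X509Certificate2 cert)
    {
        bool expired = DateTime.Now > cert.NotAfter;
        Console.WriteLine($"{source}: {cert.Subject} " +
            (expired ? "EXPIRED " + cert.NotAfter : "ok"));
    }
}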
Working against the sites listed at http://testsites.test.certificatetest.com/, we get the following results:
First: https://aaacertificateservices.test.certificatetest.com/ – Certificate issued from a CA signed by AAA Certificate Services root.
Interestingly, note that the certificate chain in the stream from the server doesn't include the root certificate at all, but the root is present when we ask the client code what certificates are in the chain for this server.
Second: https://addtrustexternalcaroot.test.certificatetest.com/ – Certificate issued from a CA signed by AddTrust External CA Root.
The certificates here expired on 5/30/2020, and it's no surprise that we see this result in both the chain provided by the server and the chain built by the client. Again, the root certificate isn't actually in the chain the server provided in the stream.
Third: https://addtrustaia.test.certificatetest.com/ – Certificate issued from a CA signed by USERTrust RSA Certification Authority with a cross cert via AIA from AddTrust External CA Root.
Nothing noteworthy here, but it's included for completeness. I don't do anything in this code for an AIA cross cert.
Fourth, and most importantly: https://addtrustchain.test.certificatetest.com/ – Certificate issued from a CA signed by USERTrust RSA Certification Authority with a cross cert via server chain from AddTrust External CA Root.
Here's the point of the tool: it's able to tell you that there's a certificate in the chain from the server that has expired, and may potentially be causing problems to visitors using an older browser or client library.
By now, you've had enough of reading and you want to see the code - or just run it. I've attached two files - one for the source code, the other for the executable content. I leave it up to others to tell you how to install dotnet core on your platform.
Let me know if, and how, you use this tool, and whether it achieves whatever goal you want from it.
"What's the point," pondered Alice, "of getting other people to stuff things in a box, if one cannot ever get them out?"
Ok, she never did say that, but it's the sort of thing Alice would wonder.
Particularly if she noticed how often modern businesses send around Word forms with input fields designed to be filled out by team members, only to then be manually copied into spreadsheets, databases, or other documents.
I'd put this as the second most irritating waste of document functionality.
And it doesnât have to be this way.
First, let's look at what you get with a Word form. There really isn't any such specific beast as a "Word form" - it's just a Word document, with form fields. Form fields are places into which users can type text, check boxes, select from drop-down lists, etc.
Once form fields have been put into a document, the original document author can "restrict" the document such that only editing the form fields is allowed. This is usually done with a password, to make it less likely that others will edit the document beyond the form fields.
The presence of a password should not be taken to indicate that this is a security measure.
Removing the restriction can be done by guessing the password, or by accessing the settings.xml inside the docx file and changing the value of "w:enforcement" from "1" to "0". Other methods include saving to RTF, then editing the file in a text editor before saving it as docx again.
Restricting the document is done to make it less likely that blithe nonces will return your document to you with changes that are outside of the fields you've provided to them, or with fields removed. This is important, because you can't as easily extract data from a document if you don't know where it is.
Here's what a form looks like when it's restricted for editing, and has a number of form field elements provided - I've given a text field for a person's name, a drop-down list for their zodiac sign, and a check box for education level. This is the sort of thing you might expect a form to be really useful for collecting.
Now that you've sent this out to a hundred recipients, though, you want to extract the data from each form.
First we've got to get the part of the document containing the data out. Knowing, as we do, that a docx file is just a ZIP file full of XML files, we could unzip it and go searching for the data. I've already done that - the data is in the file called "word/document.xml". You could just rename the docx file to a zip file, open it in Explorer, navigate into the "word" folder, and then drag the document.xml file out for handling, but that's cumbersome, and we want an eventual automated solution.
Yeah, you could write this in a batch file using whatever ZIP program you've downloaded, and it wouldn't be that difficult, but I'm thinking about PowerShell a lot these days for my automation. Here's code that will take a docx file and extract just the word/document.xml component into an output file whose name is provided.
# Load up the types required to handle a zip file.
Add-Type -AssemblyName System.IO.Compression.FileSystem

Function Get-DocXDocFile ($infilename, $outfilename) {
    $infileloc = [System.IO.Path]::Combine($pwd, $infilename)
    $zip = [System.IO.Compression.ZipFile]::OpenRead($infileloc)
    # Find the document body inside the ZIP and extract it to the output file.
    $zip.Entries | where { $_.FullName -eq "word/document.xml" } | foreach {
        $outfileloc = [System.IO.Path]::Combine($pwd, $outfilename)
        [System.IO.Compression.ZipFileExtensions]::ExtractToFile($_, "$outfileloc", $true)
    }
}
By now, if you're like me, you've opened up that XML file and looked into it, and decided you don't care that much to read its entrails.
That's OK, I did it for you.
The new-style fields are all in "w:sdt" elements, and can be found by the "w:tag" name under the "w:sdtPr" element.
Old-style fields are all in "w:fldChar" elements, and can be found by the "w:name" value under the "w:ffData" element.
In XPath, a way of describing how you find a specific element / attribute in an XML file, that's expressed as follows:
//w:sdt/w:sdtPr/w:tag[@w:val='Txt1Tag']/../..
//w:fldChar/w:ffData/w:name[@w:val='Text1']/../..
This does assume that you gave each of your fields names or tags. But it would be madness to expect data out if you aren't naming your fields.
If you're handy with .NET programming, you're probably halfway done writing the code to parse this using XmlDocument.
If you're not handy with .NET programming, you might need something a little (but sadly, not a lot) easier.
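For the .NET-handy, the whole extraction really is only a few lines. A minimal sketch, assuming you've pulled out word/document.xml as above, and using the 'Txt1Tag' name from the XPath example:

using System;
using System.Xml;

class ExtractFields
{
    static void Main(string[] args)
    {
        var doc = new XmlDocument();
        doc.Load(args[0]); // the extracted word/document.xml

        // All the w: elements live in the WordprocessingML namespace.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w",
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

        // New-style field: find the w:sdt by its tag, then read its content.
        var sdt = doc.SelectSingleNode(
            "//w:sdt/w:sdtPr/w:tag[@w:val='Txt1Tag']/../..", ns);
        var content = sdt?.SelectSingleNode("w:sdtContent", ns);
        Console.WriteLine("Txt1Tag = " + content?.InnerText);
    }
}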
Remember those XPath elements? Wouldn't it be really cool if we could embed those into a document, and then have that document automatically expand them into their contents, so we could do that for every form file we've got?
Well, we can.
XSLT is short for Extensible Stylesheet Language Transformation (which is definitely long enough to need something to be short for it). It really has no good pronunciation, because I'm never going to say something that sounds like "ex-slut" at work. XSLT is a way to turn one XML-formatted document into some other kind of output.
Let's say we're working with the document I outlined above (and which I will forget to attach to this blog post until someone points it out). We've already extracted document.xml, and with the right XSL file, and a suitable XSLT command (such as the Microsoft msxml tool, or whatever works in your native environment), we can do something like this:
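(If you'd rather not hunt down msxml, a few lines of C# will run the transform too. This is just a sketch - the file names here are placeholders, not anything from the attachments:)

using System.Xml.Xsl;

class RunTransform
{
    static void Main()
    {
        // Apply the stylesheet to the extracted document body.
        var xslt = new XslCompiledTransform();
        xslt.Load("extract.xsl");
        xslt.Transform("document.xml", "output.txt");
    }
}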
Maybe instead of text, you prefer something more like CSV:
I will probably forget to attach the XSL stylesheets that I used for these two transformations to this blog post.
Maybe next time we can see about building this into a tool...
Here are the files I forgot to add: ExtractData
Just a quick note, because I've been sick this week, but last weekend, I put a little more work into my Padding Oracle exploit tool.
You can find the new code up at https://github.com/alunmj/PaddingOracle, and because of all the refactoring, it's going to look like a completely new batch of code. But I promise that most of it is just moving code from Program.cs into classes, and adding parsing of command-line arguments.
I don't pretend to be the world's greatest programmer by any stretch, so if you can tell me a better way to do what I've done here, do let me know, and I'll make changes and post something about them here.
Also, please let me know if you use the tool, and how well it worked (or didn’t!) for you.
The arguments currently supported are:
The only parameter unadorned with an option letter - this is the URL for the resource the Padding Oracle code will be pounding to test guesses at the encrypted code.
Also, --cipher. This provides a .NET regular expression which matches the ciphertext in the URL.
Also, --textencoding, --encoding. This sets the encoding that's used to specify the ciphertext (and IV) in the URL. The default is b64.
Also, --iv. This provides a .NET regular expression which matches the IV in the URL if it's not part of the ciphertext.
Also, --blocksize. This sets the block size in bytes for the encryption algorithm. It defaults to 16, but should work for values up to 32.
Also, --verbose. Verbose - output information about the packets we're decrypting, and statistics on speed at the end.
Also, --help. Outputs a brief help message.
Also, --parallelism. Dictates how much to parallelise. Specifying "1" means to use one thread, which can be useful to see what's going on. "-1" means "maximum parallelisation" - as many threads as possible. Any other integer is roughly akin to saying "no more than this number of threads", but may be overridden by other aspects of the Windows OS. The default is -1.
Also, --encrypt. Instead of decrypting, this will encrypt the provided text, and provide a URL in return that will be decrypted by the endpoint to match your provided text.
These examples are run against the WebAPI project that's included in the PadOracle solution.
Let's say you've got an example URL like this:
http://localhost:31140/api/encrypted/submit?iv=WnfvRLbKsbYufMWXnOXy2Q%3d%3d&ciphertext=087gbLKbFeRcyPUR2tCTajMQAeVp0r50g07%2bLKh7zSyt%2fs3mHO96JYTlgCWsEjutmrexAV5HFyontkMcbNLciPr51LYPY%2f%2bfhB9TghbR9kZQ2nQBmnStr%2bhI32tPpaT6Jl9IHjOtVwI18riyRuWMLDn6sBPWMAoxQi6vKcnrFNLkuIPLe0RU63vd6Up9XlozU529v5Z8Kqdz2NPBvfYfCQ%3d%3d
This strongly suggests (because who would use "iv" and "ciphertext" to mean anything other than the initialisation vector and cipher text?) that you have an IV and a ciphertext, separate from one another. We have the IV, so let's use it - here's the command line I'd try:
PadOracle "http://localhost:31140/api/encrypted/submit?iv=WnfvRLbKsbYufMWXnOXy2Q%3d%3d&ciphertext=087gbLKbFeRcyPUR2tCTajMQAeVp0r50g07%2bLKh7zSyt%2fs3mHO96JYTlgCWsEjutmrexAV5HFyontkMcbNLciPr51LYPY%2f%2bfhB9TghbR9kZQ2nQBmnStr%2bhI32tPpaT6Jl9IHjOtVwI18riyRuWMLDn6sBPWMAoxQi6vKcnrFNLkuIPLe0RU63vd6Up9XlozU529v5Z8Kqdz2NPBvfYfCQ%3d%3d" -c "087gb.*%3d%3d" -i "WnfvRL.*2Q%3d%3d"
This is the result of running that command:
Same URL, but this time I want to encrypt some text.
Our command line this time is:
PadOracle "http://localhost:31140/api/encrypted/submit?iv=WnfvRLbKsbYufMWXnOXy2Q%3d%3d&ciphertext=087gbLKbFeRcyPUR2tCTajMQAeVp0r50g07%2bLKh7zSyt%2fs3mHO96JYTlgCWsEjutmrexAV5HFyontkMcbNLciPr51LYPY%2f%2bfhB9TghbR9kZQ2nQBmnStr%2bhI32tPpaT6Jl9IHjOtVwI18riyRuWMLDn6sBPWMAoxQi6vKcnrFNLkuIPLe0RU63vd6Up9XlozU529v5Z8Kqdz2NPBvfYfCQ%3d%3d" -c "087gb.*%3d%3d" âi "WnfvRL.*2Q%3d%3d" âe "Hereâs some text I want to encrypt"
When we run this, it warns us it's going to take a very long time, and boy, it's not kidding - we don't get any benefit from the frequency table, and we can't parallelise the work.
And you can see it took about two hours.
Last time, I wrote about how I'd decided to write a padding oracle exploit tool from scratch, as part of a CTF, and so that I could learn a thing or two. I promised I'd tell you how I made it faster... but first, a question.
One question I've had from colleagues is "why didn't you just run PadBuster?"
It's a great question, and in general, you should always think first about whether there's an existing tool that will get the job done quickly and easily.
Having said that, it took me longer to install PadBuster and the various language components it required than it did to open Visual Studio and write the couple of hundred lines of C# that I used to solve this challenge.
So, from a time perspective, at least, I saved time by doing it myself - and this came as something of a surprise to me.
The time it used up was my normally non-productive time, while I'm riding the bus into Seattle with spotty-to-zero network connectivity (there's none on the bus, and my T-Mobile hot-spot is useful, but neither fast nor reliable down the I-5 corridor). This is time I generally use to tweet, or to listen to the BBC.
I just plain found it interesting to take what I thought I knew about padding oracles, and demonstrate that I had it solidly in my head.
That's a benefit that really can't be effectively priced.
Plus, I learned a few things doing it myself:
- Parallelisation in C# is easier than it used to be.
- There's not much getting around string conversions in trying to speed up the construction of a base64-encoded URL, but then again, when executing against a crypto back-end, that's not your bottleneck.
- Comments and blank lines are still important, especially if you're going to explain the code to someone else.
The other thing that comes with writing your own code is that it's easier to adjust it for performance - you know where the bottlenecks might lie, and you can dive in and change them without as much of a worry that you're going to kill the function of the code. Because you know at a slightly more intuitive level how it all works.
You can obviously achieve that intuitive level over time with other people's code, but I wasn't really going to enjoy that.
Looking at some of the chat comments directed at the PadBuster author, it's clear that other people have tried to suggest optimisations to him, but he believes them not to be possible.
Specifically, he doesn't see that it's possible to use guesses as to the plaintext's likely contents to figure out what values should be in the ciphertext. You just plug the values 0..255 into the N-1 ciphertext block until your padding error from the N block goes away, and then that value can be XORed with the padding value to get the intermediate value from the N block. Then the intermediate value gets XORed with the original ciphertext value from the N-1 block to give the original plaintext.
Let's see how that works in the case of the last block - where we're expecting to see some padding anyway. Let's say our block size is 4. Here's what two of our ciphertext blocks might look like:
CN-1: 0xbe 0x48 0x45 0x30
CN:   0x71 0x4f 0xcc 0x63
Pretty random, right? Yeah, those are actually random numbers, but they'll work to illustrate how we work here.
We iterate through values of CN-1[3] from 0..255, until we get a response that indicates no padding errors.
0x30 comes back without any padding errors. That's convenient. So, we've sent "be484530714fcc63", and we know now that we've got a padding byte correct. Buuut that isn't the only right padding byte, because this is the last block, which also has a valid padding byte.
In fact, we can see that 0x30 matches the original value of the CN-1 block's last byte, so that's not terribly useful. Our padding count has a good chance of not being 1, and we're trying to find the value that will set it to 1.
Keep iterating, and we get 0x32, giving us a request that doesn't contain a padding exception. Two values. Which one made our padding byte 0x1, so we can use it to determine the intermediate value?
The only way we get two matches will be because the real plaintext ends in a padding count that isn't 0x1. One of those values corresponds to 0x1, the other corresponds to the padding count, which could be 0x2..0x4. [Because we're using four-byte blocks as an example - a real-life example might have a 16-byte block size, so the padding count could be up to 0x10.]
The clue is in the original plaintext - 0x30 MUST be the value that corresponds to the original padding count, so 0x32 MUST correspond to 0x1.
[If the original padding count was 0x1, we would only find one value that matched, and that would be the original value in CN-1]
That means the intermediate value is 0x32 XOR 0x1 = 0x33 - which means the plaintext value is 0x3: there's three bytes of padding at the end of this block.
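In code, that deduction is just a couple of XORs - these are the values from the example above:

byte accepted = 0x32;                          // the valid-padding guess that wasn't the original byte
byte intermediate = (byte)(accepted ^ 0x01);   // 0x33 - the intermediate value
byte plaintext = (byte)(intermediate ^ 0x30);  // 0x03 - three bytes of padding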
We can actually write down the values of the last three plaintext and intermediate blocks now:
CN-1:  0xbe 0x48 0x45 0x30
CN:    0x71 0x4f 0xcc 0x63
IN:    ??   0x4b 0x46 0x33
C'N-1: ??   0x4f 0x42 0x37
PN:    ??   0x3  0x3  0x3
Wow - that's easy! How'd we do that? Really simple. We know the last padding must be three bytes of 0x3, so we write those down. Then the intermediate bytes must be the XOR of 0x3 with the value in the CN-1 block.
[I chose in the code, instead of just "writing down" the values for each of those bytes, to check each one as I did so, to make sure that things were working. This adds one round-trip for each byte of padding, which is a relatively low cost, compared to the rest of the process.]
Now, if we want to detect the next byte, we want to change the last three bytes of CN-1, so they'll set the PN values to 0x4, and then iterate through the target byte until we get a lack of padding errors.
So, each new value of the last few bytes of CN-1 will be C'[i] = C[i] XOR 0x3 XOR 0x4 - taking the value in the original, XORing it with the original plaintext, and then with the desired plaintext to get a new value for the ciphertext.
I've put those values of C'N-1 in the table above.
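As a fragment of C#, that re-aiming step looks like this (block values from the table):

byte[] c = { 0xbe, 0x48, 0x45, 0x30 };       // original CN-1
byte[] cPrime = new byte[4];
for (int i = 1; i < 4; i++)                  // bytes whose plaintext we know is 0x03
    cPrime[i] = (byte)(c[i] ^ 0x03 ^ 0x04);  // 0x4f, 0x42, 0x37 - forces 0x04 at each
// cPrime[0] is the byte we iterate (or guess smartly) next.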
This trick doesn't just stop with the padding bytes, though. I'm going to guess this is a JSON object, so it's going to end with a "}" character (close-brace), which is 0x7d.
So, C' = C XOR 0x7d XOR 0x4 = 0xbe XOR 0x7d XOR 0x4 = 0xc7.
Let's try that - we now send "c74f4237" - no padding error!
A successful guess for the last four bytes of PN. Now we can fill in more of the table:
CN-1:  0xbe 0x48 0x45 0x30
CN:    0x71 0x4f 0xcc 0x63
IN:    0xc3 0x4b 0x46 0x33
C'N-1: 0xc7 0x4f 0x42 0x37
PN:    0x7d 0x3  0x3  0x3
(That first intermediate byte is 0xc7 XOR 0x04 = 0xc3, which XORed with the original 0xbe gives the plaintext 0x7d.)
Awesome.
That does require me making the right guess, surely, though?
Yes, but it's amazing how easy it is to either make completely correct guesses, or just pick a set of values that are more likely to be good guesses, and start by trying those, failing back to the "plod through the rest of the bytes" approach when you need to.
I've coded an English-language frequency table into my padding oracle code, because that was appropriate for the challenge I was working on.
This code is available for you to review and use at https://github.com/alunmj/PaddingOracle/blob/master/PadOracle/Program.cs
You can imagine all kinds of ways to improve your guesses - when proceeding backward through a JSON object, for instance, a "}" character will be at the end; it'll be preceded by white space, double quotes, or brackets/braces, or maybe numerics. A 0x0a character will be preceded by a 0x0d (mostly), etc.
The other big performance improvement I made was to parallelise the search. You can work on one block entirely independently from another.
I chose to let the Parallel.For() function from C# decide exactly how it was going to split up work between different blocks, and the result is a whole lot faster. There are some wrinkles to manage when parallelising an algorithm, but I'm not going to get into that here. This is not a programming blog, really!
I figured I'd put that in big letters, because it's worth calling out - the parallelisation alone obviously multiplies your performance by the number of cores you've got (or the number of cores the web server has, if it's underpowered), and the predictive work on the text does the rest. Obviously, the predictive approach only works if you can separate between "likely" and "unlikely" characters - if the plaintext consists of random binary data, you're not going to get much of a benefit. But most data is formatted, and/or is related to English/Latin text.
I haven't published the code for this part yet, but you can use this same breach to encrypt data without knowing the key.
This is really fun and simple once you get all the previous stuff. Here goes.
Encrypting a block requires the generation of two ciphertext blocks from one plaintext block. What the second block is actually doesn't matter. We can literally set it to random data, or (which is important) specific data of our choosing.
The first block of the pair, acting like an IV, we can set to 0. There's a reason for this which we'll come to in a minute.
With these two initial blocks, we run the decrypter. This will give us a "plaintext" block as output. Remember how the intermediate block is the plaintext block XORed with the first of the pair of blocks? Well, because we set that first block to all zeroes, that means the plaintext block IS the same as the intermediate block. And that intermediate block was generated by decrypting the second block of the pair. In order for that decryption to result in the plaintext we want instead, we can simply take the intermediate block, XOR it with the plaintext block we want, and then put that into the first ciphertext block. [We're actually XORing this with the first ciphertext block, but that's a straight copy in this case, because the first ciphertext block is zeroes.]
Do the same thing for each of the rest of the blocks.
Sadly, there's no parallelising this approach, and the guessing doesn't help you either. You have to start with CN (randomly generated) and CN-1 (deduced with the approach above), then when you've established what CN-1 is, you can use the same approach to get CN-2, and so on back to the IV (C0). So this process is just plain slow. But it allows you to encrypt an arbitrary set of data.
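In outline, the loop looks something like this - a sketch, not the published tool's API, with oracleDecryptBlock standing in as a hypothetical helper for the block-at-a-time padding-oracle decryption described earlier:

using System;

class OracleEncrypt
{
    // wanted: the plaintext, already padded and split into blockSize-byte blocks.
    // oracleDecryptBlock(iv, block): hypothetical helper running the padding-oracle
    // decryption of one block against the endpoint.
    static byte[][] EncryptViaOracle(byte[][] wanted, int blockSize,
        Func<byte[], byte[], byte[]> oracleDecryptBlock)
    {
        int n = wanted.Length;
        var cipher = new byte[n + 1][];
        cipher[n] = new byte[blockSize];
        new Random().NextBytes(cipher[n]);        // last block: literally anything
        for (int i = n; i >= 1; i--)
        {
            var zeroIv = new byte[blockSize];     // all-zero "IV"...
            var inter = oracleDecryptBlock(zeroIv, cipher[i]); // ...so output == intermediate
            cipher[i - 1] = new byte[blockSize];
            for (int b = 0; b < blockSize; b++)   // force the plaintext we want
                cipher[i - 1][b] = (byte)(inter[b] ^ wanted[i - 1][b]);
        }
        return cipher;                            // cipher[0] plays the role of the IV
    }
}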
We did a CTF at work.
I have to say it was loads of fun - I've never really participated in a CTF before, so it was really good to test my hacking skills against my new colleagues.
We had one instruction from my manager - "don't let the interns beat you". I was determined to make sure that didn't happen, but I was also determined to share as much knowledge and excitement for InfoSec as possible. This meant that once or twice, I may have egged an intern on in the face of the fact that they were about to discover it anyway, and it just seemed like a really good way to keep them interested.
This is not that story.
This is about me turning the corner from knowing about a security failure, to understanding how it works. Let's see if I can help you guys understand, too.
That's the title of my blog, and there's not a whole lot of cryptography here. It's just a play on words, which was a little more relevant when I first started the blog back in 2005. So here's some crypto, at last.
There are several aspects to cryptography that you have to get right as a developer: storing passwords (hashing, not encrypting), creating and storing keys, picking sound algorithms, and using those algorithms correctly.
Having tried to teach all of these to developers in various forms, I can tell you that the first one, which should be the simplest, is still surprisingly hard for some developers to master. Harder still for managers - the number of breach notifications that talk about passwords being "encrypted" is a clear sign of this. Encrypted passwords mean either your developers don't understand and implemented the wrong thing, or your manager doesn't understand what the developer implemented and thinks "encrypted sounds better than hashed", and puts that down without checking that it's still technically accurate.
Key creation (so it's not predictable) and storage (so it can't be found by an attacker) is one of those issues that seems to go perennially unsolved - I'm not happy with many of the solutions I've seen, especially for self-hosted services where you can't just appeal to a central key vault such as is currently available in all good cloud platforms.
Picking correct algorithms is a moving target. Algorithms that were considered perfectly sound ten or twenty years ago are now much weaker, and can result in applications being much weaker if they aren't updated to match new understanding of cryptography, and processor and memory speed and quantity improvements. You can store rainbow tables in memory now that were unthinkable on disk just a decade or two ago.
Finally, of course, if all that wasn't enough to make cryptography sound really difficult (spoiler: it is, which is why you get someone else to do it for you), there are a number of ways in which you can mess up the way in which you use the algorithm.
There are a large number of parameters to set even when you've picked which algorithms you're using. Key sizes and block sizes are fairly obvious - larger is (generally) better for a particular algorithm. [There are exceptions, but it's a good rule of thumb to start from.]
There are a number of different modes available, generally abbreviated to puzzling TLAs - ECB, CFB, OFB, CBC, GCM, CTR, and so on and so forth. It's bewildering. Each of these modes just defines a different order in which to apply various operations to do things like propagating entropy, so that it's not possible to infer anything about the original plaintext from the ciphertext. That's the idea, at least. ECB, for instance, fails on this because any two blocks of plaintext that are the same will result in two blocks of ciphertext that are the same.
And if you're encrypting using a block cipher, you have to think about what to do with the last block - which may not be a complete block. This requires that the block be filled out with "padding" to make a full block. Even if you're just filling it out with zeroes, you're still padding - and those zeroes are the padding. (And you have to then answer the question "what if the last block ended with a zero before you padded it?")
There are a number of different padding schemes to choose from, too, such as "bit padding", where after the last bit, you set the next bit to 1, and the remaining bits in the block to 0. Or there's padding where the last byte is set to the count of how many padding bytes there are, and the remaining bytes are set to 0 - or a set of random bytes - or the count repeated over and over. It's this latter that is embodied as PKCS#5 or PKCS#7 padding. For the purposes of this discussion, PKCS#7 padding is a generalised version of PKCS#5 padding. PKCS#5 padding works on eight-byte blocks, and PKCS#7 padding works on any size blocks (up to 256 bytes, presumably).
So, if you have a three-byte last block, and the block size is 16 bytes, the last block is ** ** ** 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d 0x0d (where "**" represents the last three bytes of data, and 0x0d represents the hexadecimal value for 13, the number of bytes in the padding). If your last block is full, PKCS#7 covers this by making you create an extra 16-byte block, with the value 0x10 (decimal 16) in every byte.
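In code, PKCS#7 padding is only a few lines - a minimal sketch:

using System;

class Pkcs7
{
    // Fill the final block with N bytes each of value N; a full final
    // block gets a whole extra block (0x10 sixteen times, for 16-byte blocks).
    static byte[] Pad(byte[] data, int blockSize)
    {
        int padLen = blockSize - (data.Length % blockSize); // always 1..blockSize
        var padded = new byte[data.Length + padLen];
        Array.Copy(data, padded, data.Length);
        for (int i = data.Length; i < padded.Length; i++)
            padded[i] = (byte)padLen;
        return padded;
    }
}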
It's not at all unlikely that you wind up with the scenario with which we were presented in the CTF - a service that communicated with AES encryption, in CBC mode, using PKCS#7 padding. The fact that this was described as such was what tipped me off in the first place. This is the perfect setup for a Padding Oracle attack.
An Oracle is simply a device/function/machine/system/person that you send a request to, and get a response back, and which gives you some information as a result. The classical Oracles of Ancient Greek and Roman times were confusing and unhelpful at best, and that's really something we want from any cryptographic oracle. The term "Random Oracle" refers to a hypothetical system which returns random information to every query. A good cryptographic system is one that is indistinguishable from a Random Oracle.
Sadly, CBC with PKCS#7 padding is generally very different from a Random Oracle. It is a Padding Oracle, because it will tell us when the padding is correct or incorrect. And that's our vulnerability.
At this point, I could have done what one of my colleagues did, and download PadBuster, choosing parameters and/or modifying code, to crack the encryption.
But... I've been attacking this CTF somewhat... non-traditionally, using tools other than the normal ones, and so I thought I'd try and understand the algorithm's weaknesses, and implement my own attack. I wrote it on the bus on my way into work, and was pleased to see when I got in that it worked, albeit slowly, first time.
When decrypting each block using CBC, we say that PN = DK(CN) ⊕ CN-1 - which is just a symbolic way of saying that the recipient Decrypts (with key "K") the current Ciphertext block (block N), and then XORs the result with the previous Ciphertext block (the N-1th block). Let's also assume that we're only decrypting those two blocks, N-1 and N, with N being the last block provided to the recipient.
In other modes, the padding check may not deliver the helpful information we're looking for, but CBC is special. The way CBC decrypts data is to decrypt the current block of ciphertext (CN), which creates an intermediate block DK(CN). That intermediate block is combined with the previous ciphertext block, CN-1, to give the plaintext block, PN. This combining of blocks is done using the XOR (exclusive-or) operation, which has interesting properties any developer should be familiar with. Particularly, it's important to note that XOR (represented here as "⊕") is reversible. If X ⊕ Y = Z, you know also that Z ⊕ Y = X and Z ⊕ X = Y. This is one of the reasons the XOR operation is used in a lot of cryptographic algorithms.
If we want to change things in the inputs to produce a different output, we can really only change two things - the current and the previous block of Ciphertext, CN and CN-1. We should really only alter one input at a time. If we alter CN, that's going to be decrypted, and a small change will be magnified into a big difference to the DK(CN) value - all the bytes will have changed. But if we alter CN-1, just a bit, what we wind up with is a change in the plaintext value PN which matches that change. If we alter the 23rd bit of CN-1, it will alter the 23rd bit of PN, and only that one bit. Now if we can find what we've changed that bit to, we can then figure out what that means we must have changed it from.
If we change the last byte of CN-1, to create C'N-1 (pronounced "C prime of N minus 1") and cycle it through all the possible values it can take, the decryption will occur, and the recipient will reject our new plain text, P'N ("P prime of N") because it is poorly formed - it will have bad padding. With one (two, but I'll come to that in a minute) notable exception. If the last byte of the plaintext decrypted is the value 0x01, it's a single byte of padding - and it's correct padding. For that value of the last byte of C'N-1, we know that the last byte of P'N is 1. We can rewrite PN = DK(CN) ⊕ CN-1 as DK(CN) = CN-1 ⊕ PN - and then we can put the values in for the last byte: DK(CN)[15] = C'N-1[15] ⊕ 0x01.
Let's say, for illustration's sake, that the value we put in that last byte of C'N-1 was 0xa5, when our padding was accepted. That means DK(CN)[15] = 0xa5 ⊕ 0x01 = 0xa4. Note the lack of any "prime" marks there - we've figured out what the original value of the decrypted last byte was. Note that this isn't the same as the last byte of the plain text. No, we get that by taking this new value and XORing it with the original last byte of the previous block of ciphertext - that's CN-1[15]. For illustration, let's say that value is 0xc5. We calculate PN[15] = DK(CN)[15] ⊕ CN-1[15] = 0xa4 ⊕ 0xc5 = 0x61. That's the lower-case letter "a".
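Those two XORs, as code:

byte decrypted = (byte)(0xa5 ^ 0x01);      // 0xa4 - DK(CN)[15], the intermediate byte
byte plain     = (byte)(decrypted ^ 0xc5); // 0x61 - 'a', the plaintext byte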
OK, so we got the first piece of plaintext out - the last byte.
[Remember that I said I'd touch on another case? If CN is the original last block of ciphertext, it already contains valid padding! But not necessarily the 0x01 we're trying to force into place.]
Almost the same process is used to get the next byte, with a couple of wrinkles. First, obviously, we're altering the second-to-last byte through all possible values. Second, and not quite so obvious, we have to tweak the last byte once as well, because we're looking to get the sequence 0x02 0x02 (two twos) to happen at the end of P'N. The last byte of C'N-1 to achieve this is simply the last byte of C'N-1 that we used to get 0x01, XORed by 0x03 (because that's 0x02 ⊕ 0x01). In our illustrative example, that's 0xa6.
Each time, you have to set the end values of the ciphertext block, so that the end of P'N will look like 0x03 0x03 0x03, then 0x04 0x04 0x04 0x04, etc., all the way up to 0x10 ... 0x10 (sixteen 16s).
So here's the 200 lines or so that I wrote on the bus. I also wrote a test harness so that this would work even after the CTF finished and got shut down. You'll find that in the same repo.
I've massaged the code so it's easier to understand, or to use as an explainer for what's going on.
I plan on expanding this in a couple of ways - first, to make it essentially command-line compatible with "PadBuster", and second, to produce a graphical demo of how the cracking happens.
And in the next post, I'm going to talk a little about how I optimised this code, so that it was nearly 15x faster than PadBuster.
I've posted before about how I'd like to get my source code out of the version control system I used to use, because it was no longer supported by the manufacturer, and into something else.
I chose git, in large part because it uses an open format, and as such isn't going to suffer the same problem I had with ComponentSoftware's CS-RCS.
Now that I've figured out how to use Bash on Ubuntu on Windows to convert from CS-RCS to git, using the rcs-fast-export.rb script, I'm also looking to protect my source control investment by storing it somewhere off-site.
This has a couple of good benefits - one is that I'll have access to it when I'm away from my home machine, another is that I'll be able to handle catastrophic outages, and a third is that I'll be able to share more easily with co-conspirators.
I'm going to use Visual Studio Team Services (VSTS), formerly known as Visual Studio Online, and before that, Team Foundation Services Online. You can install VSTS on your own server, or you can use the online tool at <yourdomain>.visualstudio.com. If your team is smaller than five people, you can do this for free, just like you can use Visual Studio 2015 Community Edition for free. This is a great way in which Microsoft supports hobbyist developers, open source projects, college students, etc.
After my last post on the topic, you have used git and rcs-fast-export.rb to create a Git repository.
You may even have done a "git checkout" command to get the source code into a place where you can work on it. That's not necessary for our synchronisation to VSTS, because we're going to sync the entire repository. This will work whether you are using the Bash shell or the regular Command Prompt, as long as you have git installed and in your PATH.
If you've actually made any changes, be sure to add and commit them to the local Git repository. We don't want to lose those!
I'm also going to assume you have a VSTS account. First, visit the home page.
Under "Recent Projects & Teams", click "New".
Give it a name and a description - I suggest leaving the other settings at their default of "Agile" and "Git" unless you have reason to change. The setting of "Git" in particular is required if you're following along, because that's how we're going to synchronise next.
When you click "Create project", it'll think for a while...
And then you'll have the ability to continue on. Not sure my team's actually "going to love this", considering it's just me!
Yes, it's not just your eyes: the whole dialog moved down the screen, so you can't hover over the button waiting to hit it.
Click "Navigate to project", and you'll discover that there's a lot waiting for you. Fortunately a quick popup gives you the two most likely choices you'll have for any new project.
As my team-mates will attest, I don't do Kanban very well, so we'll ignore that side of things; I'm mostly using this just to track my source code. So, hit "Add Code", and you get this:
Don't choose any yet.
"Clone to your computer" - an odd choice of the direction to use, since this is an empty source directory. But, since it has a "Clone in Visual Studio" button, this may be an easy way to go if you already have a Visual Studio project working with Git that you want to tie into this. There is a problem with this, however, in that if you're working with multiple versions of Visual Studio, any attempt from VSTS to open Visual Studio will only open the most recently installed version of Visual Studio. I found no way to make Visual Studio 2013 automatically open from the web for Visual Studio 2013 projects, although the Visual Studio Version Selector will make the right choice if you double-click the SLN file.
"Push an existing repository from command line" - this is what I used. A simple press of the "Copy to clipboard" button gives me the right commands to feed to my command shell. You should run these commands from somewhere in your workspace; I would suggest from the root of the workspace, so you can check to see that you have a .git folder to import before you run the commands.
BUT - I would strongly recommend not dismissing this screen while you run these commands. You can't come back to it later, and you'll want to add a .gitignore file.
The other options are:
"Import a repository" - this is if you're already hosting your git repository on some other web site (like Github, etc.), and want to make a copy here. This isn't a place for uploading a fast-import file, sadly, or we could shortcut the git process locally. (Hey, Microsoft, you missed a trick!)
"Initialize with a README or gitignore" - a useful couple of things to do. A README.md file is associated with git projects, and instructs newcomers to the project about it - how to build it, what it's for, where to find documentation, etc. - and you can add this at any time. The .gitignore file tells git what file names and extensions to not bother putting into version control: object files, executables, temporary files, machine-generated code, PCH & PDB files, and so on. You can see the list is long, and there's no way to add a .gitignore file with a single button click after you've left this page. You can steal one from an empty project, by simply copying it - but the button press is easier.
I've found it useful to run the "git remote" and "git push" commands from the command line (and I choose to run them from the Bash window, because I'm already there after running the RCS export), and then add the .gitignore. So, I copy the commands and send them to the shell window, before I press the "Add a .gitignore" button, choose "Visual Studio" as my gitignore type, and then select "Initialize":
First, let's start with a recap of using the rcs-fast-export command to bring the code over from the old RCS to a new Git repository:
Commands in that window:
Commands:
No commands - we've imported and are ready to sync up to the VSTS server.
Commands (copied from the "Add Code" window):
Your solution still has lines in it dictating what version control you're using. So you want to unbind that.
[If you don't unbind existing version control, you won't be able to use the built-in version control features in Visual Studio, and you'll keep getting warnings from your old version control software. When you uninstall your old version control software, Visual Studio will refuse to load your projects. So, unbinding your old version control is really important!]
I like to do that in a different directory from the original, for two reasons: the original stays untouched as a backup in case anything goes wrong, and it proves that the new repository contains everything needed to rebuild the project.
So, now it's Command Prompt window time...
Yes, you could do that from Visual Studio, but it's just as easy from the command line. Note that I didn't actually enter credentials here - they're cached by Windows.
Commands entered in that window:
Your version control system may complain when opening this project that it's not in the place it remembers being in... I know mine does. Tell it that's OK.
[Yes, I've changed projects, from Juggler to EFSExt. I suddenly realised that Juggler is for Visual Studio 2010, which is old, and not installed on this system.]
Now that we've opened the solution in Visual Studio, it's time to unbind the old source control. This is done by visiting the File => Source Control => Change Source Control menu option:
You'll get a dialog that lists every project in this solution. You need to select every project that has a check-mark in the "Connected" column, and click the "Unbind" button.
Luckily, in this case, they're already selected for me, and I just have to click "Unbind":
You are warned:
Note that this unbinding happens in the local copy of the SLN and VCPROJ, etc. files - it's not actually going to make any changes to your version control. [But you made a backup anyway, because you're cautious, right?]
Click "Unbind" and the dialog changes:
Click OK, and we're nearly there...
Finally, we have to sync this up to the Git server. And to do that, we have to change the Source Control option (which was set when we first loaded the project) to Git.
This is under Tools => Options => Source Control. Select the "Microsoft Git Provider" (or in Visual Studio 2015, simply "Git"):
Press "OK". You'll be warned if your solution is still bound in some part to a previous version control system. This can happen in particular if you have a project which didn't load, but which is part of this solution. I'm not addressing here what you have to do for that, because it involves editing your project files by hand, or removing projects from the solution. You should decide for yourself which of those steps carries the least risk of losing something important. Remember that you still have your files and their history in at least THREE version control systems at this point - your old version control, the VSTS system, and the local Git repository. So even if you screw this up, there's little real risk.
Now that you have Git selected as your solution provider, you'll see that the "Changes" option is now available in the Team Explorer window:
Save all the files (but I don't have any open!) by pressing Ctrl-Shift-S, or selecting File => Save All.
If you skip this step, there will be no changes to commit, and you will be confused.
Select "Changes", and you'll see that the SLN files and VCPROJ files have been changed. You can preview these changes, but they basically are to do with removing the old version control from the projects and solution.
It wants a commit message. This should be short and explanatory. I like "Removed references to old version control from solution". Once you've entered a commit message, the Commit button is available. Click it.
It now prompts you to Sync to the server.
So click the highlighted word, "Sync", to see all the unsynced commits - you should only have one at this point, but as you can imagine, if you make several commits before syncing, these can pile up.
Press the "Sync" button to send the commit up to the server. This is also how you should usually get changes others have made to the code on the server. Note that "others" could simply mean "you, from a different computer or repository".
Check on the server that the history on the branch now mentions this commit, so that you know your syncing works well.
Sure, it seems like a long-winded process, but most of what I've included here is pictures of me doing stuff, and the stuff I'm doing is only done once, when you create the repository and populate it from another. Once it's in VSTS, I recommend building your solution, to make sure it still builds. Run whatever tests you have to make sure that you didn't break the build. Make sure that you still have valid history on all your files, especially binary files. If you don't have valid history on any files in particular, check the original version control, to see if you ever did have. I found that my old CS-RCS implementation was storing .bmp files as text, so the current version was always fine, but the history was corrupted. That's history I can't retrieve, even with the new source control.
Now, what about those temporary repositories? Git makes things really easy - the Git repository is in a directory off the root of the workspace, called ".git". It's hidden, but if you want to delete the repository, just delete the ".git" folder and its contents. You can delete any temporary workspaces the same way, of course.
I did spend a little time automating the conversion of multiple repositories to Git, but that was rather ad-hoc and wobbly, so I'm not posting it here. I'd love to think that some of the rest of this could be automated, but I have only a few projects, so it was good to do by hand.
No programmer should be running an unsupported, unpatched, out-of-date old version control system. That's risky, not just from a security perspective, but from the perspective that it may screw up your files, as you vary the sort of projects you build.
No programmer should be required to drop their history when moving to a new version control system. There is always a way to move your history. Maybe that way is to hire a grunt developer to fetch versions dated at random/significant dates throughout history out of the old version control system, and check them in to the new version control system. Maybe you can write automation around that. Or maybe youâll be lucky and find that someone else has already done the automation work for you.
Hopefully I've inspired you to take the plunge of moving to a new version control system, and you've successfully managed to bring all your precious code history with you. By using Visual Studio Team Services, you've also got a place to track features and bugs, and collaborate with other members of a development team, if that's what you choose to do. Because you've chosen Git, you can separate the code and history at any time from the issue tracking systems, should you choose to do so.
Let me know how (if?) it worked for you!
In which I move my version control from ComponentSoftware's CS-RCS Pro to Git while preserving commit history.
[If you don't want the back story, click here for the instructions!]
OK, so having watched the video I linked to earlier, I thought I'd move some of my old projects to Git.
I picked one at random, and went looking for tools.
I'm hampered a little by the fact that all my old projects used ComponentSoftware's "CS-RCS Pro".
A couple of really good reasons: it was free for a single developer, and it's compatible with GNU RCS.
But you know who doesn't use CS-RCS Pro any more?
Thatâs right, ComponentSoftware.
It's a dead platform, unsupported, unpatched, and belongs off my systems.
One simple reason - if I move off the platform, I face the usual choice when migrating from one version control system to another: bring the history along, or drop it and start fresh from only the current versions.
The second option seems a bit of a waste to me.
OK, so yes, technically I could mix the two modes, by using CS-RCS Pro to browse the ancient history when I need to, and Git to browse recent history, after starting Git from a clean working folder. But I could see a couple of problems: the dead platform could stop working at any moment, and I'd be left keeping two tools around just to read my own history.
So, really, I wanted to make sure that I could move my files, history and all.
I really didn't have a good way to do it.
Clearly, any version control system can be moved to any other version control system by the simple expedient of fetching each version in turn out of the old system, and checking it in to the new one.
But, as you can imagine, that's really long-winded and manual. That should be automatable.
In fact, given the shared APIs of VSS-compatible source control services, I'm truly surprised that nobody has yet written a tool to do basically this task. I'd get on it myself, but I have other things to do. Maybe someone will write a "VSS2Git" or "VSS2VSS" toolkit to do just this.
There is a format for creating a single-file copy of a Git repository, which Git can process using the command "git fast-import". So all I have to find is a tool that goes from a CS-RCS repository to the fast-import file format.
So, clearly there's no tool to go from CS-RCS Pro to Git. There's a tool to go from CS-RCS Pro to CVS, or there was, but that was on the now-defunct CS-RCS web site.
But... remember I said that it's compatible with GNU RCS?
And there are scripts to go from GNU RCS to Git.
OK, so the script for this is written in Ruby, and as I read it, there seemed to be a few things that made it look like it might be for Linux only.
I really wasn't interested in making a Linux VM (easy though that may be) just so I could convert my data.
Everything changed with the arrival of the recent Windows 10 Anniversary Update, because along with it came a new component.
Bash on Ubuntu on Windows.
It's like a Linux VM, without needing a VM, without having to install Linux, and it works really well.
With this, I could get all the tools I needed - GNU RCS, in case I needed it; Ruby; the Git command line - and then I could try this out for myself.
Of course, I wouldn't be publishing this if it wasn't somewhat successful. But there are some caveats, OK?
I've tried this a few times, on ONE of my own projects. This isn't robustly tested, so if something goes all wrong, please by all means share, and people who are interested (maybe me) will probably offer suggestions, some of them useful. I'm not remotely warrantying this or suggesting it's perfect. It may wipe your development history out of your one and only copy of version control... so don't do it on your one and only copy. Make a backup first.
GNU RCS likes to store files in one of two places - either in the same directory as the working files, but with a ",v" pseudo-extension added to the filename, or in a sub-directory off each working folder, called "RCS", with the same ",v" extension on the files. If you did either of these things, there are no surprises. But...
CS-RCS Pro doesn't do this. It has a separate RCS Repository Root. I put mine in C:\RCS, but you may have yours somewhere else. Underneath that RCS Repository Root is a full tree of the drives you've used CS-RCS to store (without the ":"), and a tree under that. I really hope you didn't embed anything too deep, because that might bode ill.
Initially, this seemed like a bad thing, but because you don't actually need the working files for this task, you can pretend that the RCS Repository is actually your working space.
Maybe this is obvious, but it took me a moment of thinking to decide I didn't have to move files into RCS sub-folders of my working directories.
Make this a "flag day". After you do this conversion, never use CS-RCS Pro again. It was good, and it did the job, and it's now buried in the garden next to Old Yeller. Do not sprinkle the zombification water on that hallowed ground to revive it.
This also means you MUST check in all your code before converting, because checking it in afterwards will be... difficult.
Assumption: You have Windows 10.
This might look like a lot of instructions, but I mostly just wanted to be clear. This is really quick work. If you screw up after the "git init" command, simply "rm -rf .git" to remove the new repository.
I hate when people ask me this question, because I inevitably respond with a half-dozen questions of my own, which makes me seem like a bit of an arse.
To reduce that feeling, because the questions don't seem to be going away any time soon, I thought I'd write some thoughts out.
Passwords are important objects - and because people naturally share IDs and passwords across multiple services, your holding on to a customer's / user's password means you are a necessary part of that user's web of credential storage.
It will be a monumental news story when your password database gets disclosed or leaked, and even more of a story if you've chosen a bad way of protecting that data. You will lose customers and you will lose business; you may even lose your whole business.
Take a long hard look at what you're doing, and whether you actually need to be in charge of that kind of risk.
If you are going to verify a user, you don't need encrypted passwords, you need hashed passwords. And those hashes must be salted. And the salt must be large and random. I'll explain why some other time, but you should be able to find much documentation on this topic on the Internet. Specifically, you don't need to be able to decrypt the password from storage, you need to be able to recognise it when you are given it again. Better still, use an acknowledged good password hashing mechanism like PBKDF2. (Note, from the "2", that it may be necessary to update this if my advice is more than a few months old.)
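In .NET, that's only a few lines. Here's a sketch using Rfc2898DeriveBytes (the salt size, output size, iteration count and hash choice below are illustrative - check current guidance rather than copying them):

using System;
using System.Security.Cryptography;

class PasswordHasher
{
    static void HashPassword(string password, out byte[] salt, out byte[] hash)
    {
        salt = new byte[16];                              // large, random salt
        RandomNumberGenerator.Create().GetBytes(salt);
        using (var kdf = new Rfc2898DeriveBytes(password, salt, 100000,
            HashAlgorithmName.SHA256))                    // PBKDF2
        {
            hash = kdf.GetBytes(32);                      // store salt + hash, never the password
        }
    }
}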
Now, do not read the rest of this section - skip to the next question.
Seriously, what are you doing reading this bit? Go to the heading with the next question. You don't need to read the next bit.
<sigh/>
OK, if you are determined that you will have to impersonate a user (or a service account), you might actually need to store the password in a decryptable form.
First make sure you absolutely need to do this, because there are many other ways to impersonate an incoming user using delegation, etc., which don't require you to store the password.
Explore delegation first.
Finally, if you really have to store the password in an encrypted form, you have to do it incredibly securely. Make sure the key is stored separately from the encrypted passwords, and don't let your encryption be brute-forcible. A BAD way to encrypt would be to simply encrypt the password using your public key - sure, this means only you can decrypt it, but it means anyone can brute-force an encryption and compare it against the ciphertext.
A GOOD way to encrypt the password is to add some entropy and padding to it (so I can't tell how long the password was, and I can't tell if two users have the same password), and then encrypt it.
Password storage mechanisms such as keychains or password vaults will do this for you.
If you don't have keychains or password vaults, you can encrypt using a function like Windows' CryptProtectData, or its .NET equivalent, System.Security.Cryptography.ProtectedData.
[Caveat: CryptProtectData and ProtectedData use DPAPI, which requires careful management if you want it to work across multiple hosts. Read the API and test before deploying.]
[Keychains and password vaults often have the same sort of issue with moving the encrypted password from one machine to another.]
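For the ProtectedData route, the call itself is tiny - a sketch, with the added entropy left as a parameter in the spirit of the advice above:

using System.Security.Cryptography;
using System.Text;

class Vaultish
{
    static byte[] ProtectPassword(string password, byte[] extraEntropy)
    {
        // DPAPI under the hood - see the caveats above about multiple hosts.
        return ProtectedData.Protect(
            Encoding.UTF8.GetBytes(password),
            extraEntropy,                        // optional additional entropy
            DataProtectionScope.CurrentUser);
    }
}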
For .NET documentation on password vaults in Windows 8 and beyond, see: Windows.Security.Credentials.PasswordVault
For non-.NET on Windows from XP and later, see: CredWrite
For Apple, see documentation on Keychains
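Here's that sketch - a minimal illustration only, assuming .NET on Windows where ProtectedData is available. The fixed padded length and the two-byte length prefix are my own illustrative choices; DPAPI supplies the encryption and key management, and the portability caveats above still apply.

using System;
using System.Security.Cryptography;
using System.Text;

static class StoredPassword
{
    // Pad every password to the same length before protecting it,
    // so the stored blob doesn't reveal how long the password was.
    const int PaddedLength = 256;

    public static byte[] Protect(string password)
    {
        byte[] plain = Encoding.UTF8.GetBytes(password);
        if (plain.Length > PaddedLength)
            throw new ArgumentException("Password too long for this sketch");

        // Layout: two length bytes, the password, then random fill.
        byte[] padded = new byte[2 + PaddedLength];
        padded[0] = (byte)(plain.Length >> 8);
        padded[1] = (byte)(plain.Length & 0xFF);
        Buffer.BlockCopy(plain, 0, padded, 2, plain.Length);
        byte[] fill = new byte[PaddedLength - plain.Length];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(fill);
        Buffer.BlockCopy(fill, 0, padded, 2 + plain.Length, fill.Length);

        // CurrentUser scope ties the blob to this user on this machine.
        return ProtectedData.Protect(padded, null, DataProtectionScope.CurrentUser);
    }

    public static string Unprotect(byte[] blob)
    {
        byte[] padded = ProtectedData.Unprotect(blob, null, DataProtectionScope.CurrentUser);
        int length = (padded[0] << 8) | padded[1];
        return Encoding.UTF8.GetString(padded, 2, length);
    }
}

Because DPAPI mixes its own randomness into every call, two users with the same password won't produce the same stored bytes, which covers the other half of the "entropy and padding" advice.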
If you're protecting data in a business, you can probably tell users how strong their passwords must be. Look for measures that correlate strongly with entropy - how long is the password, does it use characters from a wide range (or is it just the letter "a" repeated over and over?), is it similar to any of the most common passwords, does it contain information that is obvious, such as the user's ID, or the name of this site?
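As a rough illustration - the thresholds and checks here are mine, not any standard - those measures might look something like this in C#:

using System;
using System.Collections.Generic;
using System.Linq;

static class PasswordPolicy
{
    // Illustrative checks: length, character variety, nothing obvious,
    // nothing on a known-common-passwords list.
    public static bool LooksReasonable(string password, string userId,
                                       ISet<string> commonPasswords)
    {
        if (password.Length < 12) return false;            // length correlates best with entropy
        if (password.Distinct().Count() < 4) return false; // catches "aaaaaaaaaaaa"
        if (password.IndexOf(userId, StringComparison.OrdinalIgnoreCase) >= 0)
            return false;                                  // contains the user's own ID
        if (commonPasswords.Contains(password))
            return false;                                  // on the common-passwords list
        return true;
    }
}

Build the commonPasswords set with StringComparer.OrdinalIgnoreCase so that check isn't defeated by capitalisation.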
Maybe you can reward customers for longer passwords - even something as simple as a "strong account award" sticker on their profile page can induce good behaviour.
Length is mathematically more important to password entropy than the range of characters. An eight character password chosen from 64 characters (less than three hundred trillion combinations - a number with 4 commas) is weaker than a 64 character password chosen from eight characters (a number of combinations with 19 commas in it).
An 8-character password taken from 64 possible characters is actually as strong as a password only twice as long and chosen from 8 characters - this means something like a complex password at 8 characters in length is as strong as the names of the notes in a couple of bars of your favourite tune.
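To put numbers on that, entropy in bits is length times the base-2 log of the alphabet size:

H = L \log_2 N : \quad 8 \times \log_2 64 = 48 \text{ bits}, \quad 16 \times \log_2 8 = 48 \text{ bits}, \quad 64 \times \log_2 8 = 192 \text{ bits}

So the 8-character, 64-symbol password (2^48, about 2.8 × 10^14 combinations) exactly ties the 16-character, 8-symbol one, while the 64-character, 8-symbol password (2^192, roughly 6.3 × 10^57) is astronomically stronger.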
Allowing users to use password safes of their own makes it easier for them to use longer and more complex passwords. This means allowing copy and paste into password fields, and where possible, integrating with any OS-standard password management schemes.
Everything seems to default to sending a password reset email. This means your users' email address is equivalent to their credential. Is that strength of association truly warranted?
When I go to change my email address, you should ask me for my password first, or identify me in some similarly strong way.
What happens when I stop paying my ISP, and they give my email address to a new user? Will they have my account on your site now, too?
Every so often, maybe you should renew the relationship between account and email address - baselining - to ensure that the address still exists and still belongs to the right user.
Password hints push you dangerously into the realm of actually storing passwords, because people use hints such as "The password is 'Oompaloompah'". So, if you store password hints, you must encrypt them as strongly as if you were encrypting the password itself - because, much of the time, you are. And see the previous rule, which says you want to avoid doing that if at all possible.
How do you enforce occasional password changes, and why?
What happens when a user changes their password?
What happens when your password database is leaked?
What happens when you need to change hash algorithm?
Every so often, I write about some real-world problems in this blog, rather than just getting excited about generalities. This is one of those times.
I had a list of users the other day, exported from a partner with whom we do SSO, and which somehow had some duplicate entries in.
These were not duplicates in the sense of "exactly the same data in every field", but differed by email address, and sometimes last name. Those of you who manage identity databases will know exactly what I'm dealing with here - people change their last name, through marriage, divorce, adoption, gender reassignment, whim or other reason, and instead of editing the existing entry, a new entry is somehow populated to the list of identities.
What hadn't changed was that each of these individuals still held their old email address in Active Directory, so all I had to do was look up each email address, relate it to a particular user, and then pull out the canonical email address for that user. [In this case, that's the first email address returned from AD.]
A quick search on the interwebs gave me this as a suggested VBA function to do just that:
1: Function GetEmail(email as String) as String
2: ' Given one of this users' email addresses, find the canonical one.
3:
4: ' Find our default domain base to search from
5: Set objRootDSE = GetObject("LDAP://RootDSE")
6: strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"
7:
8: ' Open a connection to AD
9: Set ADOConnection = CreateObject("ADODB.Connection")
10: ADOConnection.Provider = "ADsDSOObject"
11: ADOConnection.Open "Active Directory Provider"
12:
13: ' Create a command
14: Set ADCommand = CreateObject("ADODB.Command")
15: ADCommand.ActiveConnection = ADOConnection
16:
17: 'Find user based on their email address
18: ADCommand.CommandText = _
19: "SELECT distinguishedName,userPrincipalName,mail FROM " & _
20: strBase & " WHERE objectCategory='user' AND mail='" & email & "'"
21:
22: ' Execute this command
23: Set ADRecordSet = ADCommand.Execute
24:
25: ' Extract the canonical email address for this user.
26: GetEmail = ADRecordSet.Fields("Mail")
27:
28: ' Return.
29: End Function
That did the trick, and I stopped thinking about it. Printed out the source just to demonstrate to a couple of people that this is not rocket surgery.
Yesterday the printout caught my eye. Here's the particular line that made me stop:
18: ADCommand.CommandText = _
19: "SELECT distinguishedName,userPrincipalName,mail FROM " & _
20: strBase & " WHERE objectCategory='user' AND mail='" & email & "'"
That looks like a SQL query, doesn't it?
Probably because it is.
It's one of two formats that can be used to query Active Directory, the other being the less-readable LDAP syntax.
Both formats have the same problem - when you build the query using string concatenation like this, it's possible for the input to give you an injection by escaping from the data and into the code.
I checked this out - when I called this function as follows, I got the first email address in the list as a response:
Debug.Print GetEmail("x' OR mail='*")
You can see my previous SQL injection articles to come up with ideas of other things I can do now that I've got the ability to inject.
Normally, I'd suggest developers use Parameterised Queries to solve this problem - and that's always the best idea, because it not only improves security, but it actually makes the query faster on subsequent runs, because it's already optimised. Here's how that ought to look:
ADCommand.CommandText = _
    "SELECT distinguishedName,userPrincipalName,mail FROM " & _
    strBase & " WHERE objectCategory='user' AND mail=?"

'Create and bind parameter
Set ADParam = ADCommand.CreateParameter("", adVarChar, adParamInput, 40, email)
ADCommand.Parameters.Append ADParam
That way, the question mark "?" gets replaced with "'youremail@example.com'" (including the single quote marks) and my injection attempt gets quoted in magical ways (usually, doubling single-quotes, but the parameter insertion is capable of knowing in what way it's being inserted, and how exactly to quote the data).
Except that, when you run this against the ADsDSOObject provider, what you get is this rather meaningful message:
Run-time error '-2147467262 (80004002)':
No such interface supported
It doesn't actually tell me which interface isn't supported, so of course I spend a half hour trying to figure out what changed that might have gone wrong - whether I'm using a question mark where perhaps I might need a named variable, possibly preceded by an "@" sign, but no, that's SQL stored procedures, which are almost never the SQL injection solution they claim to be, largely because the same idiot who uses concatenation in his web service also does the same stupid trick in his SQL stored procedures, but I'm rambling now and getting far away from the point if I ever had one, so…
The interface that isn't supported is the ability to set parameters.
The single best solution to SQL injection just plain isn't provided in the ADODB library and/or the ADsDSOObject provider.
Why on earth would you miss that out, Microsoft?
So, the smart answer here is input validation where possible, and if you absolutely have to accept any and all input, you must quote the strings that you're passing in.
In my case, because I'm dealing with email addresses, I think I can reasonably restrict my input to alphanumerics, the "@" sign, full stops, hyphens and underscores.
Input validation depends greatly on the type of your input. If it's a string, it will need to be provided in your SQL request surrounded with single quotes - which means that any single quote in the string will need to be encoded safely. Usually that means doubling the quote mark, although you might choose to replace them with double quotes or back ticks.
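A minimal sketch of that quoting step, in C# for concreteness (doubling is the convention for this SQL dialect; the LDAP filter syntax has different escaping rules):

// Double any single quotes so the value can't break out of the
// string literal in the query text, then wrap it in quotes.
static string QuoteForQuery(string value)
{
    return "'" + value.Replace("'", "''") + "'";
}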
If your input is a number, you can be more restrictive in your input validation - only those characters that are actually parts of a number. That's not necessarily as easy as it sounds - the letter "e" is often part of numbers, for instance, and you have to decide whether you're going to accept bases other than 10. But from the perspective of securing against SQL injection, again that's not too difficult to enforce.
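One way to sidestep those questions - a sketch, again in C# - is to parse strictly and build the query from the parsed value rather than the raw input:

using System.Globalization;

// NumberStyles.None rejects signs, decimal points, exponents (the
// letter "e"), thousands separators and whitespace - digits only.
static bool TryParseStrictNumber(string input, out int value)
{
    return int.TryParse(input, NumberStyles.None, CultureInfo.InvariantCulture, out value);
}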
Finally, of course, you have to decide what to do when bad input comes in - an error response, a static value, throw an exception, ignore the input and refuse to respond, etc. If you choose to signal an error back to the user, be careful not to provide information an attacker could find useful.
Sometimes the mere presence of an error is useful.
Certainly if you feed back to the attacker the full detail of the SQL query that went wrong - and people do sometimes do this! - you give the attacker far too much information.
Even feeding back the incorrect input can be a bad thing in many cases. In the Excel case I'm running into, that's probably not easily exploitable, but you should probably be cautious anyway - if it's an attacker causing an error, they may want you to echo back their input to exploit something else.
Seriously, Microsoft, this is an unforgivable lapse - not only is there no ability to provide the single best protection, because you didn't implement the parameter interface, but also your own samples provide examples of code that is vulnerable to SQL injection. [Here and here - the other examples I was able to find use hard-coded search filters.]
Microsoft, update your samples to demonstrate how to securely query AD through the ADODB library, and consider whether it's possible to extend the provider with the parameter interface so that we can use the gold-standard protection.
Parse your parameters - make sure they conform to expected values. Complain to the user when they don't. Don't use lack of samples as a reason not to deliver secure components.
And, because I know a few of you will hope to copy directly from my code, here's how I wound up doing this exact function.
Please, by all means review it for mistakes - I don't guarantee that this is correct, just that it's better than what I found originally. For instance, one thing it doesn't check for is whether the user actually has a value set for the "mail" field in Active Directory - I can tell you for certain, it'll give an error if you have one of these users come back from your search.
Function GetEmail(email As String) As String
' Given one of this user's email addresses, find the canonical one.

    ' Pre-execution input validation - email must contain only recognised
    ' characters. (The trailing hyphen in the character list is literal,
    ' so hyphenated addresses are allowed through.)
    If email Like "*[!a-zA-Z0-9_@.-]*" Then
        GetEmail = "Illegal characters"
        Exit Function
    End If

    ' Find our default domain base to search from
    Set objRootDSE = GetObject("LDAP://RootDSE")
    strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"

    ' Open a connection to AD
    Set ADOConnection = CreateObject("ADODB.Connection")
    ADOConnection.Provider = "ADsDSOObject"
    ADOConnection.Open "Active Directory Provider"

    ' Create a command
    Set ADCommand = CreateObject("ADODB.Command")
    ADCommand.ActiveConnection = ADOConnection

    ' Find user based on their email address
    ADCommand.CommandText = _
        "SELECT distinguishedName,userPrincipalName,mail FROM " & _
        strBase & " WHERE objectCategory='user' AND mail='" & email & "'"

    ' Execute this command
    Set ADrecordset = ADCommand.Execute

    ' Post-execution validation - we should have exactly one answer.
    ' (VBA's Or doesn't short-circuit, so test for Nothing on its own;
    ' and RecordCount can return -1 on a forward-only cursor, so step
    ' through the records rather than counting them.)
    If ADrecordset Is Nothing Then
        GetEmail = "Not found"
        Exit Function
    End If
    If ADrecordset.EOF Then
        GetEmail = "Not found"
        Exit Function
    End If

    ' Extract the canonical email address for this user.
    GetEmail = ADrecordset.Fields("Mail")

    ' If there's more than one match, say so rather than guess.
    ADrecordset.MoveNext
    If Not ADrecordset.EOF Then GetEmail = "Many matches"

' Return.
End Function
As always, let me know if you find this at all useful.
Version control is one of those vital tools for developers that everyone has to use but very few people actually enjoy or understand.
So, it's no surprise that I noted a few months ago that the version control software on which I've relied for several years for my personal projects, Component Software's CS-RCS, has not been built on in years, and cannot now be downloaded from its source site. [Hence no link from this blog.]
I've used git a few times before in professional projects while I was working at Amazon, but relatively reluctantly - it has incredibly baroque and meaningless command-line options, and gives the impression that it was written by people who expected their users to be just as proficient with the ins and outs of version control as they are.
While I think it's a great idea for developers to build software they would use themselves, I think it's important to make sure that the software you build is also accessible by people who aren't at the same level of expertise as yourself. After all, if your users were as capable as the developer, they would already have built the solution for themselves, so your greater user-base comes from accommodating everyone from novice to expert with simple points of entry and levels of improved mastery.
git, along with many other open source, community-supported tools, doesn't really accommodate the novice.
As such, most people who use it rely on "cookbooks" - sets of instructions along the lines of "if you want to do X, type commands Y and Z" - without an emphasis on understanding why you're doing this.
This leads inexorably to a feeling that you're setting yourself up for a later fall, when you decide you want to do an advanced task, but discover that a decision you've made early on has prevented you from doing the advanced task in the way you want.
That's why I've been reluctant to switch to git.
But it's clear that git is the way forward in the tools I'm most familiar with - Visual Studio and its surrounding set of developer applications.
It's one of those decisions I've made some time ago, but not enacted until now, because I had no idea how to start - properly. Every git repository I've worked with so far has either been set up by someone else, or set up by me, based on a cookbook, for a new project, and in a git environment that's managed by someone else. I don't even know if those terms, repository and environment, are the right terms for the things I mean.
There are a number of advanced things I want to do from the very first - particularly, I want to bring my code from the old version control system, along with its history where possible, into the new system.
And I have a feeling that this requires I understand the decisions I make when setting this up.
So, it was with much excitement that I saw a link to this arrive in my email:
Next thing is I'm going to watch this, and see how I'm supposed to work with git. I'll let you know how it goes.