Information Security is full of terminology.
Sometimes we even understand what we mean. I've yet to come across a truly awesome, yet brief, definition of "threat", for instance.
But one that bugs me, because it shouldn't be that hard to get right, and because I hear it from people I otherwise respect greatly, is that of "input validation".
Fight me on this, but I think that validation is essentially a yes/no decision on a set of input, whether it's textual, binary, or whatever other format you care to define.
Exactly what you are validating is up for debate, whether you're looking at syntax or semantics - is it formatted correctly, versus does it actually make sense?
"Green ideas sleep furiously" is a famous example of a sentence that is syntactically correct - it follows a standard "adjective noun verb adverb" pattern that is common in English - but semantically, it makes no sense: ideas can't be green, they can't sleep, and nothing can sleep furiously (although my son used to sleep with his fists clenched really tight when he was a little baby).
"0 / 0" is a syntactically correct mathematical expression, but you can argue whether it's semantically correct.
"Sell 1000 shares" might be a syntactically correct instruction, but semantically, it could be that you don't have 1000 shares, or there's a business logic limit, which says such a transaction requires extra authentication.
So there's a difference between syntactic validation and semantic validation, but...
Injection attacks occur when input data - a string of characters - is semantically valid in the language of the enclosing code, as code itself, and not just as data. Sometimes (but not always) this means the data contains a character or character sequence that allows the data to "escape" from its data context to a code context.
So, can input validation prevent injection attacks? This is a question I ask, in various round-about ways, in a lot of job interviews, so it's quite an important question.
The answer is really simple.
Yes. And no.
If you can validate your input, such that it is always syntactically and semantically correct, you can absolutely prevent injection exploits.
But this is really only possible for relatively simple sets of inputs, and where the processing is safe for that set of inputs.
An example - suppose I've got a product ordering site, and I'm selling books.
You can order an integer number of books - strictly speaking, a positive integer, since ordering 0 makes no sense, so start at 1. You probably want to put a maximum limit on that field, perhaps restricting people to buying no more than a hundred of that book. If they're buying more, they'll want to go wholesale anyway.
So, your validation is really simple - "is the field an integer, and is the integer value between 1 and 100?"
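As a sketch, that whole validation - syntax (is it an integer?) and semantics (is it a sensible order quantity?) - fits in a few lines of JavaScript. The function name and the calling convention are invented for illustration:

```js
// Validate a quantity field as "an integer between 1 and 100".
function validateQuantity(input) {
  const n = Number(input);
  return Number.isInteger(n) && n >= 1 && n <= 100;
}

validateQuantity("17");  // true
validateQuantity("0");   // false - start at 1
validateQuantity("101"); // false - beyond the retail limit
validateQuantity("abc"); // false - not an integer at all
```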
Having said "yes, and no", I have to show you an example of the "no", right?
OK, let's say you're asking for validation of names of people - what are your validation rules?
Let's assume you're expecting everyone to have "latinised" their name, to make it easy. All the letters are in the range a-z, or A-Z if there's a capital letter.
Great, so there's a rule - only match "[A-Za-z]".
Unless, you know, Leonardo da Vinci. Or di Caprio. So you need spaces.
Or Daniel Day-Lewis. So there's also hyphens to add.
And if you have an O'Reilly, an O'Brian, or a D'Artagnan, or a N'Dour - yes, you're going to add apostrophes.
Now your validation rule is letting in a far broader range of characters than you started out with, and there's enough there to allow for SQL injection to happen.
Input can now be syntactically correct by your validation rule, and yet semantically equivalent to data plus SQL code.
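To make that concrete, here's a sketch (the variable names and payload are invented) of how the "reasonable" name rule grows until it waves through an injection string:

```js
// Each legitimate name forces the rule open a little wider.
const lettersOnly = /^[A-Za-z]+$/;    // rejects "da Vinci"
const withSpaces  = /^[A-Za-z ]+$/;   // rejects "Day-Lewis"
const withAll     = /^[A-Za-z '-]+$/; // finally accepts O'Reilly...

withAll.test("O'Reilly");  // true, as intended
withAll.test("Robert'--"); // also true - in a naively concatenated SQL
                           // statement, the apostrophe closes the string
                           // and "--" comments out the rest of the query
```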
I have a working hypothesis. It goes like this.
As a neophyte in information security, you learn a trick.
That trick is validation, and it's a great thing to share with developers.
They don't need to be clever or worry hard about the input that comes in, they simply need to validate it.
It actually feels good to reject incorrect input, because you know you're keeping the bad guys out, and the good guys in.
Then you find an input field where validation alone isn't sufficient.
But you've told everyone - and had other security folk agree with you - that validation is the way to solve injection attacks.
So you learn a new trick - a new way of protecting inputs.
After all, it ... uhh, kind of does the same thing. It stops injection attacks, so it must be validation.
This new trick is encoding, quoting, or in some way transforming the data, so the newly transformed data is safe to accept.
Every one of those apostrophes? Turn them into the sequence "&apos;" if they're going into HTML, or double them if they're in a SQL string, or - and this is FAR better - use parameterised queries, so you don't have to even know how the input string is being encoded on its way into the SQL command.
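As a hedged sketch of the parameterised-query option, here's the placeholder syntax of the popular node "mysql" package (the connection settings, table and variable names are placeholders, not from any real site):

```js
const mysql = require("mysql");
const connection = mysql.createConnection({
  host: "localhost", user: "shop", database: "shop" // placeholder settings
});

const name = "O'Reilly"; // or an attacker's payload

// DANGEROUS: the apostrophe becomes part of the SQL statement itself.
connection.query(
  "SELECT * FROM authors WHERE name = '" + name + "'",
  (err, rows) => console.log(err || rows));

// SAFER: a parameterised query - the driver transports `name` as data,
// and you never need to know how it's encoded on the way in.
connection.query(
  "SELECT * FROM authors WHERE name = ?",
  [name],
  (err, rows) => console.log(err || rows));
```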
Now your input can be validated - and injection attacks are stopped.
In fact, once you've encoded your inputs properly, your validation can be entirely open and empty! At least from the security standpoint, because you've made the string semantically entirely meaningless to the code in which it is to be embedded as data. There are no escape characters or sequences, because they, too, have been encoded or transformed into semantically safe data.
And I happen to think it's important to separate the two concepts of validation and encoding.
Validation is saying "yes" or "no" to the question "is this string 'good' data?" You can validate in a number of different ways, and with good defence in depth, you'll validate at different locations, based on different knowledge about what is "good". This matches very strongly with the primary dictionary definition of "validation" - it's awesome when a technical term matches very closely with a common language term, because teaching it to others becomes easier.
Encoding doesn't say "yes" or "no"; encoding simply takes whatever input it's given, and makes it safe for the next layer to which the data will be handed.
So is encoding just another kind of validation? It's not.
Sometimes, it's just my job to find vulnerabilities, and while that's kind of fun, it's also a little unexciting compared to the thrill of finding bugs in other people's software and getting an actual "thank you", whether monetarily or just a brief word.
About a year ago, I found a minor Cross-Site Scripting (XSS) flaw in a major company's web page, and while it wasn't a huge issue, I decided to report it, as I had a few years back with a similar issue in the same web site. I was pleased to find that the company was offering a bounty programme, and simply emailing them would submit my issue.
The first thing to notice, as with all XSS issues, is that there were protections in place that had to be got around. In this case, some special characters or sequences were being blocked - but not all. And it's really telling that there are still many websites which have not implemented widespread input validation / output encoding as their XSS protection. So, while the WAF (Web Application Firewall) slowed me down even when I knew the flaw existed, it only added about 20 minutes to the exploit time: my example had to use "confirm()" instead of "alert()" or "prompt()". But really, if I was an attacker, my exploit wouldn't have any of those functions, and would probably include an encoded script that wouldn't be detected by the WAF either. WAFs are great for preventing specific attacks, but aren't a strong protection against an adversary with a little intelligence and understanding.
My email resulted in an answer that same day, less than an hour after my initial report. A simple "thank you", and "we're forwarding this to our developers", goes a long way to keeping a security researcher from idly playing with the thought of publishing their findings and moving on to the next game.
In under a week, I found that the original demo exploit was being blocked by the WAF - but if I replaced "onclick" with "oclick", "onmouseover" with "omouseover", and "confirm" with "cofirm", I found the blocking didn't get in the way. Granted, since those aren't real event handlers or JavaScript functions, I can't use those in a real exploit, but it does mean that once again, all the WAF does is block the original example of the attack, and it took only a few minutes again to come up with another exploit string.
If they'd told me "hey, we're putting in a WAF rule while we work on fixing the actual bug", I wouldn't have been so eager to grump back at them and say they hadn't fixed the issue by applying their WAF - and by the way, here's another URL to exploit it. But they did at least respond to my grump and reassure me that, yes, they were still going to fix the application.
I heard nothing after that, until in February of this year, over six months later, I replied to the original thread and asked if the report qualified for a bounty, since I noticed that they had actually fixed the vulnerability.
No response. I started thinking of writing this up as an example of how security researchers still get shafted by businesses - bear in mind that my approach is not to seek out bounties for reward, but that I really think it's common courtesy to thank researchers for reporting to you, rather than have them pwn your website and/or your customers.
About a month later, while looking into other things, I found that the company exists on HackerOne, where they run a bug bounty. This renewed my interest in seeing this fixed. So I reported the email exchange from earlier, noted that the bug was fixed, and asked if it constituted a rewardable finding. Again, a simple "thanks for the report, but this doesn't really rise to the level of a bounty" is something I've been comfortable with from many companies (though it is nice when you do get something, even if it's just a keychain or a t-shirt, or a bag full of stickers).
3/14: I got a reply the next day, indicating that "we are investigating".
3/28: Then nothing for two weeks, so I posted another response asking where things were going.
4/3: Then a week later, a response: "We're looking into this and will be in touch soon with an update."
4/18: Me: Ping?
5/7: Me: Hey, how are we doing?
5/16: Anything happening?
5/18: Finally, over two months after my report to the company through HackerOne, and ten months after my original email to the first bug bounty address, it's addressed.
5/19: The severity of the bug report is lowered (quite rightly - the questionnaire they used pushed me to a priority of "high", which was by no means warranted). A very welcome bounty, and a bonus for my patience (unexpected, but welcome), are issued.
The cheapest way to learn things is from someone else's mistakes. So I decided to share with my readers the things I picked up from this experience.
Here are a few other lessons I've picked up from bug bounties I've observed:
If you start a bug bounty, consider how ready you might be. Are you already fixing all the security bugs you can find for yourself? Are you at least fixing those bugs faster than you can find more? Do your developers actually know how to fix a security bug, or how to verify a vulnerability report? Do you know how to expand on an exploit, and find occurrences of the same class of bug? [If you don't, someone will milk your bounty programme by continually filing variations on the same basic flaw]
How many security vulnerabilities do you think you have? Multiply that by an order of magnitude or two. Now multiply that by the average bounty you expect to offer. Add the cost of the personnel who are going to handle incoming bugs, and the cost of the projects they could otherwise be engaged in. Add the cost of the developers whose work will be interrupted to fix security bugs, and add the cost of the features that didn't get shipped on time because those bugs were being fixed. Sure, some of that is just a normal cost of doing business, when a security report could come at you out of the blue and interrupt development until it's fixed, but starting a bug bounty paints a huge target on you.
Hiring a penetration tester, or renting a tool to scan for programming flaws, has a fixed cost - you can simply tell them how much you're willing to pay, and they'll work for that long. A bug bounty may result in multiple orders of magnitude more findings than you expected. Are you going to pay them all? What happens when your bounty programme runs out of money?
Finding bugs internally, using bug bashes, software scanning tools or dedicated development staff, has a fixed cost, which is probably still smaller than the amount of money you're considering putting into that bounty programme.
That's not to say bug bounties are always going to be uneconomical. At some point, in theory at least, your development staff will be sufficiently good at resolving and preventing internally discovered security vulnerabilities that they will be running short of security bugs to fix. They still exist, of course, but they're more complex and harder to find. This is where it becomes economical to lure a bunch of suckers - excuse me, security researchers - to pound against your brick walls until one of them, either stronger or smarter than the others, finds the open window nobody saw, and reports it to you. And you give them a few hundred bucks - or a few thousand, if it's a really good find - for the time that they and their friends spent hammering away in futility until that one successful exploit.
At that point, your bug bounty programme is actually the least expensive tool in your arsenal.
I'm pretty much unhappy with the use of "Security Questions" - things like "what's your mother's maiden name", or "what was your first pet". These questions are sometimes used to strengthen an existing authentication control (e.g. "you've entered your password on a device that wasn't recognised, from a country you normally don't visit - please answer a security question"), but far more often they are used as a means to recover an account after the password has been lost, stolen or changed.
I've been asked a few times, given that these are pretty widely used, to explain objectively why I have so little regard for them as a security measure. Here's the Too Long; Didn't Read summary: security questions are low-entropy, publicly discoverable, shared across sites, poorly protected, and effectively unchangeable.
Let's take them one by one:
What's your favourite colour? Blue, or green. At the outside, red, yellow, orange or purple. That covers most people's choices, in less than 3 bits of entropy.
What's your favourite NBA team? There are 29 of those - 30, if you count the 76ers. That's under 5 bits of entropy.
Obviously, there are questions that broaden this, but they are still relatively easy to guess in a small number of tries - particularly when you can use the next fact about Security Questions.
What's your mother's maiden name? It's a matter of public record.
What school did you go to? If we know where you grew up, it's easy to guess this, since there were probably only a handful of schools you could possibly have attended.
Who was your first boyfriend/girlfriend? Many people go on about this at length in Facebook posts, I'm told. Or there's this fact:
What's your porn name? What's your Star Wars name? What's your Harry Potter name?
All these stupid quizzes, and they get you to identify something about yourself - the street you grew up on, the first initial of your secret crush, how old you were when you first heard saxophones.
And, of course, because of the next fact, all I really have to do is convince you that you want a free account at my site.
Every site that you visit asks you variants of the same security questions - which means that you'll have told multiple sites the same answers.
You've been told over and over not to share your password across multiple sites - but here you are, sharing the security answers that will reset your password, and doing so across multiple sites that should not be connected.
And do you think those answers (and the questions they refer back to) are kept securely by these various sites? No, because:
There's regulatory protection, under regimes such as PCI, etc, telling providers how to protect your passwords.
There is no such advice for protecting security questions (which are usually public) and the answers to them, which are at least presumed to be stored in a back-end database, but are occasionally sent to the client for comparison against the answers! That's truly a bad security measure, because of course you're handing the answers straight to the attacker.
Even assuming the security answers are stored in a database, they're generally stored in plain text, so that they can be accessed by phone support staff to verify your answers when you call up crying that you've forgotten your password. [Awesome pen-testing trick]
And because the answers are shared everywhere, all it takes is a breach at one provider to make the security questions and answers they hold have no security value at all any more.
There's an old joke in security circles: "my password got hacked, and now I have to rename my dog". It's really funny, because there are so many of these security answers which are matters of historical fact - while you can choose different questions, you can't generally choose a different answer to the same question.
Well, obviously, you can, but then you've lost the point of a security question and answer - because now you have to remember what random lie you used to answer that particular question on that particular site.
Yes, I know you can lie, you can put in random letters or phrases, and the system may take them ("Your place of birth cannot contain spaces" - so Las Vegas, New York, and Lake Windermere are all unusable). But then you've just created another password to remember - and the point of these security questions is to let you log on once you've forgotten your password.
So, you've forgotten your password, but to get it back, you have to remember a different password, one that you never used. There's not much point there.
Security questions and answers, when used for password recovery / reset, are complete rubbish.
Security questions are low-entropy, predictable and discoverable password substitutes that are shared across multiple sites, are under- or un-protected, and (like fingerprints) really can't be changed if they become exposed. This makes them totally unsuited to being used as password equivalents in account recovery / password reset schemes.
If you have to implement an account recovery scheme, find something better to use. In an enterprise, as I've said before, your best bet is to use something that the enterprise does well - the management hierarchy. Every time you forget your password, you have to get your manager, or someone at the next level up from them, to reset your password for you, or to vouch for you to tech support. That way, someone who knows you, and can affect your behaviour in a positive way, will know that you keep forgetting your password and could do with some assistance. In a social network, require the user's established contacts to vouch for them in the same way.
Also, password hints are bullshit. Many of the Adobe breach's "password hints" were actually just the password in plain text. And, because Adobe didn't salt their password hashes, you could sort the list of password hashes, and pick whichever of the password hints was either the password itself, or an easy clue for the password. So, even if you didn't use the password hint yourself, or chose a really cryptic clue, some other idiot came up with the same password, and gave a "Daily Express Quick Crossword" quality clue.
Credentials include a Claim and a Proof (possibly many).
The Claim is what states one or more facts about your identity.
A Username is one example of a Claim. So is Group Membership, Age, Eye Colour, Operating System, Installed Software, etc...
The Proof is what allows someone to reliably trust the Claim is true.
A Password is one example of a Proof. So is a Signature, a Passport, etc...
Claims are generally public, or at least non-secret, and if not unique, are at least specific (e.g. membership of the group "Brown eyes" isn't open to people with blue eyes).
Proofs are generally secret, and may be shared, but such sharing should not be discoverable except by brute force. (Which is why we salt passwords).
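A minimal sketch of that principle - salting so that shared passwords aren't discoverable except by brute force - using Node's built-in crypto module (the "salt:hash" storage format here is illustrative, not any particular system's):

```js
const crypto = require("crypto");

// Salted password storage: two users with the same password get
// different records, so the sharing isn't discoverable from the database.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return salt.toString("hex") + ":" + hash.toString("hex");
}

function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  const hash = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), 64);
  return crypto.timingSafeEqual(hash, Buffer.from(hashHex, "hex"));
}
```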
Password resets can occur for a number of reasons - you've forgotten your password, or the password change functionality is more cumbersome than the password reset, or the owner of the account has changed (is that allowable?) - but the basic principle is that an account needs a new password, and there needs to be a way to achieve that without knowledge of the existing password.
Let's talk as if it's a forgotten password.
So we have a Claim - we want to assert that we possess an identity - but we have to prove this without using the primary Proof.
Which means we have to know of a secondary Proof. There are common ways to do this - an alternate ID, issued by someone you trust (like a government authority, etc). It's important, in the days of parody accounts or simply shared names (is that Bob Smith, his son, Bob Smith, or his unrelated neighbour, Bob Smith?), that you have associated this alternate ID with the account using the primary Proof, or as a part of the process of setting up the account with the primary Proof. Otherwise, you're open to account takeover by people who share the same name as their target.
And you can legally change your name.
E-mail.
Pretty much every public web site relies on the use of email for password reset, and uses that email address to provide a secondary Proof.
It's not enough to know the email address - that's unique and public, and so it matches the properties of a Claim, not a Proof, of identity.
We have to prove that we own the email address.
It's not enough to send email FROM the email address - email is known to be easily forged, and so there's no actual proof embodied in being able to send an email.
That leaves the server with the prospect of sending something TO the email address, and the recipient having proved that they received it.
You could send a code-word, and then have the recipient give you the code-word back. A shared secret, if you like.
And if you want to do that without adding another page to the already-too-large security area of the site, you look for the first place that allows you to provide your Claim and Proof, and you find the logon page.
By reusing the logon page, you're going to say that code-word is a new password.
[This is not to say that email is the only, or even the best, way to reset passwords. In an enterprise, you have more reliable proofs of identity than an email provider outside of your control. You know people who should be able to tell you with some surety that a particular person is who they claim to be. Another common secondary identification is the use of Security Questions. See "Security Questions are Bullshit", above, for why this is a bad idea.]
Well, yes and no. No, actually. Pretty much definitely no, it's not your new password.
Let's imagine what can go wrong. If I don't know your password, but I can guess your username (because it's not secret), I can claim to be you wanting to reset your password. That not only creates opportunity for me to fill your mailbox with code-words, but it also prevents you from logging on while the code-words are your new password. A self-inflicted denial of service.
So your old password should continue working, and if you never use the code-word, because you're busy ignoring and deleting the emails that come in, it should keep working for you.
I've frequently encountered situations in my own life where I've forgotten my password, gone through the reset process, and it's only while typing in the new password, and being told what restrictions there are on characters allowed in the new password, that I remember what my password was, and I go back to using that one.
In a very real sense, the code-word sent to you is NOT your new password, it's a code-word that indicates you've gone the password reset route, and should be given the opportunity to set a new password.
Try not to think of it as your "temporary password"; it's a special flag in the logon process, just like a "duress password". It doesn't replace your actual password.
Shared secrets are fantastic, useful, and often necessary - TLS uses them to encrypt data, after the initial certificate exchange.
But the trouble with shared secrets is, you can't really trust that the other party is going to keep them secret very long. So you have to expire them pretty quickly.
The same is true of your password reset code-word.
In most cases, a user will forget their password, click the reset link, wait for an email, and then immediately follow the password reset process.
Users are slow, in computing terms, and email systems aren't always directly linked and always-connected. But I see no reason why the most usual automated password reset process should allow the code-word to continue working after an hour.
[If the process requires a manual step, you have to count that in. Especially if the manual step is something like "contact a manager for approval" - because managers aren't generally 24/7 workers - the code-word is going to need to last much longer. But start your discussion with an hour as the base-point, and make people fight for why it'll take longer to follow the password reset process.]
You can absolutely supply a URL in the email that will take the user to the right page to enter the code-word. But you can't carry the code-word in the URL.
Why? Check out the presentations from this year's Black Hat and DefCon showing the use of a malicious WPAD server on a local - or remote - network, whose purpose is to trap and save URLs, EVEN HTTPS URLs, and their query strings.
Every URL you send in an email is an HTTP or HTTPS GET, meaning all the parameters are in the URL or in the query string portion of the URL.
This means the code-word can be sniffed and usurped if it's in the URL. And the username is already assumed to be known, since it's a non-secret. [Just because it's assumed to be known, don't give the attacker an even break - your message should simply say "you requested a password reset on an account at our website" - a valid request will come from someone who knows which account at your website they chose to request.]
So, don't put the code-word in the URL that you send in the email.
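Pulling those rules together - random, stored as a hash, expiring after an hour, and never embedded in the emailed URL - a sketch might look like this. `db` and `sendMail` are hypothetical stand-ins for your own storage and mail layers:

```js
const crypto = require("crypto");

function issueResetCode(username) {
  const code = crypto.randomBytes(20).toString("hex");
  db.saveResetRecord({
    username,
    // Store a hash, so a database leak doesn't leak live code-words.
    codeHash: crypto.createHash("sha256").update(code).digest("hex"),
    expires: Date.now() + 60 * 60 * 1000, // one hour
  });
  // The URL identifies the reset page only; the user types the code-word
  // into that page, so it never appears in anyone's logs or proxies.
  sendMail(username,
    "You requested a password reset on an account at our website.\n" +
    "Enter this code at https://example.com/reset : " + code);
}
```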
DON'T LOG THE PASSWORD
I have to say that, because otherwise people do that, as obviously wrong as it may seem.
But log the fact that you've changed a password for that user, along with when you did it, and what information you have about where the user reset their password from.
Multiple users resetting their password from the same IP address - that's a bad sign.
The same user resetting their password multiple times - that's a bad sign.
Multiple expired code-words - that's a bad sign.
Some of the bad things being signaled include failures in your own design - for instance, multiple expired code-words could mean that your password reset function has stopped working and needs checking. You have code to measure how many abandoned shopping carts you have, so include code that measures how many abandoned password reset attempts you have.
Did I miss something, or get something wrong? Let me know by posting a comment!
In which I move my version control from ComponentSoftware's CS-RCS Pro to Git while preserving commit history.
[If you don't want the back story, skip ahead to the instructions!]
OK, so having watched the video I linked to earlier, I thought I'd move some of my old projects to Git.
I picked one at random, and went looking for tools.
I'm hampered a little by the fact that all my old projects used ComponentSoftware's "CS-RCS Pro".
Why did I use that? A couple of really good reasons, chief among them that it's compatible with GNU RCS.
But you know who doesn't use CS-RCS Pro any more?
That's right, ComponentSoftware.
It's a dead platform, unsupported, unpatched, and belongs off my systems.
Why does that matter? One simple reason - if I move off the platform, I face the usual choice when migrating from one version control system to another: either bring the full history along, or abandon it and start fresh in the new system.
The second option seems a bit of a waste to me.
OK, so yes, technically I could mix the two modes, by using CS-RCS Pro to browse the ancient history when I need to, and Git to browse recent history, after starting Git from a clean working folder. But I could see a couple of problems with that approach, not least that it keeps a dead, unpatched tool on my systems.
So, really, I wanted to make sure that I could move my files, history and all.
I really didn't have a good way to do it.
Clearly, any version control system can be moved to any other version control system by the simple expedient of checking out each revision in turn from the old system, and committing it to the new one.
But, as you can imagine, that's really long-winded and manual. That should be automatable.
In fact, given the shared APIs of VSS-compatible source control services, I'm truly surprised that nobody has yet written a tool to do basically this task. I'd get on it myself, but I have other things to do. Maybe someone will write a "VSS2Git" or "VSS2VSS" toolkit to do just this.
There is a format for creating a single-file copy of a Git repository, which Git can process using the command "git fast-import". So all I have to find is a tool that goes from a CS-RCS repository to the fast-import file format.
Sadly, there's no tool to go from CS-RCS Pro to Git. There's a tool to go from CS-RCS Pro to CVS, or there was, but that was on the now-defunct CS-RCS web site.
But... remember I said that it's compatible with GNU RCS?
And there are scripts to go from GNU RCS to Git.
OK, so the script for this is written in Ruby, and as I read it, there seemed to be a few things that made it look like it might be for Linux only.
I really wasn't interested in making a Linux VM (easy though that may be) just so I could convert my data.
Everything changed with the arrival of the recent Windows 10 Anniversary Update, because along with it came a new component.
Bash on Ubuntu on Windows.
It's like a Linux VM, without needing a VM, without having to install Linux, and it works really well.
With this, I could get all the tools I needed - GNU RCS, in case I needed it; Ruby; the Git command line - and then I could try this out for myself.
Of course, I wouldn't be publishing this if it wasn't somewhat successful. But there are some caveats, OK?
I've tried this a few times, on ONE of my own projects. This isn't robustly tested, so if something goes all wrong, please by all means share, and people who are interested (maybe me) will probably offer suggestions, some of them useful. I'm not remotely warrantying this or suggesting it's perfect. It may wipe your development history out of your one and only copy of version control... so don't do it on your one and only copy. Make a backup first.
GNU RCS likes to store files in one of two places - either in the same directory as the working files, but with a ",v" pseudo-extension added to the filename, or in a sub-directory off each working folder, called "RCS" and with the same ",v" extension on the files. If you did either of these things, there's no surprises. But...
CS-RCS Pro doesn't do this. It has a separate RCS Repository Root. I put mine in C:\RCS, but you may have yours somewhere else. Underneath that RCS Repository Root is a full tree of the drives you've used CS-RCS to store (without the ":"), and a tree under that. I really hope you didn't embed anything too deep, because that might bode ill.
Initially, this seemed like a bad thing, but because you don't actually need the working files for this task, you can pretend that the RCS Repository is actually your working space.
Maybe this is obvious, but it took me a moment of thinking to decide I didn't have to move files into RCS sub-folders of my working directories.
Make this a "flag day". After you do this conversion, never use CS-RCS Pro again. It was good, and it did the job, and it's now buried in the garden next to Old Yeller. Do not sprinkle the zombification water on that hallowed ground to revive it.
This also means you MUST check in all your code before converting, because checking it in afterwards will be ... difficult.
Assumption: You have Windows 10.
This might look like a lot of instructions, but I mostly just wanted to be clear. This is really quick work. If you screw up after the "git init" command, simply "rm -rf .git" to remove the new repository.
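As a rough sketch of the whole sequence, run from Bash on Ubuntu on Windows - the paths are mine, and rcs-fast-export.rb stands in for whichever RCS-to-Git script you picked up:

```bash
# Work on a COPY of the CS-RCS repository tree, never your only original.
cp -r /mnt/c/RCS ~/rcs-convert
cd ~/rcs-convert/C/Projects/MyProject

# Create an empty Git repository and stream the RCS history into it.
git init
rcs-fast-export.rb . | git fast-import

# Inspect the imported history, then populate the working tree from it.
git log --oneline
git checkout master
```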
The Ubuntu "Circle of Friends" logo.
Depending on the kind of company you work at, it's either: a group of happy people holding hands and supporting one another; a circle of people each pointing a finger of blame at the person next to them; or a circular firing squad.
If you work at the first place, reach out to me on LinkedIn - I know some people who might want to work with you.
If you're at the third place, you should probably get out now. Whatever they're paying you, or however much the stock might be worth come the IPO, it's not worth the pain and suffering.
If you're at the second place, congratulations - you're at a regular, ordinary workplace that could do with a little better management.
What does this have to do with security? A surprisingly great deal.
Whenever there's a security incident, there should be an investigation as to its cause.
Clearly the cause is always human error. Machines don't make mistakes, they act in predictable ways - even when they are acting randomly, they can be stochastically modeled, and errors taken into consideration. Your computer behaves like a predictable machine, but at various levels it actually routinely behaves like it's rolling dice, and there are mechanisms in place to bias those random results towards the predictable answers you expect from it.
Humans, not so much.
Humans make all the mistakes. They choose to continue using parts that are likely to break, because they are past their supported lifecycle; they choose to implement only part of a security mechanism; they forget to finish implementing functionality; they fail to understand the problem at hand; etc, etc.
It always comes back to human error.
Occasionally I will experience these great flashes of inspiration from observing behaviour, and these flashes dramatically affect my way of doing things.
One such was when I attended the weekly incident review board meetings at my employer of the time - a health insurance company.
Once each incident had been resolved and addressed, they were submitted to the incident review board for discussion, so that the company could learn from the cause of the problem, and make sure similar problems were forestalled in future.
These weren't just security incidents, they could be system outages, problems with power supplies, really anything that wasn't quickly fixed as part of normal process.
But the principles I learned there apply just as well to security incidents.
The biggest principle I learned was "root cause analysis" - that you look beyond the immediate cause of a problem to find what actually caused it in the long view.
At other companies, who can't bear to think that they didn't invent absolutely everything, this is termed differently - for instance, "the five whys" (suggesting that if you ask "why did that happen?" five times, you'll get to the root cause). Other names are possible, but the majority of the English-speaking world knows it as "root cause analysis".
This is where I learned that if you believe the answer is that a single human's error caused the problem, you don't have the root cause.
Whenever I discuss this with friends, they always say "But! What about this example, or that?"
You should always ask those questions.
Here are some possible individual causes, and some of their associated actual causes:

| Individual cause | Associated actual causes |
|---|---|
| Bob pulled the wrong lever | Who trained Bob about the levers to pull? Was there documentation? Were the levers labeled? Did anyone assess Bob's ability to identify the right lever to pull by testing him with scenarios? |
| Kate was evil and did a bad thing | Why was Kate allowed to have unsupervised access? Where was the monitoring? Did we hire Kate? Why didn't the background check identify the evil? |
| Jeremy told everyone the wrong information | Was Jeremy given the right information? Why was Jeremy able to interpret the information from right to wrong? Should this information have been automatically communicated without going through a Jeremy? Was Jeremy trained in how to transmute information? Why did nobody receiving the information verify it? |
| Grace left her laptop in a taxi | Why does Grace have data that we care about losing - on her laptop? Can we disable the laptop remotely? Why does she even have a laptop? What is our general solution for people, who will be people, leaving laptops in a taxi? |
| Jane wrote the algorithm with a bug in it | Who reviews Jane's code? Who tests the code? Is the test automated? Was Jane given adequate training and resources to write the algorithm in the first place? Is this her first time writing an algorithm - did she need help? Who hired Jane for that position - what process did they follow? |
I could go on and on, and I usually do, but if you ever find yourself blaming an individual and saying "human error caused this fault", it's important to remember that humans, just like machines, are random and only stochastically predictable, and if you want to get predictable results, you have to have a framework that brings that randomness and unpredictability into some form of logical operation.
Many of the questions I asked above are also going to end up with the blame apparently being assigned to an individual - that's just a sign that it needs to keep going until you find an organisational fix. Because if all you do is fix individuals, and you hire new individuals and lose old individuals, your organisation itself will never improve.
[Yes, for the pedants, your organisation is made up of individuals, and any organisational fix is embodied in those individuals - so blog about how the organisation can train individuals to make sure that organisational learning is passed on.]
Finally, if you'd like to not use Ubuntu as my "circle of blame" logo, there are plenty of others out there - for instance, Microsoft Alumni.
Randy Westergren posted a really great piece entitled "Widespread XSS Vulnerabilities in Ad Network Code Affecting Top Tier Publishers, Retailers".
Go read it - I'll wait.
The article triggered a lot of thoughts that I'll enumerate here:
This was reported by SoftPedia as a "new attack", but it's really an old attack. This is just another way to execute DOM-based XSS.
That means that web sites are being attacked by old bugs, not because their own coding is bad, but because they choose to make money from advertising.
And because the advertising industry is just waaaay behind on securing their code, despite being effectively a widely-used framework across the web.
You've seen previously on my blog how I attacked Troy Hunt's blog through his advertising provider, and he's not the first, or by any means the last, "victim" of my occasional searches for flaws.
It's often difficult to trace which ad provider is responsible for a piece of vulnerable code, and the hosting site may not realise the nature of their relationship and its impact on security. As a security researcher, it's difficult to get traction on getting these vulnerabilities fixed.
Important note
I'm trying to get one ad provider right now to fix their code. I reported a bug to them, and they pointed out it was similar to the work Randy Westergren had written up.
So they are aware of the problem.
It's over a month later, and the sites I pointed out to them as proofs of concept are still vulnerable.
Partly, this is because I couldn't get a reliable repro as different ad providers loaded up, but it's been two weeks since I sent them a reliable repro - which is still working.
Reported a month ago, reliable repro two weeks ago, and still vulnerable everywhere.
[If you're defending a site and want to figure out which ad provider is at fault, inject a "debugger" statement into the payload, to have the debugger break at the line that's causing a problem. You may need to do this by replacing "prompt()" or "alert()" with "(function(){debugger})()" - note that it'll only break into the debugger if you have the debugger open at the time.]
Randy's attack example uses a symbol you won't see at all in some web sites, but which you can't get away from in others: the "#" or "hash" symbol, also known as the "number" sign. [Don't call it "pound", please, that's a different symbol altogether, "£"] Here's his example:
http://nypost.com/#1'-alert(1)-'"-alert(1)-"
Different parts of the URL have different names. The "http:" part is the "protocol", which tells the browser how to connect and what commands will likely work. "//nypost.com/" is the host part, and tells the browser where to connect to. Sometimes a port number is used - commonly, 80 or 443 - after the host name but before the terminating "/" of the host element. Anything after the host part, and before a question-mark or hash sign, is the "path" - in Randy's example, the path is left out, indicating he wants the root page. An optional "query" part follows the path, indicated by a question mark at its start, often taking up the rest of the URL. Finally, if a "#" character is encountered, this starts the "anchor" part, which is everything from after the "#" character on to the end of the URL.
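The browser's own URL parser will show you the same decomposition; here's a quick sketch you can paste into a browser console or Node:

```js
// Parse Randy's example URL into its named parts.
const u = new URL(`http://nypost.com/#1'-alert(1)-'"-alert(1)-"`);
u.protocol; // "http:"
u.host;     // "nypost.com"
u.pathname; // "/" - the root page
u.search;   // "" - no query part in this example
u.hash;     // the whole anchor part, payload and all
```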
The "anchor" has a couple of purposes, one by design, and one by evolution. The designed use is to tell the browser where to place the cursor - where to scroll to. I find this really handy if I want to draw someone's attention to a particular place in an article, rather than have them read the whole story. [It can also be used to trigger an onfocus event handler in some browsers]
The second use is for communication between components on the page, or even on other pages loaded in frames.
I want to emphasise this - and while Randy also mentioned it, I think many web site developers need to understand this when dealing with security.
The anchor part is not sent to the server.
The anchor part does not appear in your server's logs.
WAFs cannot filter the anchor part.
If your site is being attacked through abuse of the anchor part, you not only can't detect it ahead of time, you can't do basic forensic work to find out useful things such as "when did the attack start", "what sort of things was the attacker doing", "how many attacks happened", etc.
[Caveat: pedants will note that when browser code acts on the contents of the anchor part, some of that action will go back to the server. That's not the same as finding the bare URL in your log files.]
If you have an XSS that can be triggered by code in the anchor part, it is a "DOM-based XSS" flaw. This means that the exploit happens primarily (or only) in the user's browser, and no filtering on the server side, or in the WAF (a traditional, but often unreliable, measure against XSS attacks), will protect you.
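Here's a hedged sketch of the kind of client-side code that creates such a flaw (the element IDs are invented); the server never sees a thing:

```js
// VULNERABLE: whatever follows the "#" is written straight into the page.
// A URL like https://example.com/story#<img src=x onerror=alert(1)>
// never reaches the server, the logs, or the WAF - the browser does
// all the damage by itself.
const section = decodeURIComponent(location.hash.slice(1));
document.getElementById("heading").innerHTML = "Section: " + section;

// SAFER: treat the fragment as text, never as markup.
document.getElementById("heading").textContent = "Section: " + section;
```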
When trying out XSS attacks to find and fix them, you should try attacks in the anchor tag, in the query string, and in the path elements of the URL if at all possible, because they each will get parsed in different ways, and will demonstrate different bugs.
The construction Randy uses may seem a little odd:
"-alert(1)-"'-alert(1)-'
With some experience, you can look at this and note that it's an attempt to inject JavaScript, not HTML, into a quoted string whose injection point doesn't properly (or at all) escape quotes. The two different quote styles will escape from quoted strings inside double quotes and single quotes alike. (I like to put the number "2" in the alert that is escaped by the double quotes, so I know which quote has been escaped.)
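To see why the doubled-up payload works, imagine the server drops the input into one of two hypothetical templates; whichever quote style encloses it, one half of the payload escapes:

```js
// Template A: the payload lands inside a double-quoted string.
var searchTerm = ""-alert(1)-"'-alert(1)-'";
// parsed as ("" - alert(1) - "'-alert(1)-'") - the FIRST alert fires

// Template B: the same payload lands inside a single-quoted string.
var searchTerm = '"-alert(1)-"'-alert(1)-'';
// parsed as ('"-alert(1)-"' - alert(1) - '') - the SECOND alert fires
```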
Surely it's invalid syntax?
While JavaScript knows that "string minus void" isn't a valid operation, in order to discover the types of the two arguments to the "minus" operator, it actually has to evaluate them. This is a usual side-effect of a dynamic language - in order to determine whether an operation is valid, its arguments have to be evaluated. Compiled languages are usually able to identify specific types at compile time, and tell you when you have an invalid operand.
So, now that we know you can use any operator in there - minus, times, plus, divide, and, or, etc - why choose the minus? Here's my reasoning: a plus sign in a URL is converted to a space; a divide ("/") is often a path component, and, like multiplication ("*"), is part of a comment sequence in JavaScript ("//" or "/*"); an "&" is often used to separate arguments in a query string; and a "|" for "or" is possibly going to trigger different flaws, such as command injection, and so is best saved for later.
Also, the minus sign is an unshifted character and quick to type.
There are so many other ways to exploit this - finishing the alert with a line-ending comment ("//" or "<!--"), using "prompt" or "confirm" instead of "alert", using JavaScript obfuscators, etc - but this is a really good, easy injection point.
Another JavaScript syntax abuse is simply to drop "</script>" in the middle of the JavaScript block and then start a new script block, or even just regular HTML. Remember that the HTML parser only hands off to the JavaScript parser once it has found a block between "<script ...>" and "</script ...>" tags. It doesn't matter if the closing tag is "within" a JavaScript string, because the HTML parser doesn't know JavaScript.
Part of the challenge in repeating these attacks, demonstrating them to others, etc, is that there's no single ad provider, even on an individual web site.
Two visits to the same web site not only bring back different adverts, but they come through different pieces of code, injected in different ways.
If you don't capture your successful attack, it may not be possible to reproduce it.
Similarly, if you don't capture a malicious advert, it may not be possible to prove who provided it to you. I ran into this today with a "fake BSOD" malvert, which pretended to be describing a system error, and filled as much of my screen as it could with a large "alert" dialog, which kept returning immediately whenever it was dismissed, and which invited me to call for "tech support" to fix my system. Sadly, I wasn't tracing my every move, so I didn't get a chance to discover how this ad was delivered, and could only rage at the company hosting the page.
Clearly, ad providers need to improve their security. Until such time as they do so, a great protection is to use an ad-blocker. This may prevent you from seeing actual content at some sites, but you have to ask yourself if that content is worth the security risk of exposing yourself to adverts.
There is a valid argument to be made that ad blockers reduce the ability of content providers to make legitimate profit from their content.
But there is also a valid argument that ad blockers protect users from insecure adverts.
Finally, if you're running a web site that makes its money from ads, you need to behave proactively to prevent your users from being targeted by rogue advertisers.
I'm sure you believe that you have a strong, trusting relationship with the ad providers you have running ads on your web site.
Don't trust them. They are not a part of your dev team. They see your customers as livestock - product. Your goals are substantially different, and that means that you shouldn't allow them to write code that runs in your web site's security context.
What this means is that you should always embed those advertising providers inside an iframe of their own. If they give you code to run, and tell you it's to create the iframe in which they'll sit, put that code in an iframe you host on a domain outside your main domain. Because you don't trust that code.
Why am I suggesting you do that? Because it's the difference between allowing an advert attack to have limited control, and allowing it to have complete control, over your web site.
If I attack an ad in an iframe, I can modify the contents of the iframe, I can pop up a global alert, and I can send the user to a new page.
If I attack an ad - or its loading code - and it isn't in an iframe, I can still do all that, but I can also modify the entire page, read secret cookies, insert my own cookies, interact with the user as if I am the site hosting the ad, etc.
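A hedged sketch of the difference: create the ad's browsing context yourself, ideally pointed at a separate domain, and use the sandbox attribute to strip the privileges the ad doesn't need. The URLs and element IDs here are placeholders:

```js
// Host the ad loader on a throwaway domain, inside an iframe you create,
// rather than letting its script run in your page's security context.
const adFrame = document.createElement("iframe");
adFrame.src = "https://ads.example-sandbox.com/slot1.html"; // placeholder
// allow-scripts lets the ad run; omitting allow-same-origin means it
// cannot read your cookies or reach into the embedding page's DOM.
adFrame.setAttribute("sandbox", "allow-scripts");
adFrame.width = 300;
adFrame.height = 250;
document.getElementById("ad-slot").appendChild(adFrame);
```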
Here's the front page of a major website with a short script running through an advert with a bug in it.
Add security clauses into your contracts, so that you can pull an ad provider as soon as a security vulnerability is reported to you, and so that the ad providers are aware that you have an interest in the security and integrity of your page and your users. Ask for information on how they enforce security, and how they expect you to securely include them in your page.
[I am not a lawyer, so please talk with someone who is!]
Malverts - malicious adverts - are how an attacker gets an ad provider to deliver their attack code to your users, by signing up to provide an ad. Often this is done using apparent advertising copy related to current advertising campaigns, and can look incredibly legitimate. Sometimes, the attack code will be delayed, or region-specific, so an ad provider can't easily notice it when they review the campaign for inclusion in your web page.
Got a virus you want to distribute? Why write distribution code and try to trick a few people into running it, when for a few dollars, you can get an ad provider to distribute it for you to several thousand people on a popular web site for people who have money?
I've mentioned before how much I love the vagaries of dates and times in computing, and I'm glad it's not a part of my regular day-to-day work or hobby coding.
Here are some of the things I expect to happen this year as a result of the leap year - not least that salaried employees will give their employer an extra day of work for the same annual pay.
And then there are the ordinary issues with dates that programmers can't understand - like the fact that there are more than 52 weeks in a year. "ASSERT(weeknum>0 && weeknum<53);", anyone? 52 weeks is only 364 days, and every year has more days than that. [Pedantic mathematical note - maybe this somewhat offsets the "employer's extra day" item above]
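A quick sketch demonstrating that assertion failing, using the standard ISO-8601 week-number rule (the week containing the year's first Thursday is week 1, so some years run to week 53):

```js
function isoWeekNumber(d) {
  const date = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  // Shift to the Thursday of this week; that Thursday's year owns the week.
  date.setUTCDate(date.getUTCDate() + 4 - (date.getUTCDay() || 7));
  const yearStart = new Date(Date.UTC(date.getUTCFullYear(), 0, 1));
  return Math.ceil(((date - yearStart) / 86400000 + 1) / 7);
}

isoWeekNumber(new Date(Date.UTC(2015, 11, 31))); // 53 - there goes the ASSERT
```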
Happy Leap Day - and always remember to test your code in your head as well as in real life, to find its extreme input cases and associated behaviours. They'll get tested anyway, but you don't want it to be your users who find the bugs.
There are many reasons why Information Security hasn't had as big an impact as it deserves. Some are external - lack of funding, lack of concern, poor management, distractions from valuable tasks, etc, etc.
But the ones we inflict on ourselves are probably the most irritating. They make me really cross.
We shoot ourselves in the foot by confusing our customers between Cross-Site Scripting, Cross-Site Request Forgery & Cross-Frame Scripting.
- Alun Jones (@ftp_alun) February 26, 2016
OK, "cross" is an English term for "angry", or "irate", but as with many other English words, it's got a few other meanings as well.
It can mean to wrong someone, or go against them - "I can't believe you crossed Fingers MacGee".
It can mean to make the sign of a cross - "Did you just cross your fingers?"
It can mean a pair of items, intersecting one another - "I'm drinking at the sign of the Skull and Cross-bones".
It can mean to breed two different subspecies into a third - "What do you get if you cross a mountaineer with a mosquito? Nothing, you can't cross a scalar and a vector."
Or it can mean to traverse something - "I don't care what Darth Vader says, I always cross the road here".
It's this last sense that InfoSec people seem obsessed about, to the extent that every other attack seems to require it as its first word.
Cross-Site Scripting, Cross-Site Request Forgery, Cross-Frame Scripting, Cross-Site Tracing, Cross-Site History Manipulation, Cross-User Defacement - that's just a list of the attacks at OWASP that begin with the word "Cross".
Yesterday I had a meeting to discuss how to address three bugs found in a scan, and I swear I spent more than half the meeting trying to ensure that the PM and the Developer in the room were both discussing the same bug. [And here, I paraphrase]
"How long will it take you to fix the Cross-Frame Scripting bug?"
"We just told you, it's going to take a couple of days."
"No, that was for the Cross-Site Scripting bug. I'm talking about the Cross-Frame Scripting issue."
"Oh, that should only take a couple of days, because all we need to do is encode the contents of the field."
"No, again, that's the Cross-Site Scripting bug. We already discussed that."
"I wish you'd make it clear what you're talking about."
Yeah, me too.
The whole point of the word "Cross" as used in the descriptions of these bugs is to indicate that someone is doing something they shouldn't - and in that respect, it's pretty much a completely irrelevant word, because we're already discussing attack types.
In many of these cases, the words "Cross-Site" bring absolutely nothing to the discussion, and just make things confusing. Am I crossing a site from one page to another, or am I saying this attack occurs between sites? What if there's no other site involved, is that still a cross-site scripting attack? [Yes, but that's an irrelevant question, and by asking it, or thinking about asking/answering it, you've reduced your mental processing abilities to handle the actual issue.]
Check yourself when you utter "cross" as the first word in the description of an attack, and ask if you're communicating something of use, or just "sounding like a proper InfoSec tool". Consider whether there's a better term to use.
I've previously argued that "Cross-Site Scripting" is really a poor term for the conflation of HTML Injection and JavaScript Injection.
Cross-Frame Scripting is really Click-Jacking (and yes, that doesn't exclude clickjacking activities done by a keyboard or other non-mouse source).
Cross-Site Request Forgery is more of a Forced Action - an attacker can guess what URL would cause an action without further user input, and can cause a user to visit that URL in a hidden manner.
Cross-Site History Manipulation is more of a browser failure to protect SOP - I'm not an expert in that field, so I'll leave it to them to figure out a non-confusing name.
Cross-Site Tracing is just getting silly - it's Cross-Site Scripting (excuse me, HTML Injection) using the TRACE verb instead of the GET verb. If you allow TRACE, you've got bigger problems than XSS.
Cross-User Defacement crosses all the way into crosstalk, requiring as it does that two users be sharing the same TCP connection with no adequate delineation between them. This isn't really common enough to need a name that gets capitalised. It's HTTP Response-Splitting over a shared proxy with shitty user segregation.
I don't remotely anticipate that I'll change the names people give to these vulnerabilities in scanning tools or in pen-test reports.
But I do hope you'll be able to use these to stop confusion in its tracks, as I did:
"Never mind cross-whatever, let's talk about how long it's going to take you to address the clickjacking issue."
Here's the TL;DR version of this post:
Prevent or interrupt confusion by referring to bugs using the following non-confusing terms:

| Confusing | Not Confusing Much, Probably |
|---|---|
| Cross-Frame Scripting | Clickjacking |
| Cross-Site History Manipulation | [Not common enough to name] |
| Cross-Site Tracing | TRACE is enabled |
| Cross-Site Request Forgery | Forced User Action |
| Cross-Site Scripting | HTML Injection / JavaScript Injection |
| Cross-User Defacement | Crappy proxy server |
There's been a lot of celebration lately from the security community about the impending death of Adobe's Flash, or Oracle's Java plugin technology.
You can understand this, because for years these plugins have been responsible for vulnerability on top of vulnerability. Their combination of web-facing access and native code execution means that you have maximum exposure and maximum risk concentrated in one place on the machine.
Browser manufacturers have recognised this risk in their own code, and have made great strides in improving security. Plus, you can always switch browsers if you feel one is more secure than another.
An attacker can pretty much assume that their target is running Flash from Adobe, and Java from Oracle. [Microsoft used to have a competing Java implementation, but Sun sued it out of existence.]
Bugs in those implementations are widely published, and not widely patched, whether or not patches are available.
Users don't upgrade applications (plugins included) as often or as willingly as they update their operating system. So, while your browser may be updated with the operating system, or automatically self-update, it's likely most users are running a version of Java and/or Flash that's several versions behind.
As you can imagine, the declaration by Oracle that Java plugin support will be removed is a step forward in recognising the changing landscape of browser security, but it's not an indication that this is an area in which security professionals can relax.
Just the opposite.
With the deprecation of plugin support comes a long tail of abandoned, still-vulnerable installations.
It's not like Oracle are going to reach into every machine and uninstall / turn off plugin support. Even if they had the technical means to do so, such an act would be completely inappropriate.
So, what we're left with, whenever a company deprecates a product, application or framework, is a group of machines - zombies, if you will - that are operated by people who do not heed the call to cull, and which are going to remain active and vulnerable until such time as someone renders those walking-dead components finally lifeless.
If you're managing an enterprise from a security perspective, you should follow up every deprecation announcement with a project to decide the impact and schedule the actual death and dismemberment of the component being killed off.
Assuming, of course, that you followed through successfully on your plan.
Until then, watch out for the zombies.
The Browsing Dead.