Ways you haven’t stopped my XSS–Number 1, JavaScript Strings

I saw this again today. I tried smiling, but could only manage a weak grin.

You think you’ve defeated my XSS attack. How did you do that?

Encoding or back-slash quoting the back-slash and quote characters in JavaScript strings

Sure, I can no longer turn this:

<script>
s_prop0="[user-input here]";
</script>.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }


into this, by providing user input that consists of ";nefarious();// :



<script>
s_prop0="";nefarious();//";
</script>
.csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, “Courier New”, courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; }


Instead, I get this:



<script>
s_prop0="\";nefarious();//";
</script>
.csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, “Courier New”, courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; }


But, and this surprises many web developers, if that’s all you’ve done, I can still close that script tag.



INSIDE THE STRING



Yes, that’s bold, italic and underlined, because developers see this, and think “I have no idea how to parse this”:



<script>
s_prop0="</script><script>nefarious();</script>";
</script>


Fortunately, your browser does.



First it parses it as HTML.



This is important.



The HTML parser knows nothing about your JavaScript, it uses HTML rules to parse HTML bodies, and to figure out where scripts start and end.



So, when the HTML parser sees “<script>”, it creates a buffer. It starts filling that buffer with the first character after the tag, and it ends it with whatever character precedes the very next “</script>” tag it sees.



This means the HTML above gets interpreted as:



1. a block of script that won’t run, because it’s not complete code and generates a syntax error.



s_prop="


2. a block of script that will run, because it parses properly.



nefarious();


3. a double-quote character, a semi-colon, and an unnecessary end tag that it discards



Obviously, your code is more complex than mine, so this kind of injection has all kinds of nasty effects – but it’s possible for an attacker to hide those (not that the attacker needs to!)



So then, the fix is … what?



If you truly have to insert data from users into a JavaScript string, remember what it’s embedded in – HTML.



There are three approaches:



  1. Validate.
    If at all possible, discard characters willy-nilly. Does the user really need to input anything other than alphanumeric characters and spaces? Maybe you can just reject all those other characters.
  2. Encode.
    Yeah, you fell afoul of encoding, but let’s think about it scientifically this time.
    What are you embedded in? A JavaScript string embedded in HTML. You can’t HTML-encode your JavaScript content (try it and you’ll see it doesn’t work that way), so you have to JavaScript-string-encode anything that might make sense either to the HTML parser OR the JavaScript parser.
    You know I don’t like blacklists, but in this case, the only characters you actually need to encode are the double-quote, the back-slash (because otherwise you can’t uniquely reverse the encoding), and either the less-than or forward-slash.
    But, since I don’t like blacklists, I’d rather you chose to encode everything other than alphanumeric and spaces – it doesn’t cost that much.
  3. Span / Div.
    OK, this is a weird idea, but if you care to follow me, how about putting the user-supplied data into a hidden <span> or <div> element?
    Give it an id, and the JavaScript can reference it by that id. This means you only have to protect the user-supplied data in one place, and it won’t appear a dozen times throughout the document.


A note on why I don’t like the blacklists



OK, aside from last weekend’s post, where I demonstrated how a weak blacklist is no defence, it’s important to remember that the web changes day by day. Not every browser is standard, and they each try to differentiate themselves from the other browsers by introducing “killer features” that the other browsers don’t have for a few weeks.



As a result, you can’t really rely on the HTML standard as the one true documentation of all things a browser may do to your code.



Tags change, who knows if tomorrow a <script> tag might not be “pausable” by a <pause>Some piece of text</pause> tag? Ludicrous, maybe, until someone decides it’s a good idea. Or something else.



As a result, if you want to be a robust developer who produces robust code, you need to think less in terms of “what’s the minimum I have to encode?”, and more in terms of “what’s the cost of encoding, and what’s the cost of failure if I don’t encode something that needs it?”

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>