I teach that XSS is prevented absolutely by appropriate contextual encoding of user data on its way out of your application and into the page.
In the case of HTML attributes, it’s actually fairly simple.
Unless you are putting a URL into an attribute, there are three simple rules:
Seems easy, right?
This is all kinds of good, except when you run into a site where the developer hasn’t really thought about their encoding very well.
You see, HTML attribute values are encoded using HTML encoding, not C++ encoding.
To HTML, the back-slash has no particular meaning.
I see this all the time – I want to inject script, but the site only lets me put user data into an attribute value:
<meta name="keywords" content="Wot I searched for">
That’s lovely. I’d like to put "><script>prompt(1)</script> in there as a proof of concept, so that it reads:
<meta name="keywords" content=""><script>prompt(1)</script>">
The dev sees this, and cuts me off, by preventing me from ending the quoted string that makes up the value of the content attribute:
<meta name="keywords" content="\"><script>prompt(1)</script>">
Nice try, Charlie, but that back-slash, it’s just a back-slash. It means nothing to HTML, and so my quote character still ends the string. My prompt still executes, and you have to explain why your ‘fix’ got broken as soon as you released it.
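If you want to see this for yourself without opening a browser, here is a rough sketch using Python's standard-library html.parser. Browsers have their own parsers, so treat this as an illustration rather than proof; the TagLogger class and start_tags function are names I made up for the example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records every start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


def start_tags(markup):
    """Return a list of (tag, attributes) pairs found in the markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.seen


# The developer's "fix": a back-slash in front of the attacker's quote.
backslashed = '<meta name="keywords" content="\\"><script>prompt(1)</script>">'

tags = start_tags(backslashed)
for tag, attrs in tags:
    print(tag, attrs)
```

The parser reports a meta tag whose content attribute is a lone back-slash, followed by a separate script tag. The quote ended the attribute value exactly as it did before the 'fix'.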
Oh, if only you had chosen the correct HTML encoding, and replaced my quote with “&quot;” [and therefore also replaced every “&” in my query with “&amp;”], we’d be happy.
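As a minimal sketch of what that looks like in practice, Python's standard html.escape function handles both replacements when called with quote=True. The meta_keywords function is a hypothetical stand-in for whatever your template layer does, and this only covers a quoted, non-URL attribute value.

```python
import html


def meta_keywords(user_text):
    """Hypothetical template helper: put user text in a quoted attribute."""
    # html.escape with quote=True replaces & < > " and ' with character
    # references, so the value cannot close the surrounding quotes.
    encoded = html.escape(user_text, quote=True)
    return '<meta name="keywords" content="{}">'.format(encoded)


payload = '"><script>prompt(1)</script>'
safe = meta_keywords(payload)
print(safe)
# <meta name="keywords" content="&quot;&gt;&lt;script&gt;prompt(1)&lt;/script&gt;">
```

The only unencoded quotes left in the output are the two that the template itself wrote, so the payload stays inside the attribute value as plain text.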
And this, my friends, is why every time you implement a mitigation, you must test it. And why you follow the security team’s guidance.
Exercise for the reader – how do you exploit this example if I don’t encode the quotes, but I do strip out angle brackets?