Better: Input Validation or Output Encoding?


OK, let me back up a little and explain.

There’s an argument I’ve been a part of at many places, with many people, as to which method of protection is best to guard against injection attacks.

What’s an injection attack?

Oh, no problem – an injection attack is something like “Cross-Site Scripting” (which I prefer to call “HTML Injection”) or “SQL Injection”.

Put simply, the attacker abuses your site’s or application’s need to collect data from users – instead of providing simple useful data, the attacker provides data that contains code along with some magical sequence or other, to allow it to execute as code, rather than merely be displayed as data.

As a simplistic example, imagine you’re writing a shopping list. On it, you write things like “Carrots”, “Potatoes”, “Steak”, etc. Then someone comes along and writes “P.S. Don’t forget to rob the bank”. Lists of items to buy at the grocery store don’t often contain instructions – but if you then give that list to a rather literally-minded, but stupid, assistant, there’s a chance the “P.S.” makes them drop out of “list of grocery” mode, and into “follow instruction” mode.

Sure, but who would be that dumb?

No person you’d send to the grocery store on their own, of course – most people have an in-built filter that prevents them from mindlessly doing stupid stuff.

Computers aren’t so fortunate, sadly – any such filter would have to be built in by the programmers. When it comes to the equivalent of reading the grocery list, the computer would have to be told specifically not to rob banks – or steal cars, murder the grocery clerks, steal from the register, or do anything else bad that might be on the list.

That’s clearly a non-starter – there are so many items in the “list of bad things you might be told to do”, it’s really much easier to start from the opposite end, and build a filter that accepts only those things that are known to be good. It can make life a little tricky, because your first run at the filter might not list all of the things that you might want to get at the grocery store – but it’s still the only way to be safe.

That’s a white-list, right?

Yes – unless you work at Microsoft, where the term is “allow list”. It’s coupled with the concept of “default deny”, a very basic security premise, which says that if you don’t know what to do, just don’t do anything. Refuse to process the grocery list, if you care to continue using the metaphor.

Input Validation

But let’s break out of the metaphor and go into technical details. What we have just described is “input validation”. Make sure that all the input that’s given to you matches a restricted set of possible good inputs.

This is an easy win, and should be applied whenever possible, simply because it quickly rejects bad input in most cases. You’re asking for a quantity to purchase? Make sure it’s a positive integer – if anything other than a series of digits is given to you, refuse the value. For even more win, make sure there are no leading zeroes, and that it’s smaller than the largest expected order you can fulfill.
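Here’s a minimal sketch of that quantity check in Python. The function name and the limit of 500 are made up for illustration; pick whatever your own business rules say.

```python
import re

# Hypothetical limit: the largest order this imaginary shop can fulfil.
MAX_ORDER_QUANTITY = 500

# Allow list: ASCII digits only, first digit non-zero (so no leading zeroes, no "0").
_QUANTITY_PATTERN = re.compile(r"[1-9][0-9]*")


def parse_quantity(raw: str) -> int:
    """Return the quantity as an int, or raise ValueError (default deny)."""
    if not _QUANTITY_PATTERN.fullmatch(raw):
        raise ValueError("quantity must be a positive whole number")
    quantity = int(raw)
    if quantity > MAX_ORDER_QUANTITY:
        raise ValueError("quantity is larger than we can fulfil")
    return quantity
```

Note that the pattern describes what is good, rather than listing what is bad. Anything that isn’t a plain run of digits is refused without the code having to know why it might be dangerous.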

That’s great for such a limited case, and there are many more cases you can analyse to determine that they are indeed possible to limit.

But what happens if you need to accept a character that you know is bad, or that you don’t know is good?

How does that work with Input Validation?

Many places I’ve worked have seen me run into fanatical developers who heard about SQL Injection, and Input Validation, and determined that they needed to protect against it. Away with the word “drop” – that’s not allowed. Away with the single quote character – not allowed, either. Away with angle brackets and the word “script” in case we’re susceptible to Cross-Site Scripting (XSS) too. Probably thanks to this cartoon from xkcd.

[I’ve often thought it might be funny for there to be a follow-up cartoon, in which the school gets its revenge and assigns Bobby a GPA of 3.”>, knowing that every University web site will discard that as an attempted attack.]

Fine, but you go tell Mr O’Reilly that he can’t enter his name, or order any lemon drops; tell the French they can’t put anything in quotes (you did know that French quotation marks are double angle brackets, oui?); and as for the word “script”, I’m sure you can come up with your own examples. Start with this blog post, for instance, which uses the word “script” or “scripting” about a half-dozen times.

The pathological case of a complete inability to do input validation comes, ironically enough, in software designed to log security incidents. How are you supposed to report the exact string that triggers an XSS attack or SQL Injection, if you are afraid that the string itself will break the security incident reporting tool? [If you’ve read Douglas Hofstadter’s “Gödel, Escher, Bach: an Eternal Golden Braid”, you’ll recognise this as the record that breaks the player. If you’re under thirty you may well ask “what’s a record?”]

Quite a quandary. But you know there has to be something to solve it, and the key comes from what I said earlier – if you can’t distinguish between code and data, you may find yourself executing as code something that you should be processing or displaying or storing as data.

So you have to say “this thing that looks like code is really data”. And you have to say it unambiguously, because ambiguity means that you can’t tell between data and code.

Output Encoding

So, that’s where Output Encoding comes in.

Where Input Validation essentially says “I will only accept data that is obviously data”, Output Encoding says “I will only pass on data and code that can be told apart from each other”.

You already see the flaw, of course – to do this as the caller, I have to know how my recipient distinguishes between code and data. I have to have a contract of sorts – and developers often refer to a “code contract” – that tells me how I should tell my partner which pieces I’m sending him are code, and which are data.

There are several ways to do this, and they depend on the library / language / platform your code is running in, as much as they depend on the communication mechanism (protocol) with the partner.

Give us a simple example, then

A simple example is sending an email through SMTP – the Simple Mail Transfer Protocol. Assuming you’re the mail sender, once you’re connected and ready to go, you send the commands “MAIL FROM:<>” and “RCPT TO:<>”, in that order, to say who the mail is from and who it’s going to, followed by the command “DATA”.

After the DATA command, as you might expect, comes the data of the message, line by line. [This actually includes header information, but I’m ignoring that to keep this simple.]

And how does the DATA end? With a single line, containing simply a dot (full-stop, period, whatever you want to call it).

That’s great, except of course an attacker could send an email with a single dot on its own line, followed by some evil SMTP commands, and you, the mail sender, would essentially ask the mail server to execute those commands. Or someone could accidentally include a single dot on its own, and trigger some random command to execute.

Fortunately, the SMTP designers thought of that – when sending a line from an email that begins with a dot, you are required to add another dot before it. So, a dot on its own becomes two dots, two dots become three, etc, etc. And the server knows that this is a data line, and not a command to be executed.

That’s perhaps the simplest example of Output Encoding there is.
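The dot rule is small enough to sketch in a few lines of Python. These two helper functions are my own illustration, not part of any mail library, and they ignore line endings to keep things short.

```python
def dot_stuff(message_lines):
    """Sender side: add one extra dot to any line that starts with a dot."""
    return ["." + line if line.startswith(".") else line for line in message_lines]


def dot_unstuff(wire_lines):
    """Receiver side: remove the one leading dot the sender added."""
    return [line[1:] if line.startswith(".") else line for line in wire_lines]


# A message body that tries to end the DATA section early and sneak in a command.
body = ["Hello,", ".", "QUIT", "..two dots"]
wire = dot_stuff(body)
# wire is now ["Hello,", "..", "QUIT", "...two dots"]
```

After stuffing, no line of the message is a lone dot, so the only lone dot the server ever sees is the one the sender adds at the very end. The receiver strips one dot from each dotted line and gets the original message back unchanged.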

Other examples of Output Encoding would include calling HtmlEncode, or a similar function for your framework, on data which you know should not be executed as HTML. If the data is HTML-clean, the HtmlEncode function won’t touch it, but otherwise your simple call to that one function will prevent XSS attacks (aka HTML injection, remember).
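As a rough illustration, here is the same idea using Python’s standard `html.escape`, which plays the role of HtmlEncode. The `render_comment` wrapper is a hypothetical helper of mine.

```python
import html


def render_comment(comment: str) -> str:
    """Wrap user-supplied text in a paragraph, encoded so it stays data."""
    return "<p>" + html.escape(comment) + "</p>"


attack = render_comment('<script>alert("gotcha")</script>')
# attack == '<p>&lt;script&gt;alert(&quot;gotcha&quot;)&lt;/script&gt;</p>'

harmless = render_comment("Carrots and potatoes")
# harmless == '<p>Carrots and potatoes</p>'
```

The browser displays the first comment as the literal text the user typed, angle brackets and all, and never runs it. Nothing had to be rejected to get there.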

Another example is when passing data to a SQL database – instead of concatenating code and data to make a string that you execute, use the parameterised queries that virtually every SQL library offers, allowing you to pass data in a manner that will allow the SQL server to recognise it as data, rather than code. [There is a side issue here, in that the SQL code itself may build a command by concatenating code and data in a string – in that case, your SQL developers need a quick session with the clue-by-four.]
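A small sketch of a parameterised query, using Python’s built-in `sqlite3` module and an in-memory database. The table and the `add_customer` function are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")


def add_customer(conn, name):
    # The "?" placeholder keeps the name out of the SQL text entirely:
    # the statement is the code, the tuple is the data.
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))


add_customer(conn, "O'Reilly")
add_customer(conn, "Robert'); DROP TABLE customers;--")

names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY rowid")]
# names == ["O'Reilly", "Robert'); DROP TABLE customers;--"]
```

Both names are stored exactly as typed, single quotes included, and the table is still there afterwards.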

So, which one is better? Input Validation or Output Encoding?

Here we come back to the topic of this article.

Neither is better than the other. You have to do both.

You see, injection attacks aren’t an example of input validation failures, or an example of output encoding failures.

Injection attacks are caused by a failure to do throughput handling correctly.

Throughput handling requires that you do input validation and output encoding. Input validation is cheap, easy, understandable by all, and allows you to dismiss bad data immediately, before a server wastes its time on it. But it doesn’t catch everything, and it can’t be used in all cases. So, output encoding must be used, difficult though it may be, to ensure that the data which does seep through input validation looking like code doesn’t actually get passed to the next layer as code.
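To show the two working together, here’s a toy version of that awkward security incident log, again in Python with `sqlite3` and `html.escape`. Everything named here (the table, the functions, the severity scale of 1 to 5) is an assumption made for the sketch.

```python
import html
import sqlite3

ALLOWED_SEVERITIES = {"1", "2", "3", "4", "5"}  # hypothetical scale


def record_incident(conn, severity_text, payload):
    # Input validation: severity has a small set of known-good values.
    if severity_text not in ALLOWED_SEVERITIES:
        raise ValueError("severity must be 1 to 5")
    # The payload is an attack string by definition, so it can't be validated.
    # Output encoding for the SQL layer: pass it as a parameter.
    conn.execute(
        "INSERT INTO incidents (severity, payload) VALUES (?, ?)",
        (int(severity_text), payload),
    )


def render_incidents(conn):
    # Output encoding for the HTML layer: escape the payload on the way out.
    rows = conn.execute("SELECT severity, payload FROM incidents ORDER BY rowid")
    return "\n".join(
        f"<li>[{severity}] {html.escape(payload)}</li>" for severity, payload in rows
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (severity INTEGER, payload TEXT)")
record_incident(conn, "4", "<script>alert(1)</script>")
record_incident(conn, "5", "'; DROP TABLE incidents;--")
page = render_incidents(conn)
```

The field that can be validated is validated, and the field that can’t be is encoded separately for each layer it passes to. The stored payload remains the exact string the attacker sent, which is what an incident log needs.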

Both, then.

Yep. Good luck.

Messing around with audio files

It’s Memorial Day weekend, so we’re doing a little relaxing.

What’s relaxing for me? Playing around with interesting random bits of code.

iFetch – suddenly slow

One piece of code that’s been interesting for a while, and very useful, is the iFetch program, which I use to download BBC Radio so that I can listen to it on my MP3 player on the bus. The iPlayer is nice, and all, but I can’t access it without an Internet connection, and the bus doesn’t have an Internet connection yet.

Lately, it seems like practically every show I’m fetching is coming down in WMA format. That wouldn’t generally be so bad, except that the WMA format streams at real-time using MPlayer, whereas the other formats stream as fast as the Internet can send them.

There’s a reason why I can only download WMAs now – the BBC recently made the choice to keep to only those formats that they feel they can adequately add DRM to. Which seriously limits your options as to what devices you can play it on. (Does that iPad do Flash? No. Does it do WMA? No idea.)

I haven’t yet figured out why there’s such a slowdown with WMAs in MPlayer, so if any readers have any ideas, please let me know.

Editing MP3 tags

Since I was unable to do much about the WMA slowness, I thought I’d look into what I can do about the MP3 files I’ve already collected – organising them by genre, broadcast date, etc.

So I looked into what I can do with MP3 tags.

Windows comes with the Windows Media Format SDK, which I thought I’d use. I’ve previously used it to set various values such as Title, Author, etc. Today’s game was to try and expand on that a little. One thing I wanted to look into was the use of various date fields. Date of recording, date of encoding – those seemed to be appropriate values.

The function to use is IWMHeaderInfo3::AddAttribute, but it just wouldn’t work for me. First I tried to use the “ID3/TDRC” tag. No dice. AddAttribute gives me error 0xc00d002b – that’s NS_E_INVALID_REQUEST. So I tried ID3/TDEN – again, c00d002b. That error is supposed to mean that I entered the wrong stream index – but I used stream zero, which is supposed to mean “this tag applies to the entire file”.

Perhaps the function doesn’t accept the ID3 tag names, and only accepts the WMA tag names, even though this is strictly an MP3 file that I’m working with.

No problem.

Next to try is WM/EncodingTime, which is supposed to translate to ID3/TDEN. No longer do I get NS_E_INVALID_REQUEST. No, this time I get 0xc00d0bd7 – NS_E_ATTRIBUTE_NOT_ALLOWED. Why not allowed? No idea. Perhaps the WMF SDK (WTF SDK, sometimes) thinks that the EncodingTime should only be set by the process that does the encoding? I kind of disagree with that, and since I have so many files without the EncodingTime value set, it clearly isn’t always set by the encoding tool. I tried passing the value in various forms – as a string, as a QWORD, even as binary – and didn’t really get anywhere.

Again, does anyone know how this should work?

A couple of other Zune notes

Finally, what’s new to complain about on the Zune?

Not much – I still really really like the device itself. A couple of minor issues that I am sure Microsoft could fix if they weren’t so busy getting rid of the people in charge of the Zune:

  • Why does the Zune list Podcasts in a different order from the Zune software?
    • Why can’t you make one match the other?
    • For what I mean, try putting MP3 files on your system with these titles: “(A grand time)”, “A boatload”, “The ark”. The Zune displays them as “The ark”, “A boatload”, “(A grand time)” – the PC displays them as “(A grand time)”, “A boatload”, “The ark”.
  • A few more apps wouldn’t go amiss.
  • Seriously, next device – put Bluetooth in the box.
  • Add some cards that read “Thank you for asking – it’s a Zune, and yes, it really is this good, and it really is from Microsoft. More details at”, so that I can hand them out.
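On the sort order complaint in the first bullet: I don’t know what either piece of software really does, but the two orderings are consistent with one side sorting on the raw title and the other ignoring leading punctuation and articles. A Python sketch of that guess, with both key functions being my own invention:

```python
import re

titles = ["(A grand time)", "A boatload", "The ark"]


def plain_key(title):
    """Sort on the title exactly as written (ignoring case)."""
    return title.casefold()


# Leading non-alphanumerics, then an optional article followed by whitespace.
_LEADING_NOISE = re.compile(r"^[^0-9a-z]*(?:(?:the|an|a)\s+)?")


def article_insensitive_key(title):
    """Sort on the title with leading punctuation and articles stripped."""
    return _LEADING_NOISE.sub("", title.casefold(), count=1)


pc_order = sorted(titles, key=plain_key)
# pc_order == ["(A grand time)", "A boatload", "The ark"]

zune_order = sorted(titles, key=article_insensitive_key)
# zune_order == ["The ark", "A boatload", "(A grand time)"]
```

If that guess is right, making the two match is a matter of both sides agreeing on one key function.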

Finally got my Zune HD

If you’ve been reading this blog, you’ll have suspected for some time that I’ve been hankering after a Zune HD.

Now that I’ve changed jobs, and got my first pay cheque, my wife and I decided that it’s about time we each bought ourselves an item that we’ve wanted – she ordered a Kindle, which sadly took three weeks to arrive, and I ordered a Zune HD, which arrived a week after ordering it.

First, the good stuff.

As I remembered from borrowing my friend’s Zune HD last year, the thing is incredibly light. Same size display as the iPhone (or at least the pocket version, not the maxi version that doesn’t make phone calls even when AT&T is working), but considerably smaller and thinner and lighter.

The display is incredibly good-looking, bright and clear. I almost don’t dare put it up to maximum brightness for fear I will go blind. Perhaps I shouldn’t watch that kind of video anyway 🙂

Like pretty much every device of late, you have to convert your videos to H.264 in order to view them, and the Zune software will do this for most of your video content. Not MPG files, for some reason. However, there are several encoder tools available – I used encHD before I realised that I actually have the functionality built in to Expression Encoder.

The operation of the Zune HD is smooth and intuitive – sliding menus around on screen is simple and, as my friends who use iPhones tell me, far less of an effort than on the iPhone. “You don’t have to press so hard,” is the quote I heard from someone directly comparing the two. You can literally flick a long menu up or down to get through it more quickly. On a long list, if you want to go to the end from the top, you can slide the list down, and keep holding it down – the screen will shortly pop to the other end. Similarly, to go to the top of the list from the end, slide the list up to the top of the screen, and it’ll quickly pop to the top of the list.

In the Music list, you can make your way through your really long list by selecting one of the squares marked with a letter, at which point your view will change to that of a list of letters (and the “#” sign, for non-letter initials), which makes it easier to jump to a particular artist without scrolling.

A long-awaited feature (at least, for me) is the ability to delete an item on the device, without having to wait until you’re syncing with the app on the PC. Simply hold down your finger on the item to delete, and then touch “Delete” when the menu appears. Oh, and then you have to hit “Yes”, to indicate that you really do want to delete the item.

The new games are really cool, and my son has repeatedly worn down the batteries just playing PGR Ferrari Edition. I like the Labyrinth game, myself, which has you using the accelerometer to steer a ball through a maze avoiding holes and spikes.

It’s also handy that the Zune software accommodates syncing with two devices without getting them confused, so that I can move content across from my old player.

What features are still missing?



Vibrate, so that you can feel the little ball in Labyrinth roll into the holes.

OK, I don’t really want that, because it’d drain the batteries like crazy and make the device bigger and heavier. Do. Not. Want.

Er… that’s it.

But there’s bad stuff too, right?

Sure there is.


Like before, it really doesn’t take into account my use of audio.

I have a number of programmes I’ve recorded from the radio, each of which might be 30 minutes, some of which are up to three hours in length. I can’t put those in the Music folder, because if I do, there’s no way to listen to part of the show, then part of another show, and then come back to the show I first listened to, because each time you play a different item in Music, you get to start from the beginning.

So I’ve had to put these programmes into the Podcasts directory – and that requires that I set their “Genre” tag in the MP3 file to “Podcasts”. That’s not a huge issue, but there are some things you lose by doing this. First, of course, by setting the Genre tag, there’s no way to sort by Genre. Bummer.

Then there’s the insane issue that the image stored in the MP3 file is not displayed in the list, or when playing the “Podcast”. While it’s clear that images are supported in MP3 files in the Music folder, and images associated with Podcasts are supported, the two are not combined. Irritating.

There’s also no way to determine, from the PC, which podcasts you’ve listened to on the Zune, so there’s no good automated way to delete, or move, the Podcast files from the PC that have been listened to. You just have to do it manually. Not cool.


While you can certainly delete items now, the support is not that wonderful.

From the Music menu, you can delete an album, but not a song, unless you go through to something other than the Albums list.

In the Pictures menu, you can delete an individual picture, but not a folder. Makes it rather difficult to remove your ‘personal’ pictures before handing the Zune off to a friend for a demo. And, let’s be honest, this thing’s so good you’re going to spend the first month of ownership handing it to people for a demo.

In Podcasts, you can delete groups of Podcasts as well as individual Podcasts.

In History, you can’t remove items from the History. Why would you want to?


Scrolling from top to bottom is a little awkward – you have to very carefully find and grab the top of the page and slide it all the way to the bottom, not letting go until you get there. There’s no way to ‘throw’ the top down to the bottom and reach the bottom of the page.

Scrolling through Podcasts [remember? that’s where most of my good stuff lies?] is limited to up and down only. Unlike the Music folder, there’s no ability to scroll through the alphabet. The episodes in the Podcast list are still sorted in a manner I can’t completely fathom, but the primary sort key seems to be the date of the file on the hard drive of your synced PC.

In the Internet app (the browser), there’s no ability to scroll from the top to the bottom, nor can you quickly scroll massive distances – which you might want to do if you find yourself on a site like this blog, which doesn’t display well on the Zune.

Summing that all up.

The good stuff far outweighs the bad stuff.

The Zune HD is a seriously wonderful device. Light and compact, it fulfils exactly the purposes I had in mind for it. Obviously there needs to be a phone-based version as well, but those are either here or on their way.

Look at the items in my “good stuff” list – they’re all ‘architectural’ components – basic concepts of the design of the app. Now look at the items in my “bad stuff” list. They’re all small winces that happen at the corners. “Fit and finish”, if you like, or a “simple matter of programming” to fix. I just hope that someone at Microsoft reads my blog and does something about it.

From the perspective of the Podcasts and syncing up and down, I’d be happy if Microsoft would just introduce an SDK that allows me to enumerate the files on the device, and whether or not they’ve been played.

It doesn’t have a fruit on the back.

I deliberately bought a Zune HD rather than an iAnything. If you read through my previous entries on the blog, you’ll find that Apple have repeatedly given me a poor computing experience that fails to jibe with the expectation that I’m given by the man in the black turtleneck and his dedicated followers. Yes, I’ve seen your Mac, and I’ve seen your iPhone, and it works far less the way I do than any other device I’ve used.

I have a phone that makes phone calls, connects to a Bluetooth earpiece, and receives text messages – and that’s currently all it does. I only charge it once a week. iPhone users that I know find themselves frustrated with the quality of Apple’s customer service, as well as the overall phone service they get from AT&T, and they charge their iPhone constantly. Not even nightly – whenever they are stationary for more than five minutes in a room with a power outlet. At least, that’s my impression.

I might consider getting a Windows Phone 7 when it does arrive, simply because I really like the Zune interface, but I suspect that I may simply remain more comfortable with a separate phone, and not buy the Phone 7, or zPhone, or whatever they wind up calling it (seriously – a dorky name like “Windows Phone 7”? The manufacturers are going to name theirs something cooler).

It needs apps.

OK, here’s where the iPhone / iPod wins out in some measure, except that Apple controls the apps you can put on your iPhone / iPod, and Microsoft lets you put any old crap on the Zune – apparently, it’s actually your Zune, not Microsoft’s. But there are relatively few titles out there, and most people will continue to get their apps from the Microsoft Zune Marketplace, rather than from third party shareware web sites.

Everyone says “oh, that’s cool”

Even the iPhone users I’ve shown the Zune to have noticed something about it that is cool. Most often, they say it’s light in weight, the screen is bright, you don’t have to poke the screen as hard to make it work, and it fits their hands better because it’s smaller.

That’s a takeaway Microsoft can be proud of.

But if they’d fix the little winces I note above, I’d be even happier. 🙂