Life has become considerably easier for developers over the years, particularly with the advent of managed code (or whatever the equivalent terminology is for Java). Memory usage is something which one only needs to be aware of rather than constantly being "in your face" in the way it tends to be in C. However, that doesn't mean that all is rosy, or that we can solely concentrate on actual business problems. I thought it might be worth a quick run-down of the problems I tend to find getting in the way of more interesting work. In every case things have become a lot simpler than they were a while ago, and in many cases there are promising new technologies or research efforts underway to improve the situation further. To be honest, I doubt that any of those improvements will be enough to remove the relevant item from the list. I expect that if I come back in five years or so, the list may well be largely the same.
So, have I missed any "biggies"? Am I shockingly stupid for regarding any of these as "hard"? I haven't included user interface design in here, partly because I have so little experience of it. We seem to keep changing our minds as an industry about what's "good", and we still keep coming up with UIs with fundamental problems like not being properly resizable, so I guess we haven't cracked it yet – but I don't think I can give much insight into the actual problems. Anyway, on to the list…
Installation and updates
Installation is one of those things which tends to get forgotten until near the end of a development cycle – or at least, it's rare to get it right until you're about to ship. To some extent, this is due to the fact that it relies on knowing exactly what will need to be installed. In an ideal world, installation should be very simple in terms of being totally transactional, so that if anything goes wrong it can be rolled back reliably. If you're just adding files to the local file system, that's not too far from reality – but for many types of installation there's a lot more involved. What if you need to install a new database schema, and the database goes down after you've done that part of the installation but haven't yet finished the other parts? What if installing the application requires registering on a remote server? Basically, as soon as anything other than the local box is needed, it's hard to absolutely guarantee a clean rollback. Installation is often given to junior engineers and regarded as a less prestigious part of the project to work on, but it's absolutely crucial in terms of customer satisfaction and system stability.
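Here is a minimal Java sketch of a best-effort "transactional" installer that makes the rollback idea concrete. It assumes every step can be paired with an undo action. The Step and Action types and the step names are hypothetical, not any real installer framework's API. When a later step fails, completed steps are undone in reverse order. The catch around each undo marks exactly where the guarantee breaks down once something beyond the local box, such as a database or a remote server, is involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Installer {
    /** An action that may fail, e.g. copying files or creating a database schema. */
    interface Action {
        void run() throws Exception;
    }

    /** One installation step paired with the action that reverses it (hypothetical). */
    static final class Step {
        final String name;
        final Action apply;
        final Action undo;

        Step(String name, Action apply, Action undo) {
            this.name = name;
            this.apply = apply;
            this.undo = undo;
        }
    }

    /**
     * Applies the steps in order. If one fails, undoes the completed steps in reverse
     * order and returns false. Every outcome is appended to the supplied log.
     */
    static boolean install(List<Step> steps, List<String> log) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.apply.run();
            } catch (Exception failure) {
                log.add("failed " + step.name);
                rollBack(completed, log);
                return false;
            }
            completed.push(step);
            log.add("applied " + step.name);
        }
        return true;
    }

    private static void rollBack(Deque<Step> completed, List<String> log) {
        while (!completed.isEmpty()) {
            Step done = completed.pop();  // most recently applied step first
            try {
                done.undo.run();
                log.add("undid " + done.name);
            } catch (Exception undoFailure) {
                // The hard case: e.g. the database went down, so the schema change
                // cannot be reverted and the system is left half-installed.
                log.add("COULD NOT UNDO " + done.name);
            }
        }
    }
}
```

Suppose the final "register with server" step fails and the schema's undo fails too. The log then shows that the local files were cleaned up but the schema change was left behind, and a real installer would have to report that to the user.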
Updates can be even worse – you may need to repair a "broken" system, maintain the customer's configuration from the previous installation, notice any "customisations" they have made to the previous installation, possibly upgrade from multiple versions with one installer, etc. Rollback of an unsuccessful upgrade is even trickier than normal installation, as you'd ideally want to roll back to the previous system state – an upgrade almost never involves just adding files, as you usually want to replace previous components.
Finally, installation can be a very platform-specific area to work in. Even if you're only installing on Windows, there are "gotchas" for each edition – and then you need to potentially check that the right service packs have been installed, etc. When you come to cross-platform installation, life is even worse. Checking that any dependencies are installed (and in the way you expect them to be), making your application available in the appropriate way for that system, integrating with whatever installer services are the norm – it's enough to drive a person crazy.
Tied in with installation is versioning across communicating systems. I've only recently had to deal with this – it's certainly not something that all developers are likely to need. When you do need it, however, it's a pain. Suppose version 2 of your application needs to be able to talk with version 1 and vice versa. Undoubtedly v2 will have features that v1 doesn't support, and it may implement the v1 features in a slightly different way. The details of what is communicated in what situation are tricky to get right. This is one of those problems which isn't too hard to handle for any particular small case, but the difficulty lies in being rigorous in the definition of what you're allowed to do without a component (or whatever you use as your unit of versioning) needing to really change version. You may be able to add some data, using a default when it's not provided, but not change a method signature, for example. Likewise, depending on what technology you're using for the communication, you may need to lay down rules about exactly how data is sent between the systems. Once those rules have been precisely defined, you then need to be utterly meticulous about sticking to them. Following rules is tedious, and all developers can be forgetful on occasion.
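The sketch below shows one such rule, often called the "tolerant reader". A v2 field may be added as long as readers supply a default when it's missing, and readers ignore fields they don't recognise. The message type, the field names and the map-based wire format are all hypothetical. A real system would be bound by whatever serialisation technology it communicates with, and each has its own rules.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical message exchanged between v1 and v2 nodes, serialised as key/value pairs. */
final class OrderMessage {
    static final String DEFAULT_PRIORITY = "normal";

    final String id;        // present since v1
    final int quantity;     // present since v1
    final String priority;  // added in v2; optional on the wire

    OrderMessage(String id, int quantity, String priority) {
        this.id = id;
        this.quantity = quantity;
        this.priority = priority;
    }

    /** v2 writer: sends every field it knows about. */
    Map<String, String> toWire() {
        Map<String, String> wire = new HashMap<>();
        wire.put("id", id);
        wire.put("quantity", Integer.toString(quantity));
        wire.put("priority", priority);
        return wire;
    }

    /**
     * Tolerant reader: required v1 fields must be present, the v2 field falls back to a
     * default when a v1 sender omits it, and unknown keys (e.g. from a v3 sender) are ignored.
     */
    static OrderMessage fromWire(Map<String, String> wire) {
        String id = require(wire, "id");
        int quantity = Integer.parseInt(require(wire, "quantity"));
        String priority = wire.getOrDefault("priority", DEFAULT_PRIORITY);
        return new OrderMessage(id, quantity, priority);
    }

    private static String require(Map<String, String> wire, String key) {
        String value = wire.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required field: " + key);
        }
        return value;
    }
}
```

The code is the easy part. The discipline lies in never breaking the rule, for example by later making "priority" mandatory or changing the meaning of "quantity".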
Oh, and then there's the testing, of course. Do you support a connected system which includes two installations of v1, one installation of v2, and one installation of v3? What if for v3 you've decided to drop some of the v1 functionality? When writing v1, you need to be aware of future possibilities so you can handle them cleanly. The principle of YAGNI is less applicable here than normal, because we can't accurately predict the future. YAGNI is fine when you can implement a feature later on, but it's less useful when you don't get to force all your customers to upgrade all their systems. While you don't need to predict everything you'll implement later, you may well need to build features in now to accommodate changes later on.
Internationalisation (i18n)
Up-front warning: I'm not an expert on i18n. That's one of the problems – there are very few people who are.
I know what Unicode surrogates are, and I know that very little of my own code handles them properly. I know a bit about some of the more common encodings available. I know of little gotchas like the capitalised form of "i" not being "I" in Turkish (having been bitten by that one in a previous job), which means that when you consider some manipulation of text data, you'd better know whether it should be done in the system locale, the user's locale, the database's locale, a different specific locale, or whatever. I know that a UI designed without taking into account that labels will take up different widths in different languages is likely to fall flat when it's localised. I know that repeatedly replacing "  " (two spaces) with " " (one space) until you can't find "  " any more can lead to an infinite loop in .NET, as String.Replace treats zero-width characters differently from the search used to look for "  ".
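The Turkish "i" problem, and the header-comparison bug mentioned below, are easy to reproduce in Java. This is a minimal sketch with hypothetical class and method names. The "system" locale is passed in explicitly, so the behaviour can be shown without changing the machine's settings. For protocol identifiers such as HTTP header names, the fix is to use Locale.ROOT, because they aren't user-facing text.

```java
import java.util.Locale;

class CaseDemo {
    /** In Turkish, lower-case "i" upper-cases to dotted capital I (U+0130), not "I". */
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    /**
     * Buggy header comparison: upper-cases in the system locale (passed in here as a
     * stand-in for Locale.getDefault()), so it fails on a machine set up for Turkish.
     */
    static boolean headerEqualsBuggy(String a, String b, Locale systemLocale) {
        return a.toUpperCase(systemLocale).equals(b.toUpperCase(systemLocale));
    }

    /** Header names aren't user text: compare them locale-independently. */
    static boolean headerEquals(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }
}
```

With the Turkish locale, "if-modified-since" upper-cases to a string containing dotted capitals, so the buggy version reports that it differs from "IF-MODIFIED-SINCE".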
These are just wrinkles I've come up with off the top of my head – I'm sure I could think of plenty more if I wanted to provide a longer list. All of that is without being in any way an expert. Goodness knows what bizarre stories someone genuinely knowledgeable could tell. Now, although I'm not an expert, I'm reasonably intelligent. I don't expect all the other developers on my team to have all the expertise I'm missing. Heck, I don't expect there are that many projects which have even a single genuine i18n expert. Even if they did, that expert would have to review virtually all the code of the project: how many classes do you write which really don't have any text manipulation? It's not just text which ends up in front of a user which you need to be careful with…
I strongly suspect that almost all applications are broken to a greater or lesser extent when it comes to i18n. How many validation routines take surrogates into account? How many servers have the same bug which we happened to find and fix, trying to do a case-insensitive comparison of header names by upper-casing the name in the system locale? How many systems are going to correctly handle sorting characters in Japanese text, taking the kana into account? I don't know whether it would be more depressing to think I was just singularly incompetent, or whether it's worse to believe that everyone else is just as ignorant of these issues as I am.
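On the surrogate question, Java strings (like .NET strings) are sequences of UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two chars. The sketch below validates and truncates length while respecting surrogate pairs; the class and method names are hypothetical. Code points still aren't the same as user-perceived characters. Combining sequences need grapheme-level handling, which this doesn't attempt.

```java
class LengthValidation {
    /** Naive: String.length() counts UTF-16 code units, so a surrogate pair counts twice. */
    static boolean fitsNaive(String s, int maxChars) {
        return s.length() <= maxChars;
    }

    /** Counts Unicode code points, so a supplementary character counts once. */
    static boolean fitsCodePoints(String s, int maxChars) {
        return s.codePointCount(0, s.length()) <= maxChars;
    }

    /** substring(0, n) can split a surrogate pair; cutting at a code point boundary cannot. */
    static String truncate(String s, int maxChars) {
        if (s.codePointCount(0, s.length()) <= maxChars) {
            return s;
        }
        int end = s.offsetByCodePoints(0, maxChars);
        return s.substring(0, end);
    }
}
```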
Date and time handling
I've broken this part of i18n out into its own topic because it's so nasty even if you only have one culture to deal with. When do you use UTC, and when do you use a local time? How easy is it to get at the local time? When should you use the local time of the user, and when should you use system times? What about daylight saving times, which can lead to some local date/time combinations being ambiguous and others being impossible? How do you gracefully cope with the system time changing abruptly?
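Ambiguous and impossible local times can be checked explicitly. The sketch below uses java.time (Java 8 and later, which arrived well after this article was written) and classifies a local date/time by how many UTC offsets it can have in a zone. The class name is hypothetical. The dates are the 2006 UK transitions: clocks went forward at 01:00 on 26 March and back at 02:00 on 29 October.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;
import java.util.List;

class DstDemo {
    /** The UTC offsets a local date/time can have in the given zone: 0, 1 or 2 of them. */
    static List<ZoneOffset> validOffsets(LocalDateTime local, String zone) {
        ZoneRules rules = ZoneId.of(zone).getRules();
        return rules.getValidOffsets(local);
    }

    static String classify(LocalDateTime local, String zone) {
        int count = validOffsets(local, zone).size();
        if (count == 0) {
            return "impossible";  // skipped when the clocks go forward
        }
        if (count == 2) {
            return "ambiguous";   // occurs twice when the clocks go back
        }
        return "normal";
    }
}
```

ZonedDateTime.of resolves both cases silently. It moves a time in the gap forward by the length of the gap, and it picks the earlier offset in an overlap. That is one reason to check explicitly when the distinction matters.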
There are four core problems here, as far as I can see. Firstly, there's working out what to do in any particular situation. Sometimes the answer is obvious, and we've learned a lot over time about best practices – keeping date/times in UTC as long as possible, for instance. In other cases the answer is harder to work out, or different users may have different goals or expectations, leading to no one correct solution.
The second problem – one which is often ignored – is communicating your decisions. Agreeing on some terminology can help, but everyone needs to be willing to take a bit of time to internalize the "rules". This is the case in many areas of development, but I've found that date and time handling tends to be particularly tricky, just because unless you're really precise about what you mean, different people will interpret your words in different ways. The more actors in the system, the worse it gets: if you're considering the situation where you have a server in Australia administered by someone in London, with a helpdesk operator in Germany answering a call from someone in France, the chances of everyone agreeing on exactly when something happened are really slim.
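One way to reduce the scope for disagreement is to record a single UTC instant and convert it to local time only at the point of display. The sketch below does this with java.time for the scenario above. The class name is hypothetical, the zone IDs are standard tz database names, and the instant is an arbitrary example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.LinkedHashMap;
import java.util.Map;

class OneInstantManyZones {
    /** Renders one stored UTC instant as local wall-clock time in each of the given zones. */
    static Map<String, ZonedDateTime> render(Instant instant, String... zones) {
        Map<String, ZonedDateTime> result = new LinkedHashMap<>();
        for (String zone : zones) {
            result.put(zone, instant.atZone(ZoneId.of(zone)));
        }
        return result;
    }
}
```

Rendering 2006-07-01T12:00:00Z for Australia/Sydney, Europe/London, Europe/Berlin and Europe/Paris gives 22:00, 13:00, 14:00 and 14:00 respectively. These are four different wall-clock readings of the same moment, and each converts back to the identical instant.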
Thirdly, the commonly available libraries are pretty rubbish at the moment. Java allows you to do the right thing, but because it's taken several goes to get it right, there are deprecated methods everywhere. The decision to make months 0-based makes sense in some ways, but catches pretty much everyone out sooner or later, and can make tests harder to read. The behaviour of calendars in terms of setting/adding/rolling different inter-related fields is fairly precisely defined, but not easy to understand. There's no simple access to the UTC timezone without magic strings. The use of inheritance in java.sql.Timestamp can lead to tests failing unexpectedly. At least you can generally do what you want, though, with a bit of work – .NET is far worse in this respect. In 1.1, there was no way of knowing whether a DateTime was local or UTC, so calling the conversion to local time repeatedly would keep changing the time by the timezone offset on every call. In .NET 2.0 there's a property recording whether a DateTime is local or UTC, which helps, but it's a bit of a sticking plaster. There's still no way (in the framework itself) of getting at the system's list of timezones. I dare say this will improve over time, but I can't see why it's taken so long to get even this far. I'm sure there are smart people at Microsoft who know the kind of thing required for writing applications which will have users in different timezones to the system itself – why weren't they more involved in designing the API?
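The Java complaints above are easy to demonstrate. The sketch below has hypothetical class and method names. It shows that month 1 means February and that getting at UTC goes through the string "UTC". It also shows that java.sql.Timestamp and java.util.Date disagree about equality, because Timestamp overrides equals(Object) to return false for anything that isn't a Timestamp.

```java
import java.sql.Timestamp;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

class LegacyDateGotchas {
    /** Months are 0-based: passing 1 gives February, not January. */
    static Calendar utcCalendar(int year, int zeroBasedMonth, int day) {
        // "UTC" is the kind of magic string the text complains about.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(year, zeroBasedMonth, day);
        return cal;
    }

    /** Date.equals accepts any Date subclass with the same millisecond value... */
    static boolean dateEqualsTimestamp(long millis) {
        return new Date(millis).equals(new Timestamp(millis));
    }

    /** ...but Timestamp.equals rejects a plain Date, so equality isn't symmetric. */
    static boolean timestampEqualsDate(long millis) {
        return new Timestamp(millis).equals(new Date(millis));
    }
}
```

For the same millisecond value the first comparison returns true and the second false. A test that compares a value loaded from a database with an expected Date can therefore pass or fail depending on which side calls equals.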
Fourthly, there's the real world, where politicians may arbitrarily decide to change daylight savings etc. There has been talk about changing Britain's timezone to be one hour ahead of where it is now. How would that affect all the software in the world? How many systems would need to know about the change? How would they all find out about it? It feels to me like the same kind of scale of change as altering the currency of a country – possibly worse, as there are lots of applications which deal with times but don't ever need to deal with money.
Resource handling
I said at the start of this article that memory handling wasn't much of an issue now if you're using .NET or Java. You need to be slightly careful to make sure you don't have objects kept alive unexpectedly by events, static members etc, and you need to be aware of what's going on in order to avoid introducing gross inefficiencies for no reason, but most of the time you don't need to worry about things. This isn't the case when it comes to other resources, such as file handles, network connections, etc.
I've read posts by C++ developers who maintain that C++ has effectively solved the situation with auto_ptr. I don't know enough about C++ to say to what extent this is true, but in .NET and Java, without deterministic finalization (for pretty compelling reasons, in my view) you still need to handle non-memory resources manually. Now, C# provides the using statement (which I deeply miss when working in Java) to make life easier, but there's still the manual element of making sure you always use it in the right place. There's still got to be a sense of someone "owning" the resource, and that object (and only that object) releasing it, and nothing else trying to use the released resource reference afterwards. A good example of this problem is when creating an image from a stream in .NET. Whenever I use a stream in .NET, I habitually start wrapping it in a using statement – but if I'm providing the stream to Image.FromStream, I have to notice in the documentation that I've got to keep the stream open for the lifetime of the Image. The documentation doesn't make it clear whether or not disposing the image will close the stream for me. Furthermore, making the transfer of ownership from the calling code to the image atomic is far from straightforward.
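Java has since gained try-with-resources (Java 7), its equivalent of using, but the ownership problem remains. The sketch below shows one way to make the hand-over from caller to image safe. OwnedImage is a hypothetical stand-in, not a java.awt or System.Drawing type, and its "decoding" just reads a single byte. If construction fails, the factory closes the stream. Once construction succeeds, the image owns the stream and closes it. At every point exactly one party is responsible for the stream.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical image type that takes ownership of its stream and closes it when closed. */
final class OwnedImage implements AutoCloseable {
    private final InputStream source;

    private OwnedImage(InputStream source) throws IOException {
        if (source.read() == -1) {  // stand-in for real decoding, which may fail
            throw new IOException("empty image data");
        }
        this.source = source;
    }

    /**
     * Takes ownership of 'in' on success. If construction fails, ownership was never
     * transferred, so the stream is closed here rather than leaked.
     */
    static OwnedImage fromStream(InputStream in) throws IOException {
        boolean transferred = false;
        try {
            OwnedImage image = new OwnedImage(in);
            transferred = true;
            return image;
        } finally {
            if (!transferred) {
                in.close();
            }
        }
    }

    @Override
    public void close() throws IOException {
        source.close();  // the image owns the stream, so it is the one to release it
    }
}
```

Callers can then write try (OwnedImage image = OwnedImage.fromStream(in)) { ... } and must not close the stream themselves. That rule has to be documented, which is exactly the point the text makes about Image.FromStream.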
This is the area in which I have the most hope for the future. Possibly the successor to .NET will have resource clean-up all sorted out. I dare say it'll be a long time coming, but I still plan to be developing in 15 years' time. I hope at that point I can look back and shake my head at the hoops we have to go through today.
Threading
Increasingly, developers need to know about threading. Gone are the days when most developers could rely on their application being the only one running on the box at the time, and it being okay to just make the user wait for a while if a time-consuming operation was required. Like i18n, I'm not a threading expert. I probably know more about it than most developers due to investigating it more (I find the whole business fascinating) but that doesn't make me an expert. I've tried to write about the topic in an accessible way, but there are huge areas I haven't written about, simply because I don't know about them. Every so often I'll come across an optimisation I wouldn't have thought would be valid which could call into question code I thought was reasonably safe. So, to start with, there's a lot to know.
Then there's a lot of care to be taken. In some ways, avoiding deadlocks is straightforward: keep locks for as short a time as possible to avoid contention, and if you ever take out more than one lock, make sure that the code paths that will take out those multiple locks always acquire them in the same order. The reality of implementing that strategy is much harder than it sounds, in my experience – certainly when the system gets large.
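The sketch below applies the lock-ordering rule to the classic example of transferring money between two accounts. The Account class is hypothetical and assumes ids are unique. Whichever direction the money moves, the monitor of the account with the lower id is always taken first. Two threads transferring in opposite directions therefore can't each hold one lock while waiting for the other.

```java
/** Hypothetical account used to illustrate consistent lock ordering. */
final class Account {
    final long id;  // assumed unique; defines the global lock order
    private long balance;

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized long balance() {
        return balance;
    }

    /**
     * Acquires both monitors in ascending id order regardless of transfer direction,
     * and holds them only for the two field updates.
     */
    static void transfer(Account from, Account to, long amount) {
        if (from == to) {
            return;  // nothing to move
        }
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}
```

The hard part in a large system is that the ordering rule has to be global. Every code path that takes more than one of these locks must agree on it, including paths written later by someone who has never seen this class.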
Then there's the technology side of things – the facilities provided to us by the platform we're working on. These have improved by leaps and bounds over the years, and things like the CCR sound like they'll make life easier. All I'd say is that we're not there yet. While every call from a background thread to a UI thread needs some manual coordination, there's still work to do. One problem is that to get things right, you tend to need to know a certain amount of what's going on under the covers: while I expect life to get easier for developers, I think they'll still need to understand a bit about tricky things like memory models and the strange optimisations that are permissible.
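A worker polling a stop flag is a tiny example of why the memory model matters; the class below is hypothetical. Without volatile, the Java Memory Model doesn't guarantee that the worker ever sees another thread's write, and a JIT compiler is permitted to hoist the read out of the loop. Declaring the field volatile makes the write visible to subsequent reads.

```java
/** A worker that spins until another thread asks it to stop. */
final class StoppableWorker implements Runnable {
    // Without 'volatile', the read below could legally be hoisted out of the loop,
    // and the worker might spin forever even after requestStop() is called.
    private volatile boolean stopRequested;
    private long iterations;  // written only by the worker thread

    @Override
    public void run() {
        while (!stopRequested) {
            iterations++;
        }
    }

    void requestStop() {
        stopRequested = true;
    }

    /** Only meaningful once the worker thread has been joined. */
    long iterations() {
        return iterations;
    }
}
```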
Error handling
Exceptions are lovely. I generally agree with Joel Spolsky, but I completely disagree with his view on exceptions. That's not to say he doesn't make some good points, but I consider his solutions to the problems of exceptions to be worse than the problems themselves. Returning error codes has proved to be a dangerous way of working – it's far too easy to forget to check the code. The equivalent with exceptions is catching an exception and then ignoring it – and that happens in real code, far more often than it should, but at least it requires actual code to do the wrong thing.
So, why is error handling still in my list? Because we haven't become good at using exceptions yet. We still find it tricky at times to know the right point to catch an exception, and in what rare circumstances it's right to catch everything. Also, there's more to error handling than just exception handling. How forgiving should we make our systems? How do we report errors to the user? How do we give users error information which is precise enough for our support team but which doesn't scare the user to death? Oh, and how do we educate developers not to catch exceptions and ignore them without having a really good reason?
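The sketch below offers one possible answer to the "right point to catch" and "precise enough for support, but not scary" questions. The low-level code wraps the failure with context and the original cause rather than swallowing it. A single top-level boundary then turns it into a calm message, with a reference that ties it to a detailed support-log entry. ConfigException, the port-parsing scenario and the StringBuilder standing in for a real log are all hypothetical.

```java
import java.util.UUID;

final class ErrorBoundary {
    /** Hypothetical exception carrying technical detail for the support team. */
    static final class ConfigException extends Exception {
        ConfigException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Low level: don't swallow the failure; add context and keep the original cause. */
    static int parsePort(String raw) throws ConfigException {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new ConfigException("Invalid port value '" + raw + "'", e);
        }
    }

    /**
     * Top level: the one place that turns the failure into something the user can act on.
     * The user sees a short message with a reference; the log gets the full detail.
     */
    static String runForUser(String rawPort, StringBuilder supportLog) {
        try {
            return "Listening on port " + parsePort(rawPort);
        } catch (ConfigException e) {
            String reference = UUID.randomUUID().toString().substring(0, 8);
            supportLog.append(reference).append(": ").append(e.getMessage())
                      .append(" caused by ").append(e.getCause());
            return "The server settings could not be read (reference " + reference
                    + "). Please contact support.";
        }
    }
}
```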
Part of this may be technological. Java tried an experiment with checked exceptions, and although I was a fan of them for a few years, I've changed my mind over time. I think the experiment was worth trying, and there were some benefits that ought to be captured by the Next Big Thing, but the overall effect wasn't all it could be. I'm not smart enough to come up with the Next Big Thing myself, but I'm hoping it will improve reliability without giving the developer more grief.
Keeping up with technology
If the previous topic was a bit ropey, this one barely made it on the list at all. It should definitely be on a list, however, and the link is tenuous but just about visible, so it can live here for the moment.
I've commented before how CVs these days have shopping lists of technologies on them. Regardless of how accurate those CVs are, the technologies themselves certainly exist and are being used by someone, somewhere. Just take one topic: XML, for example. How many XML APIs/technologies do you know? How many more do you know of even if you haven't used them? Here's a list off the top of my head, without reference to the net:
DOM, SAX, JDOM, dom4j, Xerces, Xalan, StAX, MarkupBuilder (and related), XPath, XQuery, XSLT, xpp3, Jaxen, JAXP, XmlReader (and related), XStream.
Yikes! Just keeping up with all the XML APIs would be a full-time job, and that's just XML! Trying to stay on top of the standard libraries of both .NET and Java is equally tricky. How is anyone meant to cope? My personal answer is to focus on the technology I need to solve the problem at hand, but to try to keep an ear to the ground to at least have a passing awareness of interesting things I may want to use in the future. It's impossible to gauge how successful I am at that, but I know that it's a time-consuming business, and I see no sign of the software industry slowing down. Don't think I'm not grateful for all the work that these technologies save me – I'm just recognising that the variety available comes with a penalty.
Here in 2006, life is still tricky in software development. From a career point of view, that's a good thing – I'm pretty good at what I do, and if everything became trivial, I guess I wouldn't have as much employment value. On the other hand, some of these problems have been with us a long time and we're making lamentably slow progress towards making them no-brainers. Someone remind me to come back to this list in 2011…