Update (19th March 2009)
Debate around this review is getting heated. I stand by all the points I make about the text, but I’d like to clarify a few things:
- If there are any ad hominem comments in the review against the author, please ignore them. I’m going to try to weed out any that I find, but if you spot one, please let me know and then ignore it. I feel very strongly that a review should be about the text of a book, not about its author. The text is what will inform the reader, not the author’s other work. I’m aware that Gastón Hillar has written many other books, and is generally well-regarded (as far as I can tell, anyway). That neither helps nor hinders the text. The same goes for me and the review, of course. Whether you know me or not, whether you’ve read anything else I’ve written or not, the review should stand on its own merits. This is not a popularity contest – it’s a discussion about a technical book.
- The impression I’ve given in the review is almost entirely negative. This is because that’s the impression I received as a reader interested in accuracy and best practices. That does not mean that the book is entirely inaccurate – far from it. There are plenty of aspects where I have no particular issues with the accuracy. (The code style is more uniformly disagreeable to me, but that’s a subjective matter.) However, there is enough inaccuracy (and bad practice, in my view – somewhat subjective, but less so than the code style) to make that the dominant impression left with me, alongside my surprise that there’s no proper discussion of locking. As an analogy, imagine you go to a choral concert. Suppose the sopranos, altos and tenors are all perfectly in tune, but the basses are out of key the whole time. In some senses the concert would be 75% accurate – but the 25% inaccuracy would be enough to ruin it. So it is with technical books (not just this one) – it only takes a relatively small degree of inaccuracy to make the difference between a good book and a bad one. The bottom line is: even I don’t think everything or even most of what’s written in the book is wrong; there are enough problems to make me dislike it though.
- I’ve made a few minor edits to the review just now, to address a few comments made so far. If some of the comments appear to be odd, that may be why!
- Publisher’s page (Packt) – this is the cheapest way to buy the book as far as I can see
- Sample code (49MB download! Mostly because it contains bin/obj directories for all solutions…)
- Amazon or Barnes and Noble links if you don’t want to buy it directly
- John Mueller’s review – a much more positive review than this one, which may prove an interesting counterbalance for readers. (Thanks to Erik for pointing out John’s review in the comments.)
This book doesn’t really compete with C# in Depth, but obviously the very fact that it’s another book about C# at all means I’m probably not entirely unbiased. Arguably it also “competes” with my own (somewhat out of date now) threading article, although that’s not a monetary venture for me. I should also point out that my copy was sent to me for free, specifically for review, by Packt Publishing.
Audience and content
The book claims that “Where you are a beginner to working with threads or an old hand who is looking for a reference, this book should be on your desk.” In practice, I don’t think it’s really suitable as a reference. The kind of information you really want as a reference is hard to find amidst the bulk of the book, which is on-going examples. For the rest of this review I’ll regard the intended audience as just beginners.
The first chapter (out of 12; at 388 pages one of the nice things about this books it that it’s relatively slim) is introductory material about threads and processes, and why concurrency is important in the first place. After this one code-free chapter, the rest of the book is all example-based. The pattern goes something like this:
- Give rough idea of what we’re building
- Create first version of the code
- Explain what it does and why it may not be ideal
- Improve it
- Explain how the improvements work
- Move to next example or add new major feature
That sounds all very well, but I’ll get to my issues with it in a minute. Although the examples are constantly evolving, they essentially break down into these applications:
- “Code cracking” (brute-forcing a 4 character code)
- “Encrypting” SMS messages (not real encryption – no key – but a general CPU-intensive transformation)
- Image processing to find and highlight “old stars” in NASA images
- “Encrypting” several files
- More image processing – adjusting the brightness of a large image and thumbnailing it
These are all Windows Forms applications, and are frankly pretty similar, all basically dealing with simple, embarrassingly parallel tasks. That’s not to say Hillar doesn’t get a fair set of different techniques and lessons out of them:
- Keeping the UI thread free (and seeing what happens when you don’t)
- Tips for debugging multi-threaded apps in Visual Studio
- Showing the performance for individual processes using Task Manager and Windows Explorer
- Using BackgroundWorker to update the UI
- Queuing tasks in the system thread pool
- Creating new threads explicitly
- Using Control.Invoke/BeginInvoke to update the UI (although this comes very late in the book – chapter 10 out of 12)
- Keeping tasks independent
- Noting that sharing data between threads is difficult – but coming to the wrong conclusions (more later)
- Using the Timer component (just the WinForms timer; not System.Threading.Timer, System.Web.UI.Timer or System.Timers.Timer) – although later on he uses a BackgroundWorker for a task much more suitable for a Timer.
- A bit of OO design, although in a pretty botched way – the idea of having a general-purpose “parallel algorithm” class and a “parallel algorithm piece” class is reasonable, but it isn’t handled nearly as well as it might be
- Fairly disastrous advice (IMO) about both I/O and the GC
- Exception “handling” (where “swallowing exceptions and just reporting them with Debug.Print” counts as “handling” apparently)
- Parallel Extensions from .NET 4.0, with both PLINQ and TPL
Unfortunately, this misses out some of the most important concepts in parallelism on .NET. Hillar frequently mentions locking, but only ever in a “we’re avoiding doing it” way. I find it absolutely incredible that a book on multi-threading in C# doesn’t even mention the “lock” keyword. Okay, it’s nice to be able to split tasks up completely independently where possible, but in the real world you sometimes have to use shared mutable state (or at least, it’s often the simplest approach).
When I first got the book, I looked up several entries in the index to see how they’d be handled. I was shocked to find that none of these have an index entry:
- memory model
- Wait or Pulse
- BeginInvoke or Invoke
- double-checked locking
- mutable or immutable
The concept of accessing state from multiple threads is glossed over for the entirety of the book. Basically whenever multiple threads want to make their results available, they put them in different elements of an array or list. There’s an assumption that if you read from that array/list in a different thread, it’s all okay. Likewise there’s an assumption that it’s appropriate to read integer variables written to in one thread from another thread without any locking, volatility or use of the Interlocked class. I’ll come back to this topic when I tackle accuracy later on.
This is a very informal book: something I have no problem with. English clearly isn’t the author’s first language, and although I don’t blame him for some of the clumsy wording in the book (e.g. “We will not leave behind the necessary pragmatism in order to improve performance within a reasonable developing time”) I do wish the book’s editorial team had done a better job in that respect. It’s tricky with technical books: non-technical editors have good reason to be wary of going too far, as small changes in wording can have make a large difference semantically, but it does make a big difference to a book’s readability when the language is clear and idiomatic. (As a side note, I feel incredibly fortunate to have English as my native tongue. I’m not fluent in any foreign languages, and I’m often amazed at how well others manage.)
There are other elements of the style of the book which I have much more of a problem with. The first is the way that the examples are handled. A very large proportion of the book is just lists of instructions: “add some using directives: <code>; add these variables: <code>; add this procedure: <code>; add another procedure <code>; add an event handler <code>” with just a sentence or two of explanation for each one as you go. There’s much more explanation after all the code has been added, but the way that the code is given makes it very hard to see what’s going on. We almost never get to see a whole class in one listing – it’s always broken up into using directives, variables, individual methods etc. This may not be too bad if you’re following along with the book at every single point, but it makes it very hard to just read. As a friend has commented, this content might work a lot better as a screencast, rather than as a book.
One detailed gripe: nearly every time a property is introduced, the author uses the phrase “we want to create a compact and reliable class” as a justification. There’s no explanation, and quite often the properties are mutable for no good reason (when a genuinely reliable class in a multi-threaded setting would be immutable). After a while it made me want to grind my teeth every time I saw it.
The feeling is very much that of a Head First book, but one which doesn’t work. For all my misgivings about Head First C# (which I believe is now very much better now that a large number of errors have been removed) the general style was very well handled. It’s not my preferred style to start with (particularly focusing on large GUIs instead of short, complete console apps) but I rarely felt particularly lost in the listings – there was usually enough context to hold onto. Here, I feel there’s very little context at all. If you accidentally miss out a step, you’ll have a really hard time working out which one it is or what’s wrong.
On top of this, there’s the bizarre storyline “explaining” all the listings. Apparently you (the reader) originally started out cracking a code, then got hired by some other crackers, then the FBI, then NASA. We are told of FBI agents getting us capuccinos, the NASA CIO wanting you to use the Parallel Extensions CTP so that they can get free licences for Visual Studio 2010 and all kinds of other oddities. We are constantly bombarded with plaudits about our threading capabilities – by the last chapter we’re regularly being called “experts” and “threading gurus” despite the fact that we wouldn’t have a clue what was going on if someone presented us with some code using a “lock” statement. This is all patronising in the extreme – and again, Head First C# (and I suspect the rest of the Head First series) handles the “keep it informal but drive the topic forward” aspect a lot more successfully.
Finally, on the topic of style, I’d like to rant a bit about the coding style. It’s awful. Really awful. I realise that coding standards are to some extent a personal thing, but I object to code like this:
- Pseudo-Hungarian (the type which uses “o” as a prefix for almost any object; not the type Peter likes) and the nature of every variable (local, parameter or instance variable) makes for horrendous variable names such as “prloOutputCharLabels”. It’s not even consistent – variables added by the designer only get a type designation prefix (lbl, but, pic) but no nature prefix. Aargh.
- Methods are frequently camel-cased instead of Pascal-cased, e.g. “showFishes” and “checkCodeChar”. It’s possible that this is only true for private methods – a very quick flick through doesn’t reveal any public ones like this – but if so it’s inconsistently applied as there are certainly Pascal-cased private methods too. Some public properties combine both annoyances so far, with names such as “poThread” and “piBegin”.
- Most (but not all) of the time Hillar declares all of a method’s variables at the top, even if they’re not used for a long time. This includes declarations of variables for use in loops. This took me right back to the 80s, writing ANSI C again. I believe that the ability to declare variables at the point of first use gives a significant improvement in readability. It’s easier to see where a variable will be used if its scope is limited, for example.
- Using directives aren’t applied nearly thoroughly enough, leaving lots of explicit use of System.Diagnostics, System.Drawing, System.ComponentModel etc. Given the line length limitations in printed books, this is a real killer in terms of providing compact, readable code.
- Speaking of line length limitations, it would be really useful to actually acknowledge them – if a comment is going to span two printed lines, starting just the first one with “//” and leaving the second indented but not really a comment isn’t a good idea.
So, we’ve got code broken up into chunks which breaks the flow of the code, and I don’t even like the style of the code. Still, I could live with that if it’s good quality code…
Accuracy and best practices
I’ve already indicated one of the significant problems I have with the book in terms of content: its complete absence of discussion about shared data and locking. Yes, this is a beginner’s book, and I wasn’t expecting the level of detail on the memory model which is present in Joe Duffy’s book (which I promise I’ll review soon) – but I’d certainly prefer to err on the side of safety. The book regularly just accesses data on one thread having written it on another, with no locking, volatility or use of Interlocked. This isn’t the sole bad practice, however, and it’s not limited to stylistic choices either. In the course of the book, we are told all of the items below (and more). Italics indicate what the book claims; regular type indicates my response. These aren’t verbatim quotes, but paraphrase:
- Forcing garbage collection before starting a multi-threaded operation is a good idea. This is given as a sort of response to a screenshot of Process Explorer showing ugly memory usage. In fact, I can’t reproduce the kind of nasty graph that’s shown in the book, even with the code downloaded from the web site, but if I did see that there are definitely better ways of addressing it than forcing garbage collection. Disposing of Bitmaps appropriately would be a good start… as it is, each bitmap is going to hang around for at least one garbage collection cycle longer than it should, because we’ve got to wait for its finalizer to be executed. Making sure you dispose of objects appropriately is always a good idea – explicitly forcing the garbage collector is almost always a bad one. (Not absolutely always, but usually.)
- WaitHandle.WaitAll has to run on an MTA thread – so let’s just change the [STAThread] line above the Main method to [MTAThread], with no warning that it’s a really bad idea to do that for Windows Forms. (Side-note: when trying to check that there really isn’t a warning, I had to spend a long time finding the section. The index doesn’t contain entries for MTAThread, STAThread, apartment, WaitHandle or WaitAll. In general the index could do with a lot of work. I’m painfully aware that indexing is a horrible task, but it’s important.)
- Application.DoEvents() is a way of letting the UI process events. This is true – it’s also another really bad idea unless you absolutely have to use it. Re-entrancy is hard to debug – and not mentioned at all in the book, as far as I can tell.
- Data streaming is wasteful, because two threads might both want to do I/O at the same time – it’s a better idea for each thread to load all the data it needs to and then start processing it. This is stated in a context where streaming is ideal – each thread just needs to process every line in a file. (Each thread is asked to process a different file.) There’s no dependency between the lines of the file. It’s an absolute gift – the buffering and pre-fetch techniques of Windows would guess we needed the next block of data before we ask for it, so the disk would be seeking while we’re encrypting, on each thread. At least, I strongly suspect it would – and I would profile the thing instead of just claiming that we’ve managed to avoid an I/O bottleneck by loading files in their entirety up-front. No mention is made of the fact that as soon as a bunch of big files are queued for encryption, you’ll have a bunch of threads all trying to load everything before they bother starting to do anything. Avoiding I/O contention is a tricky topic, and it deserves better than a couple of misleading paragraphs with no attempt at explaining what the benefits of streaming the data would be.
- The thread pool is used to queue threads with work to do. If there are already lots of threads busy, the new threads will wait until the old ones have finished. Note the use of “threads” here – not tasks to run on a pool of existing threads, but threads. This would make the thread pool pointless – what is never explained in the book is that creating threads is a relatively expensive business, and you don’t want to do it repeatedly for short-lived tasks when you could instead create a pool of threads and reuse them to run several tasks. Once this purpose is clear, the notion of queuing threads becomes obviously wrong.
- We can pass some state into the delegate used for work item queuing (or a ParameterizedThreadStart) and use that to give us some context. We need to cast that state to the relevant type before we can use it, because it’s just typed as System.Object. So far so good – except most of the time, Hillar ends up passing into the work item the same reference which would be available as just “this” within the method itself. So we have code such as:
loPiece.poThread = new Thread(new ParameterizedThreadStart(loPiece.ThreadMethod));
- The ThreadMethod method then duly casts its parameter to its own type and uses it. All of this is pointless, as the method doesn’t need any parameters – it can just use “this” inside the method.
- It’s very important to initialize lists with the right capacity. Again, this isn’t too bad as far as it goes – except that this micro-optimisation goes awry when he reads the TextBox.Lines property twice: once to work out the appropriate capacity and once to fill the list with initial data. Unfortunately the TextBox.Lines property has to take the existing text in the TextBox, split it (creating a bunch of substrings) and get the result into an array. This in turn means doing all the normal shenanigans associated with creating buffers which are bigger than you need, filling them, copying to a new buffer etc – exactly what we’re trying to avoid! This “optimization” will usually cost time instead of saving it. It could be easily fixed by just fetching the array in one statement, then using the same array for both the count and the list population. In fact, if you just pass the array into the List<T> constructor, it will perform the optimization for you – it can detect that it’s an ICollection<T> and use the Count property directly. Writing the simplest code actually ends up being optimal.
- The above bullet point isn’t going to dominate the performance of that example though – there’s a potentially far worse effect due to the way the resulting “encrypted” string is broken up each time: using string concatenation in a loop. I guess we’d better hope there are no really long lines. If an author is going to give optimisation “tips” they need to be a lot more rigorous than this. Using string concatenation in a loop is probably the single best-known performance no-no in .NET. I was really shocked to see this in a book which is supposedly about making your code perform better. Now, it could be that string concatenation was used deliberately to slow things down – but in that case, why not highlight it? Drawing attention to intended optimizations gives the impression that the rest of the code is either optimized or has at least been written reasonably. If “bad” code is to be used for a specific purpose, that should be called out so that the reader won’t go onto use the same kind of code in their own production apps (which really shouldn’t be deliberately slowed down).
These aren’t the only issues I have with the code. Unicode is abused by “encrypting” text with no discussion of whether the strings he produces are valid or not (as opposed to the normal practice of only encrypting data after first converting it into binary; the encrypted binary might then be converted to text using base64 if you need to transmit the encrypted data as text). We could easily end up with strings containing surrogate high or low code points without the corresponding half in the appropriate place. When analyzing a bitmap he uses GetPixel and SetPixel for each pixel, rather than calling LockBits once and then accessing the image data in a much faster manner. (The code given does scale, but it’s not as fast as it could be. Using LockBits it would still scale, but the “per thread” work would be faster.) There are other, similar issues lurking in the text, but I’m sure you get the gist of the problem.
Believe it or not, there are things about this book that I actually like. It’s relatively thin, which has very tangible advantages when you’re carrying it around a lot. The sections explaining has to use Process Explorer and Task Manager to their best are useful, and the ideas of the examples are good – even though they basically cover the same ground several times. Unfortunately the bad points outweight the good far too heavily. To summarise them:
- What I consider some of the absolute core elements of .NET multithreading (locking and monitors in particular) aren’t covered at all
- Only the simple situation of an embarrassingly parallel algorithm is covered. In the real world developers will have to face real challenges where tasks don’t always split themselves up nicely into totally independent chunks. A reader who finishes this book assured that they are now threading gurus will face a nasty shock.
- Server-side threading isn’t given much coverage at all, despite this being arguably the most likely environment for developers to encounter multithreading
- The “story” element of the prose style is childish and patronising
- The coding style, while a personal choice, makes me wince – and is particularly verbose for a book, where space is important
- Many bad practices are encouraged, and there are plenty of important misunderstandings to trip up readers
- The index has failed me (even when I’ve known that the subject is in the book) more times than it’s helped me
It’s a real pity. I was hoping this would be a book I could recommend to people as a precursor to reading Joe Duffy’s excellent Concurrent Programming on Windows. Instead, my current best advice is to read Joe Albahari’s threading tutorial. (I previously had a link to my own threading tutorial as well, but apparently this made people think I was fishing for more readers of that.) I’m sure there are good introductory threading books out there, but I’m afraid this isn’t one of them.