Array covariance: not just ugly, but slow too

It seems to be quite a long time since I’ve written a genuine "code" blog post. Time to fix that. This material may well be covered elsewhere – it’s certainly not terrifically original, and I’ve been meaning to post about it for a long time. In particular, I remember mentioning it at CodeMash in 2012. Anyway, the time has now come. Refresher on array covariance Just as a bit of background before we delve into the performance aspect, let me remind you what array covariance is, and when it applies. The basic idea is that C# allows a reference conversion … Continue reading Array covariance: not just ugly, but slow too

Optimization and generics, part 2: lambda expressions and reference types

As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn’t care. Relatively few people write production code which is worth micro-optimizing. Please don’t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes. It took me a surprisingly long time to find the problem described in the previous blog post, and almost no time at all to fix it. I understood why it was happening. This next problem took a while to identify at all, but … Continue reading Optimization and generics, part 2: lambda expressions and reference types

Optimization and generics, part 1: the new() constraint (updated: now with CLR v2 results)

As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn’t care. Relatively few people write production code which is worth micro-optimizing. Please don’t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes. I’ve been doing quite a bit of work on Noda Time recently – and have started getting my head round all the work that James Keesey has put into the parsing/formatting. I’ve been reworking it so that we can do everything without throwing … Continue reading Optimization and generics, part 1: the new() constraint (updated: now with CLR v2 results)

LINQ To Objects and the performance of nested "Where" calls

This post came out of this Stack Overflow question, which essentially boils down to which is better out of these two options: var oneBigPredicate = collection.Where(x => Condition1(x)                                          && Condition2(x)                                          && Condition3(x)); var multiplePredicates = collection.Where(x => Condition1(x))                                    .Where(x => Condition2(x))                                    .Where(x => Condition3(x)) The first case is logically a single "wrapper" sequence around the original collection, with a filter which checks all three conditions (but applying short-circuiting logic, of course) before the wrapper will yield an item from the original sequence. The second case is logically a set of concentric wrapper sequences, each applying a … Continue reading LINQ To Objects and the performance of nested "Where" calls

There’s a hole in my abstraction, dear Liza, dear Liza

I had an interesting day at work today. I thought my code had broken… but it turns out it was just a strange corner case which made it work very slowly. Usually when something interesting happens in my code it’s quite hard to blog about it, because of all the confidentiality issues involved. In this case, it’s extremely easy to reproduce the oddity in an entirely vanilla manner. All we need is the Java collections API. I have a set – a HashSet, in fact. I want to remove some items from it… and many of the items may well … Continue reading There’s a hole in my abstraction, dear Liza, dear Liza

Revisiting randomness

Almost every Stack Overflow question which includes the words "random" and "repeated" has the same basic answer. It’s one of the most common "gotchas" in .NET, Java, and no doubt other platforms: creating a new random number generator without specifying a seed will depend on the current instant of time. The current time as measured by the computer doesn’t change very often compared with how often you can create and use a random number generator – so code which repeatedly creates a new instance of Random and uses it once will end up showing a lot of repetition. One common … Continue reading Revisiting randomness

Buffering vs streaming benchmark: first results

My poor laptop’s had a busy weekend. It’s run 72 tests, rebooting between each test. Most of these tests have kept both the CPU and disk busy for a lot of the time. I expect to update this blog post with more numbers – and possibly more strategies – as time goes on, but I wanted to get the numbers out as quickly as possible. Graphs should be coming when I’ve got a decent network to experiment with Google graphing. Source code and setup While I don’t really expect anyone else to sacrifice their computer for the best part of … Continue reading Buffering vs streaming benchmark: first results

Benchmarking IO: buffering vs streaming

I mentioned in my recent book review that I was concerned about a recommendation to load all of the data from an input file before processing all of it. This seems to me to be a bad idea in an age where Windows prefetch will anticipate what data you need next, etc – allowing you to process efficiently in a streaming fashion. However, without any benchmarks I’m just guessing. I’d like to set up a benchmark to test this – it’s an interesting problem which I suspect has lots of nuances. This isn’t about trying to prove the book wrong … Continue reading Benchmarking IO: buffering vs streaming

Benchmarking: designing an API with unusual goals

In a couple of recent posts I’ve written about a benchmarking framework and the results it produced for using for vs foreach in loops. I’m pleased with what I’ve done so far, but I don’t think I’ve gone far enough yet. In particular, while it’s good at testing multiple algorithms against a single input, it’s not good at trying several different inputs to demonstrate the complexity vs input size. I wanted to rethink the design at three levels – what the framework would be capable of, how developers would use it, and then the fine-grained level of what the API … Continue reading Benchmarking: designing an API with unusual goals