Benchmarking made easy

While I was answering a Stack Overflow question on the performance implications of using a for loop instead of a foreach loop (or vice versa) I promised to blog about the results – particularly as I was getting different results to some other posters.

On Saturday I started writing the bigger benchmark (which I will post about in the fullness of time) and used a technique that I’d used when answering a different question: have a single timing method and pass it a delegate, expected input and expected output. You can ask a delegate for the associated method and thus find out its name (for normal methods, anyway – anonymous functions won’t give you anything useful, of course) so that’s all the information you really need to run the test.

I’ve often shied away from using delegates for benchmarking on the grounds of it interfering with the results – including the code inline with the iteration and timing obviously has a bit less overhead. However, the CLR is so fast at delegate invocation these days that it’s really not an issue for benchmarks where each iteration does any real work at all.

It’s still a pain to have to write that testing infrastructure each time, however. A very long time ago I wrote a small attribute-based framework. It worked well enough, but I’ve found myself ignoring it – I’ve barely used it despite writing many, many benchmarks (mostly for newsgroup, blog and Stack Overflow posts) over the course of the years. I’m hoping that the new framework will prove more practical.

There are a few core concepts and (as always) a few assumptions:

  • A benchmark test is a function which takes a single value and returns a single value. This is expressed generically, of course, so you can make the input and output whatever type you like. A test also has a descriptive name, although this can often be inferred as the name of the function itself. The function will be called many times, the exact number being automatically determined.
  • A test suite is a collection of benchmark tests and another descriptive name, as well as the input to supply to each test and the expected output.
  • A benchmark result has a duration and an iteration count, as well as the descriptive name of the test which was run to produce the result. Results can be scaled so that either the duration or the iteration count matches another result. Likewise a result has a score, which is simply the duration (in ticks, but it’s pretty arbitrary) divided by the iteration count. Again, the score can be retrieved in a scaled fashion, using a specified result as a “standard” with a scaled score of 1.0. Lower is always better.
  • A result suite is simply the result of running a test suite. A result suite can be scaled, which is equivalent to building a new result suite with scaled copies of each original result. The result suite contains the smarts to display the results.
  • Running a test consists of two phases. First, we guess roughly how fast the function is. We run 1 iteration, then 2, then 4, then 8 etc – until it takes at least 3 seconds. At that point we scale up the number of iterations so that the real test will last around 30 seconds. This is the one we record. The final iteration of each set is tested for correctness based on the expected output. Currently the 3 and 30 second targets are hard-coded; I could perhaps make them parameters somewhere, but I don’t want to overcomplicate things.

Now for the interesting bit (from my point of view, anyway): I decided that this would be a perfect situation to try playing with a functional style. As a result, everything in the framework is immutable. When you “add” a test to a test suite, it returns a new test suite with the extra test. Running the test suite returns the result suite; scaling the result suite returns a new result suite; scaling a result returns a new result etc.

The one downside of this (beyond a bit of inefficiency in list copying) is that C# collection initializers only work with mutable collections. They also only work with direct constructor calls, whereas generic type inference doesn’t apply to constructors. In the end, the “static generic factory method” combined with simple Add method calls yields quite nice results, even though I can’t use a collection initializer:

double[] array = …; // Code to generate random array of doubles

var results = TestSuite.Create(“Array”, array, array.Sum())
                       .Add(ArrayFor)
                       .Add(ArrayForEach)
                       .Add(Enumerable.Sum, “LINQ Enumerable.Sum”)
                       .RunTests()
                       .ScaleByBest(ScalingMode.VaryDuration);

results.Display(ResultColumns.NameAndDuration);

This is a pretty small amount of extra code to write, beyond the code we actually want to benchmark (the ArrayFor and ArrayForEach methods in particular). No looping by iteration count, no guessing at the number of iterations and rerunning it until it lasts a reasonable amount of time, etc.

My only regret is that I haven’t written this in a test-driven way. There are currently no unit tests at all. Such is the way of projects that start off as “let’s just knock something together” and end up being rather bigger than originally intended.

At some point I’ll make it all downloadable from my main C# page, in normal source form, binary form, and also a “single source file” form so you can compile your benchmark with just csc /o+ /debug- Bench*.cs to avoid checking the assembly filename each time you use it. For the moment, here’s a zip of the source code and a short sample program, should you find them useful. Obviously it’s early days – there’s a lot more that I could add. Feedback would help!

Next time (hopefully fairly soon) I’ll post the for/foreach benchmark and results.

4 thoughts on “Benchmarking made easy”

  1. There used to be a unit-based benchmarking tool that worked quite well. It’s called NPerf, and it’s still lingering on codeproject. It would be nice to see a modernized, lightweight version of that incorporated into a unit test runner. The MbUnit equivalent loses the simplicity of the original NPerf.

  2. Hmm… looking at the NPerf page it still looks like you’ve got to work out the size of the test yourself etc. I think my approach will (when I’ve got the framework in a single file to be compiled along with your benchmark) make it simpler to get quick ‘n dirty results for SO posts etc.

    I’m hoping to post the for/foreach results later…

  3. Of course as with all benchmarking you should remember to compile in release mode and run it without a debugger attatched to see how it performs with full optimisations and hence closer to how it will perform in the field.

Comments are closed.