Optimization and generics, part 1: the new() constraint (updated: now with CLR v2 results)

As with almost any performance work, your mileage may vary (in particular the 64-bit JIT may work differently) and you almost certainly shouldn’t care. Relatively few people write production code which is worth micro-optimizing. Please don’t take this post as an invitation to make code more complicated for the sake of irrelevant and possibly mythical performance changes.

I’ve been doing quite a bit of work on Noda Time recently – and have started getting my head round all the work that James Keesey has put into the parsing/formatting. I’ve been reworking it so that we can do everything without throwing any exceptions, and also to work on the idea of parsing a pattern once and building a sequence of actions for both formatting and parsing from the action. To format or parse a value, we then just need to apply the actions in turn. Simples.

Given that this is all in the name of performance (and I consider Noda Time to be worth optimizing pretty hard) I was pretty cross when I ran a complete revamp through the little benchmarking tool we use, and found that my rework had made everything much slower. Even parsing a value after parsing the pattern was slower than parsing both the value and the pattern together. Something was clearly very wrong.

In fact, it turns out that at least two things were very wrong. The first (the subject of this post) was easy to fix and actually made the code a little more flexible. The second (the subject of the next post, which may be tomorrow) is going to be harder to work around.

The new() constraint

In my SteppedPattern type, I have a generic type parameter – TBucket. It’s already constrained in terms of another type parameter, but that’s irrelevant as far as I’m aware. (After today though, I’m taking very little for granted…) The important thing is that before I try to parse a value, I want to create a new bucket. The idea is that bits of information end up in the bucket as they’re being parsed, and at the very end we put everything together. So each parse operation requires a new bucket. How can we create one in a nice generic way?

Well, we can just call its public parameterless constructor. I don’t mind the types involved having such a constructor, so all we need to do is add the new() constraint, and then we can call new TBucket():

// Somewhat simplified…
internal sealed class SteppedPattern<TResult, TBucket> : IParsePattern<TResult>
    where TBucket : new()
{
    public ParseResult<TResult> Parse(string value)
    {
        TBucket bucket = new TBucket();

        // Rest of parsing goes here
    }
}

Great! Nice and simple. Unfortunately, it turned out that that one line of code was taking 75% of the time to parse a value. Just creating an empty bucket – pretty much the simplest bit of parsing. I was amazed when I discovered that.

Fixing it with a provider

The fix is reasonably easy. We just need to tell the type how to create an instance, and we can do that with a delegate:

// Somewhat simplified…
internal sealed class SteppedPattern<TResult, TBucket> : IParsePattern<TResult>
{
    private readonly Func<TBucket> bucketProvider;

    internal SteppedPattern(Func<TBucket> bucketProvider)
    {
        this.bucketProvider = bucketProvider;
    }

    public ParseResult<TResult> Parse(string value)
    {
        TBucket bucket = bucketProvider();

        // Rest of parsing goes here
    }
}

Now I can just call new SteppedPattern(() => new OffsetBucket()) or whatever. This also means I can keep the constructor internal, not that I care much. I could even reuse old parse buckets if that wouldn’t be a semantic problem – in other cases it could be useful. Hooray for lambda expressions – until we get to the next post, anyway.

Show me the figures!

You don’t want to have to run Noda Time’s benchmarks to see the results for yourself, so I wrote a small benchmark to time just the construction of a generic type. As a measure of how insignificant this would be for most apps, these figures are in milliseconds, performing 100 million iterations of the action in question. Unless you’re going to do this in performance-critical code, you just shouldn’t care.

Anyway, the benchmark has four custom types: two classes, and two structs – a small and a large version of each. The small version has a single int field; the large version has eight long fields. For each type, I benchmarked both approaches to initialization.

The results on two machines (32-bit and 64-bit) are below, for both the v2 CLR and v4. The 64-bit machine is much faster in general – you should only compare results within one machine, as it were.)

CLR v4: 32-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 689 1225
Large struct 11188 7273
Small class 16307 1690
Large class 17471 3017

CLR v4: 64-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 473 868
Large struct 2670 2396
Small class 8366 1189
Large class 8805 1529

CLR v2: 32-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 703 1246
Large struct 11411 7392
Small class 143967 1791
Large class 143107 2581

CLR v2: 64-bit results (ms per 100 million iterations)

Test type new() constraint Provider delegate
Small struct 510 686
Large struct 2334 1731
Small class 81801 1539
Large class 83293 1896

(An earlier version of this post had a mistake – my original tests used structs for everything, despite the names.)

Others have reported slightly different results, including the new() constraint being better for both large and small structs.

Just in case you hadn’t spotted them, look at the results for classes. Those are the real results – it took over 2 minutes to run the test using the new() constraint on my 32-bit laptop, compared with under two seconds for the provider. Yikes. This was actually the situation I was in for Noda Time, which is built on .NET 2.0 – it’s not surprising that so much of my benchmark’s time was spent constructing classes, given results like this.

Of course you can download the benchmark program for yourself and see how it performs on your machine. It’s a pretty cheap-and-cheerful benchmark, but when the differences are this big, minor sources of inaccuracy don’t bother me too much. The simplest way to run under CLR v2 is to compile with the .NET 3.5 C# compiler to start with.

What’s going on under the hood?

As far as I’m aware, there’s no IL to give support for the new() constraint. Instead, the compiler emits a call to Activator.CreateInstance<T>. Apparently, that’s slower than calling a delegate – presumably due to trying to find an accessible constructor with reflection, and invoking it. I suspect it could be optimized relatively easily – e.g. by caching the results per type it’s called with, in terms of delegates. I’m slightly surprised this hasn’t (apparently) been optimized, given how easy it is to cache values by generic type. No doubt there’s a good reason lurking there somewhere, even if it’s only the memory taken up by the cache.

Either way, it’s easy to work around in general.

Conclusion

I wouldn’t have found this gotcha if I didn’t have before and after tests (or in this case, side-by-side tests of the old way and the new way of parsing). The real lesson of this post shouldn’t be about the new() constraint – it should be how important it is to test performance (assuming you care), and how easy it is to assume certain operations are cheap.

Next post: something much weirder.

13 thoughts on “Optimization and generics, part 1: the new() constraint (updated: now with CLR v2 results)”

  1. Having known for a long time that Activator.CreateInstance is slow on perf critical places, I was actually surprised that the generic new() constraint was not optimised. As it follows the same syntax of the direct constructor call (which is very fast), my gut feeling was that it would be almost as fast, just to eliminate the element of surprise.

    ah well.

    As you said, generating a dynamic method from the object’s constructor is rather easy with the CLR 2+, and the pass-delegate way is even simpler (though changes client’s code), so at least one have a good fallback. å

  2. “there’s no IL to give support
    for the new() constraint. Instead, the compiler
    emits a call to Activator.CreateInstance < T>” – wow, that’s something I didn’t know and, like you, I’m surprised it hasn’t been optimised. Given that reflection is not used for things like calling an instance method when type constraint is present, I would never have guessed that reflection is used when calling the parameterless constructor when a new() constraint is present.

    I’d be interested to hear what someone like Eric Lippert has to say about this. I wonder if there’s a specific difficulty involved as I’d expect that by now, given that generics have featured in two releases of C# and the CLR, this would be optimised by now and the need for reflection removed.

  3. I’m confused. If you are aware the type is a struct, so therefore there can be no actual default constructor (that’s forbidden). Therefore simply doing default(T) is sufficient (this is precisely what the activator will end up doing, it’s just going to hunt around quite a bit before it finds out that’s what it needs to do.

    IIRC simply placing in the following code in a static method should be fine:

    If (default(T) == null)
    return new T();
    else
    return default(T);

    Disclaimer, I am writing this from my iPad so I can’t check it is high performant but I believe the JIT will actually Elide the branch in these circumstances making it a JIT time constant dead code elimination.

  4. @ShuggyCoUk: Yes, for a struct that would work – and is an optimization the compiler could presumably perform, *if* it was happy to assume no parameterless constructors for structs. (You can actually define them in IL – I don’t know offhand whether they’re used by “new T()” expressions – I *suspect* they aren’t when there’s a “where T : struct” constraint, but I haven’t checked.

    But even with that optimization, you’re still calling the constructor for a class via Activator.CreateInstance…

  5. Hi Jon,

    Belive me, I’ve been there too. In general, usage of Reflection can be heavily optimized by creating delegates for your invocations, and use those instead of a reflection invocation.

    Another strategy you can employ when running on .Net 3.5 and later is to build a NewExpression, compile it and cache the delegate. This alone offers you a huge room for runtime performance optimizations, without changing your public interface (the Func wouldnt be required, I would leave the new() constraint in place).

  6. Either I’m missing something here or you’ve made a copy/paste error in the code.

    In both your before and after code snippets the class contains the new() constraint. The only difference is how the type is created. I have two issues with this.

    1. If you have a delegate for creating TBucket why do you need the constraint? Shirley you could construct even classes that don’t have default constructors.
    2. You say the problem is in the constraint but it appears that the culprit is the object creation not the constraint.

    Of course in any case this wouldn’t happen in C++ where templates are compile time entities…

  7. Sure, if you’re using classes as well and the allocation and if pressure don’t swamp it then go with the dynamic plus delegate caching Johannes suggested on that code path.

    If the flexibility of the explicit delegate is helpful in and of itself that’s cool though, and is certainly easier to debug.

    It is definitely worth pointing out the extreme difference in cost between that constraint and all the others.

  8. Few years (!) ago I came across MSDN help for “HtmlDocument.Images Property”

    In C# samples, they use:
    string[] urls = (string[])Array.CreateInstance(Type.GetType(“System.String”), doc.Images.Count);

    Nice, just nice.

    Just checked, it’s still there.

  9. If you need 3.5 you can still construct the creation expression, you just have to do it yourself rather than letting dynamic deal with it all for you.

    I did almost the same thing and then a helpful SO user pointed out I could do it almost as well in 3.5 if I used the Expression AST for simple cases (in my case conversions and casts)

  10. I find myself defining more and more high-level functions in C#. It seems the more functional tools the language provides, the more functional tools we want.

Comments are closed.