Array covariance: not just ugly, but slow too

It seems to be quite a long time since I’ve written a genuine "code" blog post. Time to fix that.

This material may well be covered elsewhere – it’s certainly not terrifically original, and I’ve been meaning to post about it for a long time. In particular, I remember mentioning it at CodeMash in 2012. Anyway, the time has now come.

Refresher on array covariance

Just as a bit of background before we delve into the performance aspect, let me remind you what array covariance is, and when it applies. The basic idea is that C# allows a reference conversion from type TDerived[] to type TBase[], so long as:

  • TDerived and TBase are both reference types (potentially interfaces)
  • There’s a reference conversion from TDerived to TBase (so either TDerived is the same as TBase, or a subclass, or an implementing class etc)

Just to remind you about reference conversions, those are conversions from one reference type to another, where the result (on success) is never a reference to a different object. To quote section 6.1.6 of the C# 5 spec:

Reference conversions, implicit or explicit, never change the referential identity of the object being converted. In other words, while a reference conversion may change the type of the reference, it never changes the type or value of the object being referred to.

So as a simple example, there’s a reference conversion from string to object, therefore there’s a reference conversion from string[] to object[]:

string[] strings = new string[10];
object[] objects = strings;

// strings and objects now refer to the same object

There is no reference conversion between value type arrays, so you can’t use the same code to convert from int[] to object[].

The nasty part is that every store operation into a reference type array now has to be checked at execution time for type safety. So to extend our sample code very slightly:

string[] strings = new string[10]; 
object[] objects = strings;

objects[0] = "string"; // This is fine
objects[0] = new Button(); // This will fail

The last line here will fail with an ArrayTypeMismatchException, to avoid storing a Button reference in a string[] object. When I said that every store operation has to be checked, that’s a slight exaggeration: in theory, if the compile-time type is an array whose element type is a sealed class, the check can be avoided, as it can’t fail.
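To see the runtime check in action without pulling in the WinForms Button type, here’s a minimal self-contained sketch that attempts the unsafe store and catches the resulting exception (the TryStore method name is mine, purely for illustration):

```csharp
using System;

public static class CovarianceDemo
{
    // Attempts an unsafe store through a covariant array reference
    // and reports what happened.
    public static string TryStore()
    {
        string[] strings = new string[10];
        object[] objects = strings; // covariant reference conversion: same object

        objects[0] = "string"; // fine: a string reference belongs in a string[]
        try
        {
            objects[0] = new object(); // not a string: the runtime check fails
            return "no exception";
        }
        catch (ArrayTypeMismatchException)
        {
            return "ArrayTypeMismatchException";
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryStore()); // prints "ArrayTypeMismatchException"
    }
}
```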

Avoiding array covariance

I would rather arrays weren’t covariant in the first place, but there’s not a lot that can be done about that now. However, we can work around this, if we really need to. We know that value type arrays are not covariant… so how about we use a value type array instead, even if we want to store reference types?

All we need is a value type which can store the reference type we need – which is dead easy with a wrapper type:

public struct Wrapper<T> where T : class
{
    private readonly T value;
    public T Value { get { return value; } }
    
    public Wrapper(T value)
    {
        this.value = value;
    }
    
    public static implicit operator Wrapper<T>(T value)
    {
        return new Wrapper<T>(value);
    }
}

Now if we have a Wrapper<string>[], we can’t assign that to a Wrapper<object>[] variable – the types are incompatible. If that feels a bit clunky, we can put the array into its own type:

public sealed class InvariantArray<T> where T : class
{
    private readonly Wrapper<T>[] array;
    
    public InvariantArray(int size)
    {
        array = new Wrapper<T>[size];
    }
    
    public T this[int index]
    {
        get { return array[index].Value; }
        set { array[index] = value; }
    }
}

Just to clarify, we now only have value type arrays, but ones where each value is a plain wrapper for a reference. Any attempt at an unsafe covariant conversion is now caught at compile time, and the CLR doesn’t need to validate write operations.
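A quick sketch to show both halves of that claim, reusing the Wrapper&lt;T&gt; type from above (the Check method is my own illustrative name):

```csharp
using System;

// Wrapper<T> as defined earlier in the post.
public struct Wrapper<T> where T : class
{
    private readonly T value;
    public T Value { get { return value; } }
    public Wrapper(T value) { this.value = value; }
    public static implicit operator Wrapper<T>(T value) { return new Wrapper<T>(value); }
}

public static class InvarianceDemo
{
    public static string Check()
    {
        Wrapper<string>[] wrappedStrings = new Wrapper<string>[10];
        wrappedStrings[0] = "hello"; // element-level implicit conversion still works

        // The following line would be a compile-time error (cannot implicitly
        // convert Wrapper<string>[] to Wrapper<object>[]):
        // Wrapper<object>[] wrappedObjects = wrappedStrings;

        return wrappedStrings[0].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Check()); // prints "hello"
    }
}
```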

There’s no memory overhead here – aside from the type information at the start, I’d actually expect the contents of a Wrapper<T>[] to be indistinguishable from a T[] in memory.

Benchmarking

So, how does it perform? I’ve written a small console app to test it. You can download the full code, but the gist of it is that we use a stopwatch to measure how long it takes to either repeatedly write to an array, or repeatedly read from an array (validating that the value read is non-null, just to prove that we’ve really read something). I’m hoping I haven’t fallen foul of any of the various mistakes in benchmarking which are so easy to make.

The test tries four scenarios:

  • object[] (but still storing strings)
  • string[]
  • Wrapper<string>[]
  • InvariantArray<string>
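The full benchmark is in the download; as a rough sketch of what one of the write tests looks like (the method name and structure here are my reconstruction, not the actual downloadable code):

```csharp
using System;
using System.Diagnostics;

public static class ArrayBenchmark
{
    // Hypothetical reconstruction of one write test from the benchmark.
    public static long TimeObjectArrayWrites(int iterations, int size)
    {
        string value = "test";
        object[] array = new object[size]; // object[], but still storing strings

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            for (int j = 0; j < size; j++)
            {
                array[j] = value; // each store pays for the covariance check
            }
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine(TimeObjectArrayWrites(1000, 100) + "ms");
    }
}
```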

Running against an array size of 100, with 100 million iterations per test, I get the following results on my Thinkpad Twist:

Array type              Read time (ms)  Write time (ms)
object[]                11842           44827
string[]                12000           40865
Wrapper<string>[]       11843           29338
InvariantArray<string>  11825           32973

That’s just one run, but the results are fairly consistent across runs. The one interesting deviation is the write time for object[] – I’ve observed it sometimes being the same as for string[], but not consistently. I don’t understand this, but it does seem that the JIT isn’t performing the optimization for string[] that it could if it spotted that string is sealed.

Both of the workarounds to avoid array covariance make a noticeable difference to the performance of writing to the array, without affecting read performance. Hooray!

Conclusion

I think it would be a very rare application which noticed a significant performance boost here, but I do like the fact that this is one of those situations where a cleaner design also leads to better performance, without many obvious technical downsides.

That said, I doubt that I’ll actually be using this in real code any time soon – the fact that it’s just "different" to normal C# code is a big downside in itself. Hope you found it interesting though :)

24 thoughts on “Array covariance: not just ugly, but slow too”

  1. That’s one extremely neat hack. I’m amazed that covariance checking slows down element writes by a third even if you don’t actually use covariance.

    Java has the same kind of array covariance. I wonder how well the JVM deals with it? Java doesn’t support this kind of value type wrapper but perhaps one could measure against reference-sized integer arrays…

  2. A great article as always. The key feature I miss from C# is the support of covariance for return types. It’s a feature I’ve been missing for years, but it seems Eric Lippert (from the C# team) thinks it’s unlikely to ever appear in C#.

    http://stackoverflow.com/questions/5709034/does-c-sharp-support-return-type-covariance

    The use of the “new” operator to fake return type covariance is a hack; a more elegant design is required.

    Personally, I’m hoping the C# team realizes they’ve made a mistake and decide to implement it :)

  3. Thanks for this.

    I do think it’s telling that _read_ access is unaffected. So covariance is “slow” only in the write case. I suspect that if anyone ever actually has a need for array covariance, the read scenario is probably much more common than writing.

    And really, it seems from the benchmark that it’s not covariance for writing that’s so slow, but rather the type checking that occurs regardless. I.e. the string[] scenario is almost as slow as the object[] scenario.

    As you say, this is mostly an academic question. I’m certainly not going to change my code over to array wrappers, and the real-world effect on performance is likely negligible in 99.94% of the cases. For that matter, the use of array covariance is probably infrequent. I know my own programs don’t use it. Ever.

    But it’s definitely interesting. Thanks for sharing!

  4. @pete.d: Yes, read access is unaffected because there’s nothing to check here. I would say that for writing it’s the covariance which requires a check in *some* cases that makes it slow – but that check is being performed regardless of whether it’s required.

    Apparently the Roslyn team *do* use this, because in their use case it’s actually significant – but again, I agree that for most people it’s irrelevant.

  5. Right, I understand why it’s unaffected. I’m just saying that I suspect that’s the common case.

    By the way, I ran the code on my own computer (Core i5-2430), and I have what I think are some marginally interesting observations.

    First and most important, I was unable to reproduce any significant difference between write performance for the object[] and string[] scenarios. I strongly suspect something about your configuration is confounding the results. Maybe the CPU has a lazier turbo-boost setting, causing the program to run slowly at first. What happens if you run the string[] and object[] scenarios in reverse order?

    Second, I noted an even more significant difference in performance between the regular cases and the wrapped cases. While your results showed up to a 40% increase in time, I got up to a 65% increase (I’m rounding to the nearest 5%…there’s enough variation in the timings that the numbers aren’t any more precise than that).

    Third, I tested both x86 and x64, and that produced some (academically) interesting differences as well. While the covariant/invariant scenarios had the same relative performance (around 65% difference at worst), the wrapper/array difference was only 10% for x86 but 25% for x64.

    Less surprising, though still interesting, is that the x86 version was significantly slower for writes than the x64, with across-the-board slow-downs of around 25% (except for the InvariantArray scenario, where the x86 version, closer in performance to the x86 WrapperArray, of course also was closer to the x64 version of itself).

    Clearly, in a situation where 25% to 65% differences in performance are a big deal (they often are not, even when the user actually experiences a full 65% slowdown), and where the processing involves a significant time spent writing to arrays, this is actually a useful piece of knowledge to keep in one’s optimizing toolbox.

    Thanks!

  6. @pete.d: As I noted in the post, I saw varying results around object[] vs string[], but varying the location didn’t produce much consistency. There may well have been all kinds of causes behind that, including the turbo-boost you mentioned. (I did various experiments to try to get consistency, but gave up after a while.)

    I did try on x64 as well (and meant to include that in the post) but didn’t see noticeable differences between those results and the x86 ones. All very mysterious :)

  7. One more thing…

    In regards to “No, List<T> is invariant”, not so fast there cowboy. :)

    List<T> has an invariant interface. But it uses an array as the underlying storage. And that array has the same limitations and performance effects that arrays do generally.

    Indeed, when I modify your program to add two more scenarios, a List<string> and a List<Wrapper<string>>, I see the same basic performance effects. They are less significant, because List<T> has significant overhead in and of itself (even the read scenario, which as we know is unaffected with regards to the array, is 3x slower for a List<T> than for an array).

    But even with the masking effects of List<T>’s overall performance overhead, the List<T> write scenario is consistently 10% slower than the List<Wrapper<T>> scenario, just as one might expect.

    I suppose in theory, the JIT compiler and runtime could take advantage of List<T>’s invariant nature and special-case that, so as to avoid type-compatibility checks on writes to the backing array, but from the results, it seems clear that they don’t. So List<T> suffers along with reference-type arrays generally.

    Interestingly, I suppose that List<T>’s implementation itself could use a Wrapper<T>, just like your InvariantArray<T> type, and enjoy the same performance gains. But apparently it doesn’t. For that matter, I find it intriguing and disappointing that even List<T>’s basic read scenario is so much slower than your InvariantArray<T>’s reads, since I would expect the implementation to be essentially the same for that! What’s going on that makes List<T>’s indexer so much slower than InvariantArray<T>’s, I wonder?

  8. “As I noted in the post, I saw varying results around object[] vs string[], but varying the location didn’t produce much consistency”

    Sorry if I missed that. Though, even with you pointing it out, I still can’t find where you mentioned changing the order of the object[] and string[] tests. :(

  9. @pete.d: Yes, good point about list invariance and the underlying array. And yes, I agree with your point about the potential for it to use a wrapper implementation. Note that the list can’t delegate directly to the array as my InvariantArray does, as it needs to enforce constraints based on Count rather than just relying on the array itself.

    For the last part – I didn’t mention changing the order of the tests or the various things that I attempted in order to get consistency, as I didn’t want to distract from the main focus of the post.

  10. It would make sense given the runtime cost to make this something that could be disabled via a compiler switch.

  11. @BlindWanderer: No, I don’t think so. Consider arrays which pass between assembly boundaries – if a method in one assembly hands an object[] reference to another assembly, one of those assemblies could be compiled in “no variance” mode and the other compiled as normal. Much as I think it was a design mistake, I think adding compiler switches would confuse things more than help.

  12. Launched your benchmark against Mono (W8 x64, Mono JIT compiler version 3.0.3 (master/39c48d5), C2D T7300). Got very different results.

    > mono .\ArrayPerformance.original.exe 10000000 100
    WriteObjectArray: 6216ms
    ReadObjectArray: 1657ms
    WriteStringArray: 9488ms
    ReadStringArray: 1698ms
    WriteWrapperArray: 16341ms
    ReadWrapperArray: 1672ms
    WriteInvariantArray: 32572ms
    ReadInvariantArray: 2695ms

  13. “Note that the list can’t delegate directly to the array as my InvariantArray does”

    Yup, that’s true. Still, a 3x slowdown?

    Seems excessive. I’m sure there’s a good reason, but it’s beyond my intuition (and I’m too lazy to investigate :) ).

  14. Generics with a reference type parameter include their own overhead. You can get an additional performance benefit from constructing a non-generic array and wrapper class. And if you want the convenience of a generic array class, you can also try a wrapper with an “object” value that undergoes a cast for each read (casting for write is not necessary).

  15. Jon, the jitter is doing an optimization here to WriteWrapperArray/WriteInvariantArray. The implicit cast is actually only called once. The jitter knows value is never set again and reuses the value type. That means you are comparing 100mil reference writes with 1 reference write and 100mil value type copies. If you make value random (for both WriteObjectArray/WriteWrapperArray) the times are more similar. Also there seems to be little difference between the string/object asm, which is most likely just noise due to the jitter.

  16. @Will: Interesting, thanks. Will have to try some more tests for other situations with side-effects in the conversion (e.g. incrementing a counter).

  17. Never mind that first message. I was reading the assembly wrong (WinDbg’s disassembly window is very painful; using VS with JIT optimizations on instead showed that the implicit cast/constructor is just being inlined).

    After looking at this a bit more, it seems the timing difference is due to stelem_ref never being inlined (even when using a sealed class). That, along with stelem_ref having an extra type/null check (see JIT_Stelem_Ref_Portable in jithelpers.cpp in sscli20). Wrapping with a structure allows for inlining and eliminates the type/null check inside jit_stelem_ref.

    Also object[] arrays ignore the type hierarchy check. To see the real cost of a covariant array you are gonna want to use another array type. A quick timing shows covariant arrays to be 3.5x slower.

    E.g.

    class Animal { }
    class Cat : Animal { }

    static void WriteAnimalArray(int iterations, int size, Stopwatch watch)
    {
        Cat value = new Cat();
        Animal[] array = new Animal[size];

        watch.Start();
        for (int i = 0; i < iterations; i++)
        {
            for (int j = 0; j < size; j++)
            {
                array[j] = value;
            }
        }
        watch.Stop();
    }

  18. @Will: I’m not sure what you mean by “ignore the type hierarchy check” – there still has to be a check, as the object[] reference could actually be to a string[].

    However, I *do* see differences in performance between object/string and Animal/Cat, so more investigation is definitely required. The use of the wrapper type still improves performance though, even in that case.

  19. @Jon: In the code you aren’t doing the covariance like you described at the start of the article. See: WriteStringArray, shouldn’t that be “object[] array = new string[size];”? Otherwise because the element type matches the array type the stelem_ref method skips traversing the type hierarchy.

    The type hierarchy traversing is the most expensive part of stelem_ref. The traversing must be done if the array is not of type object or if the element type doesn’t match the array’s type.

    You can see what I mean by looking at the JIT_stelem_ref_portable code in the SSCLI20. I’m not sure if the licensing allows me to post a small snippet from the SSCLI20, so I won’t. Maybe the micro framework has JIT_stelem and I can post a snippet from there(I’ll check later tonight).

    Yes, using the wrapper allows the jit to inline the reference assignment(eliminating instructions) and avoids calling stelem_ref.

  20. @Will: The point of the WriteStringArray code was deliberately to see if the JIT would notice that we were dealing with a string[] which *can’t* use covariance (as string is a sealed type).

    It sounds like there are plenty of other tests I should add though…

Comments are closed.