How can I enumerate thee? Let me count the ways…

This weekend, I was writing some demo code for the async chapter of C# in Depth – the idea was to decompile a simple asynchronous method and see what happened. Along the way I got quite a surprise, and it had nothing to do with asynchrony.

Given that at execution time, text refers to an instance of System.String, and assuming nothing in the body of the loop captures the ch variable, how would you expect the following loop to be compiled?

foreach (char ch in text)
{
    // Body here
}

Before today, I could think of four answers, depending on the compile-time type of text (and assuming the code compiled at all). One of those answers applies when text is declared as dynamic, which I’m not going to go into here. Let’s stick with static typing.

If text is declared as IEnumerable

In this case, the compiler can only use the non-generic IEnumerator interface, and I’d expect the code to be roughly equivalent to this:

IEnumerator iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = (char) iterator.Current;
        // Body here
    }
}
finally
{
    IDisposable disposable = iterator as IDisposable;
    if (disposable != null)
    {
        disposable.Dispose();
    }
}

Note how the disposal of the iterator has to be conditional, as IEnumerator doesn’t extend IDisposable.

If text is declared as IEnumerable<char>

Here, we don’t need any execution time casting, and the disposal can be unconditional:

IEnumerator<char> iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = iterator.Current;
        // Body here
    }
}
finally
{
    iterator.Dispose();
}

If text is declared as string

Now things get interesting. System.String implements IEnumerable<char> using explicit interface implementation, and exposes a separate public GetEnumerator() method which is declared to return a CharEnumerator.
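To make the two routes concrete, here is a minimal sketch that asks for each of them explicitly. The class and method names (StringEnumeratorRoutes, PublicReturnType, ReadViaInterface) are made up for illustration; only String, CharEnumerator and IEnumerable<char> come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical demo class, not part of the framework.
public static class StringEnumeratorRoutes
{
    // Looks up the public, parameterless GetEnumerator method on String
    // and reports its declared return type.
    public static Type PublicReturnType()
    {
        return typeof(string).GetMethod("GetEnumerator", Type.EmptyTypes).ReturnType;
    }

    // The generic interface method is explicitly implemented, so it is
    // only reachable through a reference of type IEnumerable<char>.
    public static string ReadViaInterface(string text)
    {
        IEnumerable<char> sequence = text;
        StringBuilder result = new StringBuilder();
        using (IEnumerator<char> iterator = sequence.GetEnumerator())
        {
            while (iterator.MoveNext())
            {
                result.Append(iterator.Current);
            }
        }
        return result.ToString();
    }
}
```

PublicReturnType() should report CharEnumerator, while ReadViaInterface("abc") walks the same characters through IEnumerator<char> instead.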

Usually when I find a type doing this sort of thing, it’s for the sake of efficiency, to reduce heap allocations. For example, List<T>.GetEnumerator returns a List<T>.Enumerator, which is a struct with the appropriate iteration members. This means that if you use foreach over an expression of type List<T>, the iterator can stay on the stack in most cases, saving object allocation and garbage collection.
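As a rough illustration of the List<T> case, the sketch below checks that the enumerator really is a value type and writes out the loop by hand with a local of the struct type. ListEnumeratorDemo and its members are hypothetical names, and the hand-written loop is only an approximation of what the compiler generates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not part of the framework.
public static class ListEnumeratorDemo
{
    // List<T>.Enumerator is declared as a struct, so this returns true.
    public static bool EnumeratorIsValueType()
    {
        return typeof(List<int>.Enumerator).IsValueType;
    }

    // Approximately what foreach over a List<int> expression expands to:
    // the iterator is a local of the struct type, so nothing is boxed.
    public static int Sum(List<int> values)
    {
        int total = 0;
        List<int>.Enumerator iterator = values.GetEnumerator();
        try
        {
            while (iterator.MoveNext())
            {
                total += iterator.Current;
            }
        }
        finally
        {
            iterator.Dispose();
        }
        return total;
    }
}
```

Declaring the local as IEnumerator<int> instead would box the struct, which is exactly the allocation this design is meant to avoid.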

In this case, however, I suspect CharEnumerator was introduced (way back in .NET 1.0) to avoid having to box each character in the string. This was one reason for foreach handling to be based on types obeying the enumerable pattern, as well as there being support through the normal interfaces. It strikes me that it could still have been a structure in the same way as for List<T>, but maybe that wasn’t considered as an option.
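The enumerable pattern mentioned here can be sketched with a pair of types that implement none of the enumeration interfaces. Countdown, CountdownEnumerator and PatternDemo are invented for this example; the point is only that foreach accepts them because GetEnumerator, MoveNext and Current have the right shapes, and Current is strongly typed so nothing is boxed.

```csharp
using System.Text;

// Hypothetical type: does not implement IEnumerable or IEnumerable<T>.
public sealed class Countdown
{
    private readonly int start;

    public Countdown(int start)
    {
        this.start = start;
    }

    public CountdownEnumerator GetEnumerator()
    {
        return new CountdownEnumerator(start);
    }
}

// Hypothetical type: does not implement IEnumerator or IDisposable.
public sealed class CountdownEnumerator
{
    private int next;
    private int current;

    public CountdownEnumerator(int start)
    {
        next = start;
    }

    // Declared as int rather than object, so no boxing is needed.
    public int Current
    {
        get { return current; }
    }

    public bool MoveNext()
    {
        if (next <= 0)
        {
            return false;
        }
        current = next;
        next--;
        return true;
    }
}

public static class PatternDemo
{
    // Collects the values produced by foreach into a string.
    public static string Run()
    {
        StringBuilder builder = new StringBuilder();
        foreach (int value in new Countdown(3))
        {
            builder.Append(value);
        }
        return builder.ToString();
    }
}
```

PatternDemo.Run() yields "321", even though the compiler never sees an interface on either type.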

Anyway, it means that I would have expected the code to be compiled like this, even back to C# 1:

CharEnumerator iterator = text.GetEnumerator();
try
{
    while (iterator.MoveNext())
    {
        char ch = iterator.Current;
        // Body here
    }
}
finally
{
    iterator.Dispose();
}

What really happens when text is declared as string

(This is the bit that surprised me.)

So far, I’ve been assuming that the C# compiler doesn’t have any special knowledge about strings when it comes to iteration. I knew it did for arrays, but that’s all. The actual result – under the C# 5 compiler, at least – is to use the Length property and the indexer directly:

int index = 0;
while (index < text.Length)
{
    char ch = text[index];
    index++;
    // Body here
}

There’s no heap allocation, and no Dispose call. If the variable in question can change its value within the loop (e.g. if it’s a field, or a captured variable, or there’s an assignment to it within the body) then a copy is made of the variable value (just a reference, of course) first, so that all member access is performed on the same object.
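The effect of that copy is observable from ordinary code. In the sketch below (FieldLoopDemo and its members are hypothetical names), the field is reassigned inside the loop body, yet the loop carries on over the original string. RunExpanded is a hand-written approximation of the generated code, not the compiler's exact output.

```csharp
using System.Text;

// Hypothetical demo class, not part of the framework.
public sealed class FieldLoopDemo
{
    private string text = "abc";

    public string CurrentText
    {
        get { return text; }
    }

    // foreach evaluates the expression once, so reassigning the field
    // inside the body does not change which string is being iterated.
    public string Run()
    {
        StringBuilder seen = new StringBuilder();
        foreach (char ch in text)
        {
            text = "xyz";
            seen.Append(ch);
        }
        return seen.ToString();
    }

    // Approximate expansion: Length and the indexer are both used on the
    // copy, never on the field itself.
    public string RunExpanded()
    {
        StringBuilder seen = new StringBuilder();
        string copy = text;
        int index = 0;
        while (index < copy.Length)
        {
            char ch = copy[index];
            index++;
            text = "changed";
            seen.Append(ch);
        }
        return seen.ToString();
    }
}
```

On a fresh instance, Run() returns "abc" and leaves the field set to "xyz". Without the copy, the Length check and the indexer could end up being applied to two different strings.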

Conclusion

So, there we go. There’s nothing particularly mind-blowing here – certainly nothing to affect your programming style, unless you were deliberately avoiding using foreach on strings "because it’s slow." It’s still a good lesson in not assuming you know what the compiler is going to do though… so long as the results are as expected, I’m very happy for them to put extra smarts in, even if it does mean having to change my C# in Depth sample code a bit…

19 thoughts on “How can I enumerate thee? Let me count the ways…”

  1. Interesting read. Your blog, SO answers and videos on Tekpub in particular are extremely inspirational for a young developer. It makes me think about my code in ways I haven’t done before, and it makes me curious to see what the compiler does to my code.

  2. Interesting. Of course, in a variety of ways I think System.String has a special relationship with the compiler. Another example is interning of literals. I had never looked at the actual compiler output before, but it doesn’t surprise me to find System.String special-cased in this way.

    I’ll bet there are other ways it’s handled in special ways by the compiler too.

    By the way, I think in your first example, you meant to write this instead:

    IDisposable disposable = iterator as IDisposable;
    if (disposable != null)
    {
        disposable.Dispose();
    }

  3. @pete: Thanks, will fix the disposable typo.

    @tobi: The array case isn’t applicable here – I explicitly stated that “text” refers to an instance of System.String here. I didn’t want to get into the details of the array case here, as it wasn’t relevant to the specific situation I was looking at.

  4. This should be an obvious comment: The reason they can use an indexer directly and not an Enumerator is that strings are immutable – there is no risk of the string changing, and no exception will be thrown.
    If there is ever real support for immutable collections (as opposed to read-only collections), we would probably see a lot more of similar optimizations.

  5. In the third example, I assume that you meant to say:
    CharEnumerator iterator = text.GetEnumerator();
    instead of:
    IEnumerator iterator = text.GetEnumerator();

  6. Some text (e.g. ‘text’ and ‘System.String’) in the following, quoted from the top of your page, is about 1/4 the size of the rest in Firefox. Unreadable.

    “Given that at execution time, text refers to an instance of System.String, and… “

  7. @Graeme: Ages ago I tried fighting against the CSS to get the inline monospaced font to work here. I’m afraid I never managed – and I gave up trying, after spending ages tearing my hair out :(

  8. @skeet: Try changing the “font-size:81%” to “font-size:13px” for the body element. (style.css, line 16)

  9. @Richard: Unfortunately I don’t get to control that style sheet :(

    I’m beginning to think I should start hosting my blog myself…

  10. @Dan: No, the compiler gets to see that it’s a string. First port of call would normally be to see whether it’s got a GetEnumerator() method following the right pattern, i.e. the CharEnumerator version.

    It’s the Length/indexer version which is the surprise…

  11. Why do you always use in your examples:

    try
    {
    }
    finally
    {
        variable.Dispose();
    }

    Is that your impression of how the compiler should work, or how a developer should write the code?

  12. If the JIT had done better inlining in V1 and could detect when a heap allocation can be converted into a stack allocation, these special cases might never have been needed.
