Gotcha around iterator blocks

This will be a short interlude from Edulinq, although as will become obvious, it was in the process of writing the next Edulinq post that I made this discovery. It may be known about elsewhere, but it’s the first time I’d come across it – or rather, it’s the first time I’ve considered the impact of the compiler’s decisions in this area.

Sequences and iterators

This post is about details of iterator blocks and memory usage. It’s going to be quite important that you can easily follow exactly what I’m talking about, so I’m going to define a few things up-front.

When I talk about a sequence, I mean an IEnumerable<T>. So a List<T> is a sequence, as is the return value of Enumerable.Range, or many of the other LINQ queries.

When I talk about an iterator, I mean the result of calling GetEnumerator() on a sequence – it’s an IEnumerator<T>. In particular, there can be many iterators corresponding to a single sequence at the same time. A foreach loop will take a sequence and hide the mechanics of obtaining the iterator and calling MoveNext()/Current on it repeatedly.
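For reference, a foreach loop over a generic sequence expands to roughly this pattern (simplified – the real expansion also copes with non-generic and pattern-based sequences):

```csharp
using System;
using System.Collections.Generic;

class ForeachExpansion
{
    static void Main()
    {
        IEnumerable<int> sequence = new List<int> { 1, 2, 3 };

        // Roughly what "foreach (int x in sequence) { ... }" becomes:
        // obtain the iterator, repeatedly call MoveNext()/Current,
        // and dispose of the iterator at the end.
        using (IEnumerator<int> iterator = sequence.GetEnumerator())
        {
            while (iterator.MoveNext())
            {
                int x = iterator.Current;
                Console.WriteLine(x);
            }
        }
    }
}
```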

The C# language specification calls a sequence an enumerable object and it calls an iterator an enumerator object. However, those terms are easy to get mixed up – partly because they differ in so few letters – hence my use of sequence and iterator.

An iterator block is a method declared to return IEnumerable, IEnumerator, IEnumerable<T> or IEnumerator<T>, and containing one or more "yield" statements. The compiler deals with iterator blocks differently to normal methods. It creates an extra type behind the scenes, effectively converting local variables of the iterator block into instance variables in the extra type. The method itself is stripped bare, so that it just creates an object of the new type, without running any of the original source code until MoveNext() is first called on the corresponding iterator (either immediately or after a call to GetEnumerator(), depending on whether the method is declared to return a sequence or an iterator).

Compiler optimizations

The details of exactly how the compiler transforms iterator blocks are outside the scope of the C# specification. It does talk about some details, and gives an implementation example, but there’s plenty of scope for compiler-specific variation.

The Microsoft C# compiler does a "neat trick" which is the source of the gotcha I’m describing today. When it creates a hidden type to represent a sequence, the same type is used to represent the iterator, and the first time that GetEnumerator() is called on the same thread as the one which created the sequence, it returns a reference to "this" rather than creating a new object.

This is easier to follow with an example. Consider this method:

static IEnumerable<int> GetSimpleSequence(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return i;
    }
}

After compilation, my method ended up looking like this:

private static IEnumerable<int> GetSimpleSequence(int count)
{
    <GetSimpleSequence>d__0 d__ = new <GetSimpleSequence>d__0(-2);
    d__.<>3__count = count;
    return d__;
}

We can see that the <>3__count field of the generated type represents the initial value for the "count" parameter – although my code happens not to change the parameter value, it’s important that it’s preserved for all iterators generated from the same sequence. The -2 argument to the constructor is to represent the state of the object: -2 is used to represent a sequence rather than an iterator.

I’m not going into all the details of the generated type in this post (I have an article which goes into some more details) but the important part is the GetEnumerator() method:

[DebuggerHidden]
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
    Test.<GetSimpleSequence>d__0 d__;
    if (Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId
        && this.<>1__state == -2)
    {
        this.<>1__state = 0;
        d__ = this;
    }
    else
    {
        d__ = new Test.<GetSimpleSequence>d__0(0);
    }
    d__.count = this.<>3__count;
    return d__;
}

The "d__ = this;" line is the crucial one for the purpose of this post: a sequence’s first iterator is itself.

We’ll look into the negative ramifications of this in a minute, but first let’s think about the upsides. Basically, it means that if we only need to iterate over the sequence once, we only need a single object. It’s reasonable to suspect that the "fetch and iterate once" scenario is an extremely common one, so it’s fair to optimize for it. I’m not sure it’s actually worth the extra complexity introduced though – especially when there can be a heavy penalty in some odd cases. I assume Microsoft has performed more performance testing of this than I have, but they may not have considered the specific issue mentioned in this post. Let’s look at it now.

The bulky iterator problem

I’ll present the problem using the context in which I discovered it: the LINQ "Distinct" operator. First I’ll demonstrate it using the LINQ to Objects implementation, then show a few fixes we can make if we use our own one.

Here’s the sample program we’ll be using throughout, sometimes with a few modifications:

using System;
using System.Collections.Generic;
using System.Linq;

class Test
{
    static void Main()
    {
        var source = Enumerable.Range(0, 1000000);
        var query = source.Distinct();
        
        ShowMemory("Before Count()");
        query.Count();
        ShowMemory("After Count()");
        GC.KeepAlive(query);
        ShowMemory("After query eligible for GC");
    }
    
    static void ShowMemory(string explanation)
    {
        long memory = GC.GetTotalMemory(true);
        Console.WriteLine("{0,-30} {1}",
                          explanation + ":",
                          memory);
    }
}

Obviously ShowMemory is just a diagnostic method. What we’re interested in is what happens in Main:

  • We create a source sequence. In this case it’s a range, but it could be anything that returns a lot of distinct elements.
  • We create a query by calling Distinct. So far, so good – we haven’t done anything with the range yet, or iterated over anything. We dump the total memory used.
  • We count the elements in the query results: this has to iterate over the query, and will dispose of the iterator afterwards. We dump the total memory used.
  • We call GC.KeepAlive simply to make sure that the query sequence itself isn’t eligible for garbage collection until this point. We no longer explicitly have a reference to the iterator used by Count, but we do have a reference to the sequence up until now.
  • We dump memory one final time… but because this occurs after the GC.KeepAlive call (and there are no later references to query) the sequence is now eligible for garbage collection.

Note that the GetTotalMemory(true) call performs a full GC each time. I dare say it’s not foolproof, but it gives a pretty reasonable indication of memory that can’t be collected yet.

Now in a sensible world, I would expect the memory to stay reasonably constant. The Count() method may well generate a lot of garbage as it iterates over the query, but that’s fine – it should all be transient, right?

Wrong. Here’s one sample set of results:

Before Count():                47296
After Count():                 16828480
After query eligible for GC:   51140

Until we allow the sequence to be garbage collected, we’re left with all the garbage which was associated with the iterator used by Count. "All the garbage" is a set of elements which have previously been returned – Distinct needs to remember them all to make sure it doesn’t return any duplicates.

The problem is the "optimization" performed by the C# compiler: the iterator used by Count() is the same object that "query" refers to, so it’s got a reference to that large set of integers… a set we no longer actually care about.

As soon as the sequence can be garbage collected, we’re back down to modest memory use – the difference between the first and third lines of output is probably due to the code being loaded and JIT compiled over the course of the program.

Note that if we extended our query to call Where, Select etc after Distinct, we’d have the same problem – each sequence in the chain would keep a reference to its own "source" sequence, and eventually you’d get back to the Distinct sequence and all its post-iteration garbage.

Okay, enough doom and gloom… how can we fix this?

Solution #1: undo the compiler optimization (caller code)

If we’re aware of the problem, we can sometimes fix it ourselves, even if we only own the "client" code. Our sample program can be fixed with a single line:

using (query.GetEnumerator());

Just put that anywhere before the call to Count(), and we’re in the clear. The compiler gives a warning due to the possibly-accidental empty statement, mind you… why would we want to ask for a value only to dispose of it?

If you remember, the GetEnumerator method only returns "this" the first time you call it. So we’re just creating that first iterator and immediately disposing of it. None of the code that was originally within the Distinct implementation will be run, so we won’t generate any garbage.

Sure enough, the results bear out the theory:

Before Count():                47296
After Count():                 51228
After query eligible for GC:   51140

Unfortunately, this relies on two things:

  • Every caller has to know this is a potential problem. I’ve never heard anyone mention it before and I only discovered it myself yesterday. Note that it’s only likely to be a problem for sequence types which are generated by the C# compiler… how much do you want to be at the mercy of the implementation details of whatever you’re calling?
  • It only works if you can force GetEnumerator() to be called on the problematic sequence. If we add a call to Select(x => x) to the end of our query declaration, we’re back to square one – because calling GetEnumerator() on the Select sequence doesn’t force GetEnumerator() to be called on the Distinct sequence.
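To make the second point concrete, here’s a sketch of the failure mode (my own example, not measured output from the original program):

```csharp
using System;
using System.Linq;

class Test
{
    static void Main()
    {
        var source = Enumerable.Range(0, 1000000);
        // Appending Select(x => x) reintroduces the problem:
        var query = source.Distinct().Select(x => x);

        // This creates and disposes an iterator for the *Select* sequence
        // only. Select defers calling GetEnumerator() on its source until
        // MoveNext() is first called, so the Distinct sequence is untouched...
        using (query.GetEnumerator());

        // ...and this call still turns the Distinct sequence into its own
        // iterator, leaving it holding the full set of seen elements.
        Console.WriteLine(query.Count());
    }
}
```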

It’s an interesting fix, but basically not a good idea in the long run.

Solution #2: clean up in the iterator block (implementation code)

Obviously I can’t change LINQ to Objects, so at this point I’ll resort to a very simplistic custom implementation of Distinct. I’m not going to bother about the overload with a custom equality comparer, and I won’t even perform argument validation – although I will split the implementation into two methods as we’d have to for normal argument validation to work eagerly. That will be useful later. Here’s the code in its broken state:

public static class CustomEnumerable
{
    public static IEnumerable<TSource> Distinct<TSource>(
        this IEnumerable<TSource> source)
    {
        // Argument validation elided
        return DistinctImpl(source);
    }
    
    private static IEnumerable<TSource> DistinctImpl<TSource>(
        IEnumerable<TSource> source)
    {
        HashSet<TSource> seen = new HashSet<TSource>();
        foreach (TSource item in source)
        {
            if (seen.Add(item))
            {
                yield return item;
            }
        }
    }
}

Readers who’ve been following my Edulinq series should be pretty familiar with this. It suffers from the same problem as the LINQ to Objects implementation, which is what I’d expect. Now let’s do something really horrible – set a "local" variable to null when we know it will no longer be referenced within the method:

private static IEnumerable<TSource> DistinctImpl<TSource>(
    IEnumerable<TSource> source)
{
    HashSet<TSource> seen = new HashSet<TSource>();
    try
    {
        foreach (TSource item in source)
        {
            if (seen.Add(item))
            {
                yield return item;
            }
        }
    }
    finally
    {
        seen = null;
    }
}

This is horrible code. If I hadn’t come across this problem, and someone presented me with the above code in a code review, I would be sighing and shaking my head. Setting variables to null at the end of a method for the sake of garbage collection normally shows a hideous misunderstanding of how variables and the GC work.

In this case, it fixes the problem… but at the cost of the ability to look our fellow developers in the eye.

Solution #3: Side-step the compiler optimization (implementation code)

The problem we’re facing stems from the compiler optimization used when an iterator block returns a sequence type. What if it only returns an iterator? Well, we need something to implement IEnumerable<T>, so how about we create a type which just delegates the responsibility:

using System;
using System.Collections;
using System.Collections.Generic;

public sealed class DelegatingSequence<T> : IEnumerable<T>
{
    private readonly Func<IEnumerator<T>> func;
    
    public DelegatingSequence(Func<IEnumerator<T>> func)
    {
        this.func = func;
    }
    
    public IEnumerator<T> GetEnumerator()
    {
        return func();
    }
    
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Now we know that the sequence will be separate from the iterator. Of course we could be given a delegate which does the wrong thing, but it’s easy to use correctly. Here’s the corresponding code for Distinct:

public static IEnumerable<TSource> Distinct<TSource>(
    this IEnumerable<TSource> source)
{
    // Argument validation elided
    return new DelegatingSequence<TSource>(() => DistinctImpl(source));
}
    
private static IEnumerator<TSource> DistinctImpl<TSource>(
    IEnumerable<TSource> source)
{
    // Original code without try/finally
}

This now solves the "bulky iterator problem", and it’s a pretty easy fix to propagate wherever it’s necessary… but again, only if you’re aware of the issue to begin with.

Solution #4: Make the optimization optional (compiler code)

There are already a few attributes which change the generated IL: we could add another one to suppress this optimization, and apply it to any iterator blocks we want to behave differently:

[AliasSequenceAndIterator(false)]
private static IEnumerable<TSource> DistinctImpl<TSource>(
    IEnumerable<TSource> source)

I very much doubt that this will make it into the C# compiler any time soon – and frankly I’m not too happy with it anyway, especially as it still requires anyone using an iterator block to be aware of the issue.

Solution #5: Make the compiler perform solution #2 automatically (compiler code)

The compiler knows what "local variables" we’re using. It’s already implementing a Dispose method for various reasons, including setting the iterator state appropriately. Why can’t it just set all the fields associated with local variables from the source code to their default values as part of disposal?
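As a hand-written sketch (emphatically not actual compiler output, with the <>-mangled field names simplified), the generated Dispose for DistinctImpl might do something like this:

```csharp
void IDisposable.Dispose()
{
    state = -1;            // mark the iterator as finished, as it does today
    seen = null;           // field backing the "seen" local in DistinctImpl
    sourceIterator = null; // field backing the foreach's hidden iterator
}
```

With those fields cleared on disposal, the set of previously-seen elements would become eligible for collection even while the sequence object itself stayed alive.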

There’s one reason I can think of, although it would come up rarely: the variables might be captured by anonymous functions within the iterator block. The delegates created from those anonymous functions may still be alive somewhere. Fortunately, the compiler is presumably aware of which variables have been captured. It could simply have a policy of turning off the optimization completely for iterator blocks which capture variables in anonymous functions, and clearing the appropriate variables for all the other situations.

At the same time, the generated code could be changed to clear the field which underlies the Current property – at the moment, if you ask an auto-generated iterator for Current after you’ve finished iterating over it, the result will be the last value yielded. This could be another subtle pseudo-leak, unlikely as it is. (Aside from anything else, it’s just not neat at the moment.)
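That behaviour is easy to observe (with the current Microsoft compiler; other implementations are free to differ):

```csharp
using System;
using System.Collections.Generic;

class CurrentAfterEnd
{
    static IEnumerable<string> Words()
    {
        yield return "first";
        yield return "last";
    }

    static void Main()
    {
        IEnumerator<string> iterator = Words().GetEnumerator();
        while (iterator.MoveNext()) {}
        // MoveNext() has returned false, but the field backing Current
        // still holds the last value yielded.
        Console.WriteLine(iterator.Current); // prints "last"
    }
}
```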

Solution #6: Just disable the optimization completely (compiler code)

I’d love to see the data on which Microsoft based the decision to optimize the generated code this way. I have confidence that there is such data somewhere – although I wonder whether it has been updated since the advent of LINQ, which I suspect is one of the heaviest users of autogenerated iterators. It’s quite possible that this is still a "win" in most cases – just like the dodgy-sounding mutable struct iterator returned by the public List<T>.GetEnumerator is a win in very many cases, and only causes issues if you’re being awkward. I think it’s worth a re-evaluation though :)

If the optimization were removed, there would be two options:

  • Stick with a single generated type implementing both the sequence and iterator interfaces, but always create a new instance for iterators
  • Create one sequence type and one iterator type

Personally the second sounds cleaner to me – and it means that each instance would only need the fields relevant to it. But cleanliness isn’t terribly important for generated code, so long as it doesn’t have significant issues, so I wouldn’t be too upset if the team took the first approach. I expect that in either case it would simplify the code – there’d be no need to remember the thread a sequence was created on, and so on.

Conclusion

The very fact that I’ve never heard of this problem before suggests that it’s not very serious – but it feels serious. While it’s possibly rare to hold onto a query for a long time, it ought to be reasonable to do so without incurring a memory penalty related to the size of the query the first time it was executed. This doesn’t affect all query operators, of course – and I haven’t investigated which ones it does affect. The ones I’d look at are those which have to buffer their data for some reason – including ones which operate on two sequences, buffering one and streaming the other.

Out of all the possible solutions, I would personally favour 5 or 6. This really ought to be a compiler fix – it’s unreasonable to suggest that all developers should become aware of this idiosyncrasy and work round it. The part of me which longs for simplicity prefers solution 6; in reality I suspect that 5 is more likely. It’s quite possible that the C# team will decide that it’s a sufficiently harmless situation that they won’t fix it at all – and I’m not going to claim to know their product and priorities better than they do. (Obviously I’m making them aware of my findings, but not in a fix-this-or-else way.)

Whatever happens, it’s been a real blast investigating this… I never get tired of learning more about the details of source transformations and their (sometimes inadvertent) side-effects.

14 thoughts on “Gotcha around iterator blocks”

  1. I never knew this optimization existed. Using the same type for the sequence and iterator seems silly, but makes sense performance-wise. I suspect that in most cases, this wouldn’t be a performance problem.

    This made me think of the odd memory consumption in a case, which I found is quite common. You need an instance of each of your (whatever) classes, and you write your iterator as such:

    public static IEnumerable<object> Simple() {
        yield return new One();
        yield return new Two();
        yield return new Three();
        yield return new Four();
    }

    The iterator here has low memory consumption – just two ints and an object.

    However, if you were to write it as such:

    public static IEnumerable<object> WithVars() {
        One one = new One();
        yield return one;

        Two two = new Two();
        yield return two;

        Three three = new Three();
        yield return three;

        Four four = new Four();
        yield return four;
    }

    suddenly the sequence needs four more variables, which it will never use. Should the classes themselves have high memory consumption, this could be a big problem.

    Of course, it would be due to bad design by your programmers, but this seemingly leaky memory would be very hard to find…

    (Luckily, we’re going to get a Remove Variable in Visual Studio soon :)

  2. I would suggest using liveness analysis to insert nulling for reference types. Also, it wouldn’t help Distinct(), but only “memberizing” locals with value lifetimes (write->read) spanning a yield would help @configurator’s example.

  3. Hi Jon,

    I just tried that exact same code in LinqPad and the results I got were

    Before Count(): 178313688
    After Count(): 178559976
    After query eligible for GC: 178289756

    The difference between the numbers isn’t as great as yours.

    Are you sure this is not based on some localised factors?

  4. @Vince: I suspect it’s rather more likely that LINQPad is doing something strange. After all, I’m just building a simple console app and running it – LINQPad is compiling “live”, possibly with some extra instrumentation involved… any number of things could be going on.

    Plus my results make perfect sense looking at the output from the compiler :)

  5. IMHO, the key is the relative low negative impact of this design choice, which you do mention in your article, in balance with what probably is a significant win in most cases.

    Specifically, avoiding the extra object instance in the most common cases probably is, as you suspect, a documented positive outcome.

    At the same time, the number of iterator methods that generate large intermediate data structures, as the Distinct() method must, is already relatively small. Even then, the iterator instances created by such methods are very likely to be short-lived.

    I would say that in general, for managed code, while one should not be completely wasteful with respect to memory, it makes more sense to ignore memory issues until they manifest themselves as problems. For this iteration to work at all, you need enough RAM to fit the Set. The only real problem is if you’re already so short on RAM and are trying to keep the object around for a long time (an even less likely combination of conditions), in which case it’s not likely to be a real hardship to track down and resolve the memory issue.

    I’m very grateful for your discussion here about the issue, as it’s not something I’d really realized would be a potential problem (even though I knew about the object-sharing strategy of the iterator method re-write). But at the same time, it’s such a remote corner-case that I’d be surprised if it’s worth Microsoft’s time to worry too much about it.

  6. I’m a little confused as to why setting the local to null in the finally{} had any effect – (a) doesn’t the compiler already know when the local is never referenced again w/in the fn? (b) there’s no code after the assignment but the implicit return, so why would it have any effect even if simple scoping were used for “seen”‘s lifetime?

  7. @MBR: Because it’s not *really* a local variable. It’s an instance variable, and thus unaffected by the scope of the method. It stays as a “live” reference as long as the instance itself is live.

  8. I have known about this “optimization” for a long time, and it has always surprised me that they did this.

    1) Basically they optimized away one memory allocation, but introduced additional complexity. Your solution 6 option 2 would be the straightforward implementation, and is much simpler to get it right.

    2) memory allocations are extremely fast. Is it really worth it?

    3) by optimizing away one memory allocation, they also “optimize” away one opportunity to garbage collect, as your blog post clearly demonstrates. I’ve always known about it, and never understood why they wanted to take the risk.

    4) the optimization is leaky. It can be observed by trying to cast the sequence to an iterator. very ugly.

    I often use a private nested class that implements IEnumerable. It has a constructor in which I store the parameters (and do whatever eager work I want to do). And I use an iterator block to implement the GetEnumerator method. It’s like your solution 3, but without the generic class and the delegate closure. That may be one or two memory allocations in itself!
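    For concreteness, that pattern looks something like this (a sketch of my own, not the commenter’s actual code):

```csharp
using System.Collections;
using System.Collections.Generic;

public static class CustomEnumerable
{
    public static IEnumerable<TSource> Distinct<TSource>(
        this IEnumerable<TSource> source)
    {
        // Argument validation elided
        return new DistinctSequence<TSource>(source);
    }

    // The sequence type just stores the parameters...
    private sealed class DistinctSequence<TSource> : IEnumerable<TSource>
    {
        private readonly IEnumerable<TSource> source;

        internal DistinctSequence(IEnumerable<TSource> source)
        {
            this.source = source;
        }

        // ...and the iterator block lives in GetEnumerator(), so the
        // compiler only ever generates a separate iterator type here -
        // the sequence and its iterators can never be the same object.
        public IEnumerator<TSource> GetEnumerator()
        {
            HashSet<TSource> seen = new HashSet<TSource>();
            foreach (TSource item in source)
            {
                if (seen.Add(item))
                {
                    yield return item;
                }
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
```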

    This effectively solves the problem you describe (for code I write), is not a lot of code, is very readable, and doesn’t have the ugliness of your “solution” 2.

    Conclusion: my vote goes to solution 6 option 2.

  9. Have you posted this on Connect? It’s a long shot, but if enough people vote for it, it might get fixed.

  10. I think the main motivation behind this is that if you’re not obsessively calling on the garbage collector to reclaim memory, it’s unlikely for the memory to be collected there anyway.

    I’ve tested this by wedging in the “using (query.GetEnumerator());”, but passing false to GetTotalMemory, and as expected, the memory sticks around because the GC doesn’t bother itself to collect either set right there and then.

    Seems like this optimization is just letting the GC do its job. The fetch-and-iterate-once scenario is much more likely than it being important that the GC kicks in at right that instant. It’s much more likely for the GC to naturally kick in at some point when you’re no longer using either the sequence or the iterator and collect everything in one swoop.

  11. @Richard: Better than that, I pinged the C# team, so they’re definitely aware of it now :)

    @Ilia: But the point is that with LINQ and deferred execution, we’re somewhat encouraged to keep the *sequence* around for a long time: “it’s not the data, it’s just the query” – right? In a sensible world, we could keep a query around for ages, and it would have no significant impact on memory other than keeping the underlying source alive. With the current “optimization” it also ends up taking the “temporary” storage for the first iterator.

  12. @skeet: Yes, I agree. I too would like to see the motivation for making it like this.

    I’m guessing it seems like reasonable design to reuse the instance of the class you already instantiated. Personally, my choice would have been to generate the enumerator as an inner class of the enumerable and return a new instance whenever GetEnumerator is called. This way the enumerable is just a custom factory, you can keep it around as long as you want, and it generally feels cleaner than to reuse the same class for both things.

  13. @Ilia: It would definitely be *cleaner*. It would just be slightly less efficient than the single class for the common case of “fetch sequence, fetch iterator from sequence, iterate, discard both sequence and iterator”. That’s the use case being optimized for, at the (probably accidental) expense of the “keep hold of the sequence but not the iterator” case discussed here.

Comments are closed.