LINQ To Objects and the performance of nested "Where" calls

This post came out of this Stack Overflow question, which essentially boils down to which is better out of these two options:

var oneBigPredicate = collection.Where(x => Condition1(x)
                                         && Condition2(x)
                                         && Condition3(x));

var multiplePredicates = collection.Where(x => Condition1(x))
                                   .Where(x => Condition2(x))
                                   .Where(x => Condition3(x));

The first case is logically a single "wrapper" sequence around the original collection, with a filter which checks all three conditions (but applying short-circuiting logic, of course) before the wrapper will yield an item from the original sequence.

The second case is logically a set of concentric wrapper sequences, each applying a filter which checks a single condition. Asking the "outer" wrapper for the next item involves that one asking the "middle" wrapper for the next item, which asks the "inner" wrapper for the next item, which asks the original collection for the next item… the item is then passed "outwards" through the wrappers, with the filters being applied as we go, of course.

Now the two will achieve the same result, and I should say up-front that in most realistic cases, it won’t make a significant difference which you use. But realistic cases aren’t nearly as interesting to investigate as pathological cases, so I decided to benchmark a few different options for the second case. In particular, I wanted to find out how long it took to iterate over a query in the cases where the condition was either "always true" or "always false" – and vary the depth of nesting. Note that I’m not actually testing the first kind of query shown above… I suspect it wouldn’t be terribly interesting, at least compared with the results of the second query.

The simplest way of creating a "collection" of a large size is to use Enumerable.Repeat(), and the simplest way of iterating over the whole sequence is to just call Count() on it… so that’s what I do, timing how long the Count() call takes. (I don’t actually print out the results of Count(), but with an "always false" predicate I’ll get 0, and with an "always true" predicate I’ll get the size of the input collection.)

Here’s the sample code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Test
{
    static void Main()
    {
        int size = 10000000;
        Console.WriteLine("Always false");
        RunTests(size, x => false);
        
        Console.WriteLine("Always true");
        RunTests(size, x => true);
        
    }
    
    static void RunTests(int size, Func<string, bool> predicate)
    {
        for (int i = 1; i <= 10; i++)
        {
            RunTest(i, size, predicate);
        }
    }
    
    static void RunTest(int depth, int size, Func<string, bool> predicate)
    {
        IEnumerable<string> input = Enumerable.Repeat("value", size);
        
        for (int i = 0; i < depth; i++)
        {
            input = input.Where(predicate);
        }
        
        Stopwatch sw = Stopwatch.StartNew();
        input.Count();
        sw.Stop();
        Console.WriteLine("Depth: {0} Size: {1} Time: {2}ms",
                          depth, size, sw.ElapsedMilliseconds);
    }
}

You may notice there’s no JIT warm-up here – but I’ve tried that, and it doesn’t alter the results much. Likewise I’ve also tried with a larger size of collection, and the trend is the same – and the trend is the interesting bit.

Time to guess the results…

Before I include the results, I’ll explain what I thought would happen. I had the mental model I described before, with multiple sequences feeding each other.

When the condition is always false, I’d expected the call to MoveNext() from Count() to get all the way to the innermost filtering sequence, which would then iterate over all of the input collection, applying the filter on each item and never yielding a result. That innermost MoveNext() call returns false, and that propagates right back to Count(). All the (significant) time is spent in the innermost loop, testing and rejecting items. That shouldn’t depend on the depth of the nesting we’ve got, right? I’d expect it to be linear in terms of the size of the collection, but constant in terms of nesting. We’ll see.

When the condition is always true, we’re in a much worse situation. We need to propagate every item from the collection through all the filters, out to Count(), checking the condition and accepting the item on each check. Each MoveNext() call has to go all the way to the innermost filter, then the result has to propagate out again. That sounds like it should be roughly linear in the depth of the nesting, as well as still being linear in the size of the collection – assuming a very simplistic execution model, of course.

Before you look at the results, check that you understand my logic, and see if you can think why it might not be the case.

The actual results

Here’s what I’ve actually observed, with all times in milliseconds. The size of the collection is constant – we’re only varying the depth:

Depth   Time for "always false"   Time for "always true"
    1                       182                      305
    2                       219                      376
    3                       246                      452
    4                       305                      548
    5                       350                      650
    6                       488                      709
    7                       480                      795
    8                       526                      880
    9                       583                      996
   10                       882                     1849

There are a few things to note here:

  • The "always true" time is going up broadly linearly, as we’d expected
  • The "always false" time is also going going up broadly linearly, even though we’d expected it to be roughly constant
  • The "always false" time is still significantly better than the "always true" time, which is somewhat reassuring
  • The performance at a depth of 10 is significantly worse than at a depth of 9 in both cases

Okay, so that’s bizarre. What can possibly be making the time taken by the "inner" filtering sequence go up with the depth of the nesting? It shouldn’t know about the rest of the calls to Where – that’s not its job. Time to investigate…

Reimplementing Where
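In case it’s not obvious what "naive" means here, this is roughly the shape of a non-combining Where – a sketch, not the exact Edulinq source, which also validates arguments and splits out the iterator for deferred execution:

```csharp
using System;
using System.Collections.Generic;

public static class NaiveEnumerable
{
    // A naive Where: one iterator block per call, no predicate combining.
    // Chained calls therefore produce genuinely nested wrapper sequences.
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                          Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}
```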

As you may have seen before, Where isn’t very hard to implement – at least it’s not hard to implement if you’re not trying to be clever. In order to mess around with things to check that my mental model was correct, I decided to run exactly the same tests again, but against the Edulinq implementation. This time, the results are very different:

Depth   Time for "always false"   Time for "always true"
    1                       232                      502
    2                       235                      950
    3                       256                     1946
    4                       229                     2571
    5                       227                     3103
    6                       226                     3535
    7                       226                     3901
    8                       232                     4365
    9                       229                     4838
   10                       226                     5219

Well look at that… suddenly our "always false" query has the expected characteristics. The "always true" query is basically linear except for the jump in time taken between a depth of 2 and a depth of 3. This may well be the same sort of discontinuity present in the earlier results, but at a different depth. (I’ll do more research on that another time, I think.)

So if the naive implementation actually works better in some cases, what’s going wrong in the "always false" case in the real LINQ to Objects? We can work this out by taking a stack trace from a predicate. Here’s a sample query which will throw an exception:

var query = Enumerable.Repeat("value", 1)
                      .Where(x => { throw new Exception("Bang!"); })
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true)
                      .Where(x => true);

When you try to count that query (or do anything else which will iterate over it) you get a stack trace like this:

Unhandled Exception: System.Exception: Bang!
   at Test.<Main>b__0(String x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.<>c__DisplayClassf`1.<CombinePredicates>b__e(TSource x)
   at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
   at System.Linq.Enumerable.Count[TSource](IEnumerable`1 source)
   at Test.Main()

Ooh, look at that stack – we’ve got 8 Where clauses, and 7 levels of DisplayClassf`1 – which looks like a generated class, perhaps for a lambda expression.

Between this and a helpful email from Eric Lippert, we can basically work out what’s going on. LINQ to Objects knows that it can combine two Where clauses by just constructing a compound filter. Just for kicks, let’s do the same thing – almost certainly more simplistically than LINQ to Objects – but in a way that will give the right idea:

using System;
using System.Collections;
using System.Collections.Generic;

// We could have an interface for this, of course.
public class FilteredEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> source;
    private readonly Func<T, bool> predicate;
    
    internal FilteredEnumerable(IEnumerable<T> source,
                                Func<T, bool> predicate)
    {
        this.source = source;
        this.predicate = predicate;
    }
    
    public IEnumerator<T> GetEnumerator()
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
    
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    
    public FilteredEnumerable<T> Where(Func<T, bool> extraPredicate)
    {
        return new FilteredEnumerable<T>(source,
                                         x => this.predicate(x) && extraPredicate(x));
    }
}

public static class Extensions
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source,
                                          Func<T, bool> predicate)
    {
        var filtered = source as FilteredEnumerable<T>;
        return filtered == null ? new FilteredEnumerable<T>(source, predicate)
                                : filtered.Where(predicate);
    }
}

Let’s run the same tests as before, and see how we do…

Depth   Time for "always false"   Time for "always true"
    1                       240                      504
    2                       275                      605
    3                       332                      706
    4                       430                      797
    5                       525                      892
    6                       558                      980
    7                       638                     1085
    8                       715                     1208
    9                       802                     1343
   10                      1312                     2768

Oh look – it feels like the real LINQ to Objects implementation. A bit slower, certainly, but that’s okay. The way it trends is the same. In particular, it’s slower than the naive implementation for the "always false" filter, but faster for the "always true" filter.

Could things be improved?

The problem here is the creation of nested delegates. We end up with a large stack (which appears to be causing a bigger problem when it reaches a depth of 10) when really we want to just build another delegate.

The thought occurs that we could potentially use expression trees to do this. Not in a signature-compatible way with LINQ to Objects, but we should be able to combine (a => a == 3) and (b => b != 10) into (x => x == 3 && x != 10) effectively. Then when we’re asked to iterate, we just need to compile the expression tree to a delegate, and filter using that single, efficient delegate.

There are three problems with this:

  • It goes against the normal approach of LINQ to Objects, using delegates instead of expression trees. Heck, with expression trees throughout we could do all kinds of interesting optimizations.
  • Creating and compiling the expression tree may be more expensive than the iteration – we don’t really know. It depends on how large the collection is, etc.
  • It’s somewhat complicated to implement, because we need to rewrite the constituent expressions; note how "a" and "b" both become "x" in the example above. This is the main reason I haven’t actually bothered trying this in order to benchmark it :)
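That said, the rewriting in the last point is less daunting than it sounds: an ExpressionVisitor that substitutes one parameter for another does most of the work. Here’s a sketch – the PredicateCombiner and ParameterReplacer names are my own invention, not part of any real API:

```csharp
using System;
using System.Linq.Expressions;

// Rewrites references to one parameter so that two lambdas can share a single parameter.
class ParameterReplacer : ExpressionVisitor
{
    private readonly ParameterExpression from;
    private readonly ParameterExpression to;

    public ParameterReplacer(ParameterExpression from, ParameterExpression to)
    {
        this.from = from;
        this.to = to;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        return node == from ? to : base.VisitParameter(node);
    }
}

public static class PredicateCombiner
{
    // Combines (a => ...) and (b => ...) into a single compiled (x => ... && ...) delegate.
    public static Func<T, bool> Combine<T>(Expression<Func<T, bool>> first,
                                           Expression<Func<T, bool>> second)
    {
        var x = Expression.Parameter(typeof(T), "x");
        var left = new ParameterReplacer(first.Parameters[0], x).Visit(first.Body);
        var right = new ParameterReplacer(second.Parameters[0], x).Visit(second.Body);
        var body = Expression.AndAlso(left, right); // preserves && short-circuiting
        return Expression.Lambda<Func<T, bool>>(body, x).Compile();
    }
}
```

Whether the cost of building and compiling the tree pays for itself still depends on the size of the collection, of course.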

There are various other options to do with building a more efficient way of evaluating the predicates. One pretty simple (but repetitive) solution is to have one class per number of predicates, with a field per predicate – FilteredEnumerable1, FilteredEnumerable2 etc. When you get bored (e.g. FilteredEnumerable9) you construct any further ones by combining predicates as per the LINQ to Objects approach. For example, here’s an implementation of FilteredEnumerable3 (IFilteredEnumerable<T> being the common interface which exposes the chainable Where method):

public class FilteredEnumerable3<T> : IFilteredEnumerable<T>
{
    private readonly IEnumerable<T> source;
    private readonly Func<T, bool> predicate0;
    private readonly Func<T, bool> predicate1;
    private readonly Func<T, bool> predicate2;
    
    internal FilteredEnumerable3(IEnumerable<T> source,
                                 Func<T, bool> predicate0,
                                 Func<T, bool> predicate1,
                                 Func<T, bool> predicate2)
    {
        this.source = source;
        this.predicate0 = predicate0;
        this.predicate1 = predicate1;
        this.predicate2 = predicate2;
    }
    
    public IEnumerator<T> GetEnumerator()
    {
        foreach (T item in source)
        {
            if (predicate0(item) &&
                predicate1(item) &&
                predicate2(item))
            {
                yield return item;
            }
        }
    }
    
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    
    public IFilteredEnumerable<T> Where(Func<T, bool> extraPredicate)
    {
        return new FilteredEnumerable4<T>(source,
            predicate0,
            predicate1,
            predicate2,
            extraPredicate);
    }
}

In this case, I’m rather pleased with the results:

Depth   Time for "always false"   Time for "always true"
    1                       237                      504
    2                       231                      551
    3                       231                      599
    4                       225                      625
    5                       228                      703
    6                       224                      787
    7                       222                      876
    8                       225                      966
    9                       232                     1069
   10                       219                     1191

Now that’s more like it. Other than a few cells, we’re actually outperforming LINQ to Objects – and still compatible with it. The only problem is that the cells where it’s slower are the common ones – the top few in the right hand column. And this was just going up to FilteredEnumerable4<T>.

I honestly don’t know why my FilteredEnumerable1<T> is slower than whatever’s in LINQ to Objects. But it’s nice to see the rest going quickly… I suspect there are some downsides as well, and basically I trust the team to make the right decisions for normal cases.

And there’s more…

You may be surprised to hear I’ve kept things simple here – as Eric mentioned to me, it’s not just Where that can be transformed in this way. Select works as well – you can combine projections together pretty easily. So imagine that we’ve effectively got this transformation:

// Normal LINQ query
var query = list.Where(x => Condition1(x))
                .Where(x => Condition2(x))
                .Select(x => Projection1(x))
                .Select(y => Projection2(y));

// After optimization
var query = list.WhereSelect(x => Condition1(x) && Condition2(x),
                             x => Projection2(Projection1(x)));

Of course at that point, my FilteredEnumerableX<T> approach becomes even more unwieldy as you have two axes – the number of predicates and the number of projections.
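The core of one such combined iterator is simple enough – the unwieldiness is purely in the number of variants you’d need. Here’s a sketch of a single hypothetical WhereSelectEnumerable; the name and shape are my invention, not the actual LINQ to Objects internals:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical combined iterator: one predicate, one projection, one wrapper.
public class WhereSelectEnumerable<TSource, TResult> : IEnumerable<TResult>
{
    private readonly IEnumerable<TSource> source;
    private readonly Func<TSource, bool> predicate;
    private readonly Func<TSource, TResult> projection;

    public WhereSelectEnumerable(IEnumerable<TSource> source,
                                 Func<TSource, bool> predicate,
                                 Func<TSource, TResult> projection)
    {
        this.source = source;
        this.predicate = predicate;
        this.projection = projection;
    }

    public IEnumerator<TResult> GetEnumerator()
    {
        // Filter and project in a single pass - no intermediate wrappers.
        foreach (TSource item in source)
        {
            if (predicate(item))
            {
                yield return projection(item);
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```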

Conclusion

As is so often the case with performance, there’s more to Where than there first appears. When I started benchmarking this I had no idea what I was getting myself into. The great thing is that very few people would ever need to know this. It’s all hidden by the abstraction.

Still, isn’t this sort of thing fun?

18 thoughts on “LINQ To Objects and the performance of nested "Where" calls”

  1. How about an implementation with a list of predicates?

    public static class EnumerableEx {
        public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
            var oldWhere = source as WhereEnumerable<T>;
            if (oldWhere != null) {
                return new WhereEnumerable<T>(oldWhere.source, oldWhere.predicates.And(predicate));
            }

            return new WhereEnumerable<T>(source, new ImmutableCollection<Func<T, bool>>(new[] { predicate }));
        }

        private class WhereEnumerable<T> : IEnumerable<T> {
            public readonly IEnumerable<T> source;
            public readonly ImmutableCollection<Func<T, bool>> predicates;

            public WhereEnumerable(IEnumerable<T> source, ImmutableCollection<Func<T, bool>> predicates) {
                this.source = source;
                this.predicates = predicates;
            }

            public IEnumerator<T> GetEnumerator() {
                foreach (T item in source) {
                    foreach (Func<T, bool> predicate in predicates) {
                        if (!predicate(item)) {
                            goto continueOuter;
                        }
                    }

                    yield return item;
                    continueOuter:
                    ;
                }
            }

            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
                return GetEnumerator();
            }
        }
    }

    ImmutableCollection is simply an array wrapped in an enumerable like this:

    class ImmutableCollection<T> : IEnumerable<T> {
        private T[] items;

        public ImmutableCollection() {
            this.items = new T[0];
        }

        public ImmutableCollection(IEnumerable<T> items) {
            this.items = items.ToArray();
        }

        public ImmutableCollection<T> And(T item) {
            return new ImmutableCollection<T>(this.Concat(new[] { item }));
        }

        public IEnumerator<T> GetEnumerator() {
            foreach (T item in items) {
                yield return item;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
            return GetEnumerator();
        }
    }

    Wouldn’t this be a (very) small penalty on the single Where case for a large gain on multiple Wheres?

  2. @configurator: That implementation requires an iterator to be created on *every* iteration, in order to iterate over the predicates. Exposing the array would improve that, but I think it would still be more of a hit than the “hard-coded” nature of my FilteredEnumerableX.

  3. Re expressions: how does AsQueryable() behave? It could potentially already implement the optimizations you are talking about.

  4. @Vladimir: Yes, there are definitely ways of doing it. However, I believe your current code will iterate in the wrong order – it will evaluate the *last specified* predicate first. You need to build it up in that order for the sake of immutability, but then reverse it (once) just before you start looping over the source, I believe.

  5. > It’s somewhat complicated to implement, because we need
    > to rewrite the constituent expressions; note how “a” and “b”
    > both become “x” in the example above. This is the main
    > reason I haven’t actually bothered trying this in order
    > to benchmark it :)

    Actually it’s easier than it seems, you just need an expression visitor that replaces a parameter with another
    http://pastebin.com/TT5v3Y87

  6. This really is interesting. Which means that, for a LINQ query in which the first filter gets rid of most of the items, it is always better to write in a single lambda than to use nested Where’s.

    I sometimes have things like collection.Where(x => x.Type == abc).Where(x => …)… and the types are many, so perhaps only 5% of the collection gets selected. In such cases, based on your observations, it might be better to write collection.Where(x => x.Type == abc && …)…

  7. Nice! Great investigative reporting! :)

    Something implicit in all this, and which bears calling out: the implementation-dependent nature of these methods is a good example of why one should keep one’s predicates side-effect-free.

    The naive implementation will execute only one predicate once for each of the original enumerable elements; the other predicates don’t get executed at all, never having anything to test against. But the optimized implementation _might_ execute all combined predicates once for each element. Of course, it also might not…a valid implementation would be for the predicates to be combined using shortcutting; but even in that case, there’s no guarantee of _which_ predicate is the one that’s executed, because that would be order-dependent.

    The bottom line being, the implementation is not part of the API, and can affect observed behavior if one has side-effects in the predicates. It is better to stick with the functional nature of LINQ and avoid those dependencies on the implementation details.

  8. @pete.d: While I agree that queries should be side-effect free, I think that keeping the ordering of side-effects is at least *implicit* in the documentation. I would be surprised and dismayed at an “optimization” which changed the order in which the conditions are applied. Don’t forget that side-effects can be somewhat subtle… you’re suggesting that this query is invalid:

    var query = people.Where(p => p.Parent != null)
                      .Where(p => p.Parent.Age == 10);

    If the conditions could be reordered, you could get a NullReferenceException for the second condition, with an entry with a null parent. I think that sort of thing would be *highly* undesirable.

  9. You highlight three issues with replacing the delegate based operations with expressions tree based operations.
    I believe, however, there may be good grounds to investigate the expression tree approach further:
    * Most API users wouldn’t notice the difference (I’d wager most of those using Linq2Objects hadn’t even realised the methods take delegates rather than expressions)
    * Yes, it’s complicated, but some other Linq providers do this kind of thing anyway (some of the SQL queries from Linq2SQL are a lot more elegant to DBAs than the procedural looking code that generates them)

    I guess the major drawback would be any performance penalty – but that’s what profilers and optimisation sessions are for ;)

  10. “If the conditions could be reordered, you could get a NullReferenceException for the second condition, with an entry with a null parent. I think that sort of thing would be *highly* undesirable.”

    Yes, that’s true. And on second thought, I agree that the docs _do_ in fact essentially guarantee this, since chained calls to Where() are acting not on the original enumeration, but on the enumeration returned by previous calls to Where(). Reordering or failing to shortcut would be tantamount to changing the semantic meaning of the original code.

    There are other reasons to avoid side-effects in query lambdas, but at the moment I agree that I was mistaken to suggest this particular optimization leads to that conclusion. (Maybe I should stop trying to keep up with your blog in the late evening hours…I seem to be overlooking a number of things lately :( ).

  11. This is definitely an eye-opening article. I followed your Edulinq series (which is now required reading among the .NET developers in my office) and have been following your Eduasync series.

    At first I was kind of disappointed that there wasn’t more Eduasync posted when I came across this article, but I must say… I am deeply pleased and very thankful that you wrote this. It really shows off the cleverness of the people behind LINQ to Objects and it all makes sense.

    I would have simply taken the naive approach to doing Where() and just kept on going, but this has helped broaden my understanding and will help me think more this way in the future.

  12. @Joshua: Glad you enjoyed this one, but fear not – there’s more Eduasync on the way. I just wanted to write this up before I forgot about it.

  13. You could generate all the Where/Select combos up to 3×3 with a T4 template, and the single combos up to 10.

  14. @Mark: Which is faster between which options, and in which situation? If you mean out of the options at the very start of the post, the first approach (a single Where clause) is faster – although the chances of it being significant are small.

Comments are closed.