Fun with captured variables

 

I’m currently writing about readability, and discussing deferred parameter evaluation in LINQ on the C# newsgroup. It’s led me to come up with quite an evil application of captured variables.

Consider this C# 3 code:

using System;
using System.Linq;

public class Test
{
    static void Main()
    {
        int max = int.MinValue;

        int[] values = { 0, 5, 2, 3, 8, 4, -1, 10, 12, 8, 100 };

        var query = from value in values
                    where value > max
                    select (max = value);

        foreach (int value in query)
        {
            Console.WriteLine(value);
        }
    }
}

The max and values variables are easy enough to understand, and the foreach loop at the bottom just iterates through the query and dumps out the results – but what does the query itself do?


The answer is that it shows you all the values which are larger than any that have come before them. Each value encounters the “where” clause, which is effectively “are you the biggest value I’ve seen so far?”, and passing values then encounter the “select” clause, which is “remember the current value as the highest we’ve seen, and return it”. So, the output is 0, 5, 8, 10, 12, 100.
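To watch that interleaving happen, here is a rough sketch of the same query written with explicit Where and Select calls and a log entry in each lambda. The RunningMaxTrace class, its Run method and the log format are made up for illustration; only the query shape comes from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningMaxTrace
{
    // Hypothetical helper: runs the query over the given values and
    // records each predicate and projection call in the order it happens.
    public static List<string> Run(int[] values)
    {
        var log = new List<string>();
        int max = int.MinValue;

        var query = values
            .Where(value =>
            {
                log.Add("where " + value);
                return value > max;      // reads the captured variable
            })
            .Select(value =>
            {
                log.Add("select " + value);
                return max = value;      // writes the captured variable
            });

        // Nothing has run yet; iterating is what drives the lambdas.
        foreach (int value in query)
        {
            log.Add("yield " + value);
        }
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run(new[] { 0, 5, 2 }))
        {
            Console.WriteLine(line);
        }
    }
}
```

For the input 0, 5, 2 the log reads: where 0, select 0, yield 0, where 5, select 5, yield 5, where 2. Each element goes through the whole pipeline before the next one is tested, which is why the updated max is visible to the next “where” check.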


It relies on the fact that the “select” clause for one passing value is executed before the next “where” clause. If you force it to evaluate the “where” clause for all the source values before performing the projection (e.g. with an “orderby” clause) then all the numbers are returned, because they’re all bigger than the initial value of max. If you try to use Parallel LINQ to evaluate it, all bets are off.
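As a minimal sketch of the “orderby” case, the variant below adds an ordering between the filter and the projection. The OrderByVariant class and its Run method are just names picked for this example.

```csharp
using System;
using System.Linq;

public static class OrderByVariant
{
    public static int[] Run(int[] values)
    {
        int max = int.MinValue;

        // orderby has to see every element that passes the filter before
        // it can hand on the first one, so every "where" test runs while
        // max still holds int.MinValue.
        var query = from value in values
                    where value > max
                    orderby value
                    select (max = value);

        return query.ToArray();
    }

    public static void Main()
    {
        int[] values = { 0, 5, 2, 3, 8, 4, -1, 10, 12, 8, 100 };
        string[] text = Run(values).Select(v => v.ToString()).ToArray();
        Console.WriteLine(string.Join(", ", text));
    }
}
```

With the original input this prints all eleven numbers in sorted order: -1, 0, 2, 3, 4, 5, 8, 8, 10, 12, 100. The filter no longer filters anything, because by the time the first projection assigns to max, every comparison has already been made.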


Contrary to the normal advice, do try this at home. Just don’t try it in production code. (If you find a genuine use for it, at least comment it heavily!)

3 thoughts on “Fun with captured variables”

  1. Genuine use for it? My guess is that it is probably a very slow piece of code for sorting, compared to merge sort or quicksort?

    I think in the future the compiler itself will automatically create code to take advantage of parallel CPUs, so it’s probably a good example of code you should not write, but it’s always good to know WHY.

  2. The only place where variables are ‘captured’ is in lambdas.

    Take this query for example:
    int i = 5;
    var q = (from em in nw.Employees select em).Take(i);

    i++;

    foreach (var o in q)
    {
        //…
    }

    How many rows will be fetched: 5 or 6? :) (5)

    So, one could argue: this is an extension method, but I can also write a query with solely extension methods. In that case, it’s still the case that the variables inside a lambda are captured, but variables passed to other extension methods are NOT.

    Example:
    string customerID = "CHOPS";
    int amount = 5;

    var q = nw.Orders.Where(o => o.CustomerID == customerID).Take(amount);

    //… some code
    customerID = "BLONP";
    amount = 10;

    foreach(var o in q)
    {
    // what will be fetched for this loop?
    }

    Answer: the first 5 orders of BLONP.

    Unclear? To me it is. The argument that it is a shortcut to change things inside the query based on variables doesn’t cut it now, as some extension methods wrap their input in ConstantExpression instances, while members in lambdas are captured, because they’re wrapped inside MemberAccess expressions.

    What if I want to page through a query with Skip and Take (crappy methods for paging, as they have limited scope in LINQ to SQL, for example, because they’re not really usable as a paging mechanism)? I now can’t simply change the Skip and Take parameters, as that won’t work. I can change the filter’s parameter, though…

    Inconsistent design, if you ask me. Capturing the variable in the lambda has the unintentional side effect that you can change the query’s filter. It doesn’t look as if it’s been designed that way; otherwise, why would other Queryable extension methods behave differently?

  3. No, variables are also captured in anonymous methods. This is the way it’s been ever since C# 2. Use List<T>.FindAll or List<T>.ConvertAll and you’ll see exactly the same behaviour.

    The use (or not) of extension methods is irrelevant – all that’s relevant is whether or not the parameter is an anonymous function (lambda expression or anonymous method).

    Now my question is: how often does this actually come up? How often do people really want to change variables after they’ve captured them? I’ve seen it when starting threads, but not usually outside that. I doubt that situations like the ones you’ve described will actually cause very many problems.

    Yes, people need educating about what happens (cue a plug for my book). But the way it works is consistent in my view, and nicer than the Java way of dealing with pseudo-closures in the form of anonymous classes.

    Jon
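For anyone who wants to try the example from the second comment without a database, here is a rough LINQ to Objects sketch. The Order class, the sample data and the CaptureVersusArgument name are all made up for illustration. LINQ to SQL builds expression trees rather than delegates, but the observable result should be the same: the argument to Take is an ordinary method argument, evaluated when the query is built, while the lambda reads customerID only when the query runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Orders table from the comment.
public class Order
{
    public string CustomerID { get; set; }
    public int OrderID { get; set; }
}

public static class CaptureVersusArgument
{
    public static List<Order> Run()
    {
        // Eight orders for each of two customers, interleaved.
        var orders = new List<Order>();
        for (int i = 1; i <= 8; i++)
        {
            orders.Add(new Order { CustomerID = "CHOPS", OrderID = i });
            orders.Add(new Order { CustomerID = "BLONP", OrderID = 100 + i });
        }

        string customerID = "CHOPS";
        int amount = 5;

        // Take(amount) receives the value 5 right now; the lambda
        // captures the customerID variable itself, not its value.
        var q = orders.Where(o => o.CustomerID == customerID).Take(amount);

        customerID = "BLONP";   // seen by the lambda when q is iterated
        amount = 10;            // too late: Take already has 5

        return q.ToList();
    }

    public static void Main()
    {
        foreach (Order o in Run())
        {
            Console.WriteLine(o.CustomerID + " " + o.OrderID);
        }
    }
}
```

This prints five BLONP orders (101 to 105), matching the answer given in the comment.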
