Itertools for C# – Tee

Tee is a cool function that “clones” enumerators whatever their current state is – it basically allows you to branch off an enumerator into as many enumerators as you want, all independent of each other.

 You can do something like


List<int> list = new List<int> { 1, 2, 3, 4, 5 };

IEnumerator<int> iterable1 = list.GetEnumerator();

IEnumerator<int> []teedIterables = Itertools.Tee(iterable1, 2);

foreach(int val in teedIterables[0])
{
Console.WriteLine(val);
}

foreach(int val in teedIterables[1])
{
Console.WriteLine(val);
}

// prints 1 2 3 4 5 1 2 3 4 5



If there was a iterable1.MoveNext() before the call to Tee, then the output would have been 2 3 4 5 2 3 4 5.

The implementation proved to be pretty interesting – and complicated, partly because the C# compiler does not allow the yield statement inside anonymous methods. Here’s the implementation.


public static IEnumerator<T>[] Tee<T>(IEnumerator<T> source, int count)
{
List<T> tValues = new List<T>();
IEnumerator<T>[] results = new IEnumerator<T>[count];

for (int i = 0; i < count; ++i)
{
int currentLocation = 0;
results [ i ] = CreateEnumerator(source, new EnumeratorState<T>() { CurrentLocation = currentLocation, TValues = tValues });
}

return results;
}

private static IEnumerator<T> CreateEnumerator<T>(IEnumerator<T> source, EnumeratorState<T> state)
{
while (true)
{
if (state.CurrentLocation < state.TValues.Count)
{
yield return state.TValues[state.CurrentLocation++];
}
else
{
if (source.MoveNext())
{
state.TValues.Add(source.Current);
state.CurrentLocation = state.TValues.Count;
yield return source.Current;
}
else
{
break;
}
}
}
}

class EnumeratorState<T>
{
internal int CurrentLocation { get; set; }
internal List<T> TValues { get; set; }
}

 

There’s a fair bit of code involved, as you can see. The basic idea behind the implementation is to maintain a list as a backing store. Each teed off enumerator attempts to read from the list – if it hits the end of the list, it advances the original enumerator  and gets the current value, updates the list and returns that value. The rest of the code is plumbing to work around the inability to use yield inside an anonymous method. The code basically does what the compiler would have done – there’s the EnumeratorState class to hold captured variables and a new instance of the class is created and passed to CreateEnumerator, which is in the place of the anonymous method.

Tee is that kind of function that is incredibly useful once you know you can do such a thing. Let’s say you have an application that needs to process lines in a file, but the processing can happen in different contexts, independent of each other. Take for example, a log parser, that goes through each line in a file, looking for the start of an operation. The log parser notifies another component once it finds the start of the operation and proceeds through the file looking for the next operation. Normally, you’ll solve the problem by adding the component to a list that gets notified after the parser reads each line. With Tee, you can simply tee off an enumerator and pass it to the other component – both the log parser and the component now have independent enumerators and can actually run in parallel. And you can tee off as many enumerators as there are components.

With that, we come to the end of this series of blog posts on implementing itertools in C#. The source code for all the functions implemented in the series is available at http://code.msdn.microsoft.com/itertools

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>