Are iterators fundamentally flawed ?

It seems apparent that with computers as we currently know them, processors are now set to scale out not up.. that is, clock speeds aren’t rapidly growing, and certainly not doubling every year or two, instead the number of processors on a chip is.  As a case in evidence, 2 or 3 years ago, my PC’s were all single CPU, then a bit over a year ago I bought my new desktop PC and it was a dual core.  And as of a few days ago I swapped that dual core out for a quad core.  For me, the number of cores/CPU’s is clearly doubling every year or two, but clock speed itself isn’t that much faster.

So what that means for computing is the emphasis is on parallel or concurrent computing. As such, we need to look at the fundamentals and see if they are designed for the task.  Iterators don’t look like they are.

If you consider an iterator primary task of getting the next element, then look at how it does it, you should observe a major design flaw… the operation is not atomic. An iterator’s implementation is IEnumerator or IEnumerator(Of T), and IEnumerator require you to first call MoveNext then read the Current property.  If you allow multiple threads to use the same IEnumerator, then you could get a call sequence such as MoveNext, MoveNext, Current, Current, which would skip one item and repeat the next one.  So IEnumerator is not well designed for threading.  This is a known design limitation, but rather than address that, languages like C# have implemented their iterators to be a different iterator per thread.  That is they not only recognise the limitation but they also enforce it.

Going forward, perhaps we need a new IEnumerator class.  In the good bad old days that would be called IEnumerator2(Of T).  IEnumerator2 would make iterating a single method call that would return a Tuple(Of T, Boolean).  It could extend IEnumerator, but it would be difficult to enforce thread safety to the IEnumerator interface. The problem that language like C# would then face is do they risk breaking existing code by returning the same enumerator to different threads ?  Well no they would have to provide an IEnumerable2.GetEnumerator as well, so it would be possible to have the same class used for both single instance per thread and shared class across threads.  As such, then the code that calls the iteration, the For Each loop, would then have to indicate whether it wants to do so using parallel or single threaded.. this would probably surface in the languages such as the For Parallel instruction on a loop.

So maybe iterators aren’t fundamentally flawed, they just currently have a lot of limitations which can and hopefully will be addressed  as we venture more into the world of multi core 🙂

4 Comments so far

  1.   gbarnett on August 2nd, 2008          

    In Thread Building Blocks (TBB, by Intel for C++) you can use iterators to tell TBB how to split something up.

    I’ve not looked at it yet, but have you looked at the TPL and its classes using Reflector? It might shed some light on the subject.

  2.   bill on August 5th, 2008          

    Hi Granville,

    I haven’t looked at it yet either. I suspect they would call GetEnumerator, and then work with the enumerator, sharing that across threads. that’d of course be contrary to using the language constructs in C# 😉

  3.   Markus Kohler on August 5th, 2008          

    I agree iterators are somehow flawed.
    They are used because writing control flow operations cannot be done properly in languages that do not support closures.
    Fortunately support for closures fixes is under way in most major programming languages.


  4.   Tetsuo on August 5th, 2008          

    Iterators, objects and threads are not broken or flawed. They just weren’t designed to be optimal in a parallel processing environment.

    OO languages (at least mainstream ones) pass object references, objects have state, and are mutable ‘by default’ (in contrast of pass-by-copy semantics in Pascal or C, in which even if you can change a struct property, it won’t reflect onto the original one). STATE is what makes difficult programming in a multi-core environment, not common constructs in mainstream languages.

    And that is what makes languages like Scala, which uses the concept of Actors and message-passing, more multi-core-friendly, not closures.