Eduasync part 14: Data passing in coroutines

(This post covers project 19 in the source code.)

Last time we looked at independent coroutines running in a round-robin fashion. This time we’ll keep the round-robin scheduling, but add in the idea of passing data from one coroutine to another. Each coroutine will act on data of the same type, which is necessary for the scheme to work when one coroutine could "drop out" of the chain by returning.

Designing the data flow

It took me a while to get to the stage where I was happy with the design of how data flowed around these coroutines. I knew I wanted a coordinator as before, and that it should have a Yield method taking the value to pass to the next coroutine and returning an awaitable which would provide the next value when it completed. The tricky part was working out what to do at the start of each method and the end. If the method just took a Coordinator parameter, we wouldn’t have anything to do with the value yielded by the first coroutine, because the second coroutine wouldn’t be ready to accept it yet. Likewise when a coroutine completed, we wouldn’t have another value to pass to the next coroutine.

Writing these dilemmas out in this post, the solution seems blindingly obvious of course: each coroutine should accept a data value on entry, and return one at the end. At any point where we transfer control, we provide a value and have a value which is required by something. The final twist is to make the coordinator’s Start method take an initial value and return the value returned by the last coroutine to complete.

So, that’s the theory… let’s look at the implementation.

Initialization

I’ve changed the coordinator to take all the coroutines as a constructor parameter (of the somewhat fearsome declaration "params Func<Coordinator<T>, T, Task<T>>[] coroutines") which means we don’t need to implement IEnumerable pointlessly any more.

This leads to a code skeleton of this form:

private static void Main(string[] args)
{
    var coordinator = new Coordinator<string>(FirstCoroutine,
                                              SecondCoroutine,
                                              ThirdCoroutine);
    string finalResult = coordinator.Start("m1");
    Console.WriteLine("Final result: {0}", finalResult);
}

private static async Task<string> FirstCoroutine(
    Coordinator<string> coordinator,
    string initialValue)
{
    …
}

// Same signature for SecondCoroutine and ThirdCoroutine

Last time we simply had a Queue<Action> internally in the coordinator as the actions to invoke. You might be expecting a Queue<Func<T, T>> this time – after all, we’re passing in data and returning data at each point. However, the mechanism for that data transfer is "out of band" so to speak. The only time we really "return" an item is when we reach the end of a coroutine. Usually we’ll be providing data to the next step using a method. Likewise the only time a coroutine is given data directly is in the first call – after that, it will have to fetch the value by calling GetResult() on the awaiter which it uses to yield control.

All of this is leading to a requirement for our constructor to convert each coroutine delegate into a simple Action. The trick is working out how to deal with the data flow. I’m going to include SupplyValue() and ConsumeValue() methods within the coordinator for the awaiter to use, so it’s just a case of calling those appropriately from our action. In particular:

  • When the action is called, it should consume the current value.
  • It should then call the coroutine passing in the coordinator ("this") and the initial value.
  • When the task returned by the coroutine has completed, the result of that task should be used to supply a new value.

The only tricky part here is the last bullet – and it’s not that hard really, so long as we remember that we’re absolutely not trying to start any new threads. We just want to hook onto the end of the task, getting a chance to supply the value before the next coroutine tries to pick it up. We can do that using Task.ContinueWith, but passing in TaskContinuationOptions.ExecuteSynchronously so that we use the same thread that the task completes on to execute the continuation.

At this point we can implement the initialization part of the coordinator, assuming the presence of SupplyValue() and ConsumeValue():

public sealed class Coordinator<T>
{
    private readonly Queue<Action> actions;
    private readonly Awaitable awaitable;

    public Coordinator(params Func<Coordinator<T>, T, Task<T>>[] coroutines)
    {
        // We can’t refer to "this" in the variable initializer. We can use
        // the same awaitable for all yield calls.
        this.awaitable = new Awaitable(this);
        actions = new Queue<Action>(coroutines.Select(ConvertCoroutine));
    }

    // Converts a coroutine into an action which consumes the current value,
    // calls the coroutine, and attaches a continuation to it so that the return
    // value is used as the new value.
    private Action ConvertCoroutine(Func<Coordinator<T>, T, Task<T>> coroutine)
    {
        return () =>
        {
            Task<T> task = coroutine(this, ConsumeValue());
            task.ContinueWith(ignored => SupplyValue(task.Result),
                TaskContinuationOptions.ExecuteSynchronously);
        };
    }
}

I’ve broken ConvertCoroutine into a separate method so that we can use it as the projection for the Select call within the constructor. I did initially have it within a lambda expression within the constructor, but it was utterly hideous in terms of readabililty.

One suggestion I’ve received is that I could declare a new delegate type instead of using Func<Coordinator<T>, T, Task<T>> to represent a coroutine. This could either be a non-generic delegate nested in the generic coordinator class, or a generic stand-alone delegate:

public delegate T Coroutine<T>(Coordinator<T> coordinator, T initialValue);

// Or nested…
public sealed class Coordinator<T>
{
    public delegate T Coroutine(Coordinator<T> coordinator, T initialValue);
}

Both of these would work perfectly well. I haven’t made the change at the moment, but it’s certainly worth considering. The debate about whether to use custom delegate types or Func/Action is one for another blog post, I think :)

The one bit of the initialization I haven’t explained yet is the "awaitable" field and the Awaitable type. They’re to do with yielding – so let’s look at them now.

Yielding and transferring data

Next we need to work out how we’re going to transfer data and control between the coroutines. As I’ve mentioned, we’re going to use a method within the coordinator, called from the coroutines, to accomplish this. The coroutines have this sort of code:

private static async Task<string> FirstCoroutine(
    Coordinator<string> coordinator,
    string initialValue)
{
    Console.WriteLine("Starting FirstCoroutine with initial value {0}",
                      initialValue);            
    …
    string received = await coordinator.Yield("x1");
    Console.WriteLine("Returned to FirstCoroutine with value {0}", received);
    …
    return "x3";
}

The method name "Yield" here is a double-edged sword. The word has two meanings – yielding a value to be used elsewhere, and yielding control until we’re called back. Normally it’s not ideal to use a name that can mean subtly different things – but in this case we actually want both of these meanings.

So, what does Yield need to do? Well, the flow control should look something like this:

  • Coroutine calls Yield()
  • Yield() calls SupplyValue() internally to remember the new value to be consumed by the next coroutine
  • Yield() returns an awaitable to the coroutine
  • Due to the await expression, the coroutine calls GetAwaiter() on the awaitable to get an awaiter
  • The coroutine checks IsCompleted on the awaiter, which must return false (to prompt the remaining behaviour)
  • The coroutine calls OnCompleted() passing in the continuation for the rest of the method
  • The coroutine returns to its caller
  • The coordinator proceeds with the next coroutine
  • When we eventually get back to this coroutine, it will call GetResult() to get the "current value" to assign to the "received" variable.

Now you’ll see that Yield() needs to return some kind of awaitable type – in other words, one with a GetAwaiter() method. Previously we put this directly on the Coordinator type, and we could have done that here – but I don’t really want anyone to just "await coordinator" accidentally. You should really need to call Yield() in order to get an awaitable. So we have an Awaitable type, nested in Coordinator.

We then need to decide what the awaiter type is – the result of calling GetAwaiter() on the awaitable. This time I decided to use the Coordinator itself. That means people could accidentally call IsCompleted, OnCompleted() or GetResult(), but I figured that wasn’t too bad. If we were to go to the extreme, we’d create another type just for the Awaiter as well. It would need to have a reference to the coordinator of course, in order to actually do its job. As it is, we can make the Awaitable just return the Coordinator that created it. (Awaitable is nested within Coordinator<T>, which is how it can refer to T without being generic itself.)

public sealed class Awaitable
{
    private readonly Coordinator<T> coordinator;

    internal Awaitable(Coordinator<T> coordinator)
    {
        this.coordinator = coordinator;
    }

    public Coordinator<T> GetAwaiter()
    {
        return coordinator;
    }
}

The only state here is the coordinator, which is why we create an instance of Awaitable on the construction of the Coordinator, and keep it around.

Now Yield() is really simple:

public Awaitable Yield(T value)
{
    SupplyValue(value);
    return awaitable;
}

So to recap, we now just need the awaiter members, SupplyValue() and ConsumeValue(). Let’s look at the awaiter members (in Coordinator) to start with. We already know that IsCompleted will just return false. OnCompleted() just needs to stash the continuation in the queue, and GetResult() just needs to consume the "current" value and return it:

public bool IsCompleted { get { return false; } }

public void OnCompleted(Action continuation)
{
    actions.Enqueue(continuation);
}

public T GetResult()
{
    return ConsumeValue();
}

Simple, huh? Finally, consuming and supplying values:

private T currentValue;
private bool valuePresent;

private void SupplyValue(T value)
{
    if (valuePresent)
    {
        throw new InvalidOperationException
            ("Attempt to supply value when one is already present");
    }
    currentValue = value;
    valuePresent = true;
}

private T ConsumeValue()
{
    if (!valuePresent)
    {
        throw new InvalidOperationException
            ("Attempt to consume value when it isn’t present");
    }
    T oldValue = currentValue;
    valuePresent = false;
    currentValue = default(T);
    return oldValue;
}

These are relatively long methods (compared with the other ones I’ve shown) but pretty simple. Hopefully they don’t need explanation :)

The results

Now that everything’s in place, we can run it. I haven’t posted the full code of the coroutines, but you can see it on Google Code. Hopefully the results speak for themselves though – you can see the relevant values passing from one coroutine to another (and in and out of the Start method).

Starting FirstCoroutine with initial value m1
Yielding ‘x1′ from FirstCoroutine…
    Starting SecondCoroutine with initial value x1
    Starting SecondCoroutine
    Yielding ‘y1′ from SecondCoroutine…
        Starting ThirdCoroutine with initial value y1
        Yielding ‘z1′ from ThirdCoroutine…
Returned to FirstCoroutine with value z1
Yielding ‘x2′ from FirstCoroutine…
    Returned to SecondCoroutine with value x2
    Yielding ‘y2′ from SecondCoroutine…
        Returned to ThirdCoroutine with value y2
        Finished ThirdCoroutine…
Returned to FirstCoroutine with value z2
Finished FirstCoroutine
    Returned to SecondCoroutine with value x3
    Yielding ‘y3′ from SecondCoroutine…
    Returned to SecondCoroutine with value y3
    Finished SecondCoroutine
Final result: y4

Conclusion

I’m not going to claim this is the world’s most useful coroutine model – or indeed useful at all. As ever, I’m more interested in thinking about how data and control flow can be modelled than actual usefulness.

In this case, it was the realization that everything should accept and return a value of the same type which really made it all work. After that, the actual code is pretty straightforward. (At least, I think it is – please let me know if any bits are confusing, and I’ll try to elaborate on them.)

Next time we’ll look at something more like a pipeline model – something remarkably reminiscent of LINQ, but without taking up as much stack space (and with vastly worse readability, of course). Unfortunately the current code reaches the limits of my ability to understand why it works, which means it far exceeds my ability to explain why it works. Hopefully I can simplify it a bit over the next few days.

One thought on “Eduasync part 14: Data passing in coroutines”

  1. The pipeline model is, I think, the model I think would be most useful for co-routines, the round-robin model doesn’t seem to have any benefits over actual threads other than (potentially?) performance.

    For example, imagine a lexer/parser/evaluator pipeline where the evaluator can change the lexer/parser behaviour as it runs. I remember more situations where I would like it, but they are a bit messy to describe in a post :)

Comments are closed.