C#

LINQ: Implementing The TakeLastWhile Operator

LINQ With C# (Portuguese)

Following my last posts (>)(>), in this post I’ll introduce the implementation of the TakeLastWhile operator.

The TakeLastWhile operator returns last contiguous elements from a sequence that satisfy the specified criteria and is implemented as the TakeLastWhile extension methods:

public static IEnumerable<TSource> TakeLastWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
public static IEnumerable<TSource> TakeLastWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate)

The implementation of the first method is very simple. We start with an empty buffer and buffer every item that satisfies the criteria implemented by a predicate. Whenever an item doesn’t satisfy the criteria, the buffer is cleared:

var buffer = new List<TSource>();

foreach (var item in source)
{
    if (predicate(item))
    {
        buffer.Add(item);
    }
    else
    {
        buffer.Clear();
    }
}

After traversing the source sequence, we just yield all the items, if any, in the buffer:

foreach (var item in buffer)
{
    yield return item;
}

The overload that takes in account the index of the item only differs in the call the predicate that implements the criteria:

var buffer = new List<TSource>();
var idx = 0;

foreach (var item in source)
{
    if (predicate(item, idx++))
    {
        buffer.Add(item);
    }
    else
    {
        buffer.Clear();
    }
}

foreach (var item in buffer)
{
    yield return item;
}

You can find the complete implementation of this operator (and more) CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

LINQ: Implementing The TakeLast Operator

LINQ With C# (Portuguese)

Following my last post, in this post I’ll introduce the implementation of the TakeLast operator.

The TakeLast operator returns a specified number of contiguous elements from the end of a sequence and is implemented as the TakeLast extension method:

public static IEnumerable<TSource> TakeLast<TSource>(this IEnumerable<TSource> source, int count)

To implement this operator, first we start by buffering, at most, a count number of items from the source sequence in an array that acts as a circular buffer:

var sourceEnumerator = source.GetEnumerator();
var buffer = new TSource[count];
var numOfItems = 0;
int idx;

for (idx = 0; (idx < count) && sourceEnumerator.MoveNext(); idx++, numOfItems++)
{
    buffer[idx] = sourceEnumerator.Current;
}

If the number of buffered items (numOfItems) is less than the requested number of items (count), we just yield all the buffered items:

if (numOfItems < count)
{
    for (idx = 0; idx < numOfItems; idx++)
    {
        yield return buffer[idx];
    }

    yield break;
}

Next, we iterate over the rest of the items circularly buffering them:

for (idx = 0; sourceEnumerator.MoveNext(); idx = (idx + 1) % count)
{
    buffer[idx] = sourceEnumerator.Current;
}

And finally, we just iterate over the buffered items and yield them:

for (; numOfItems > 0; idx = (idx + 1) % count, numOfItems--)
{
    yield return buffer[idx];
}

There are two optimizations you can make here.

The first is obvious, if the requested number of items is 0 (zero), we just return an empty sequence:

if (count <= 0)
{
    return System.Linq.Enumerable.Empty<TSource>();
}

The second is if the source sequence is known to implement the IList<T> interface. Objects implementing this interface have a Count property and indexed access to its items which allows us to optimize the production of the final sequence.

Producing the final sequence consists of yielding the required number of items from the end of the list (or all of them if the list contains less items than required):

int listCount = list.Count;

for (int idx = listCount - ((count < listCount) ? count : listCount); idx < listCount; idx++)
{
    yield return list[idx];
}

You can find the complete implementation of this operator (and more) CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

Hydrating Objects With Expression Trees – Part III

LINQ With C# (Portuguese)

To finalize this series on object hydration, I’ll show some performance comparisons between the different methods of hydrating objects.

For the purpose of this exercise, I’ll use this class:

class SomeType
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTimeOffset CreationTime { get; set; }
    public Guid UniqueId { get; set; }
}

and this set of data:

var data = (
    from i in Enumerable.Range(1, ObjectCount)
    select new object[] { i, i.ToString(), DateTimeOffset.Now, Guid.NewGuid() }
).ToArray();

The data bellow shows the time (in seconds) for different runs for different values of ObjectCount (in the same machine with approximately the same load) as well as it’s availability for different version of the .NET Framework and the C# programming language:

10000 100000 1000000 Valid for
Setup Hydrate Total Setup Hydrate Total Setup Hydrate Total Framework version C# Version
Activation and Reflection setter 0.060 0.101 0.161 0.055 0.736 0.791 0.054 6.822 6.876 1.0, 1.1, 2.0, 3.5, 4.0 1.0, 2.0, 3.0, 4.0
Activation and Expression Tree setter 0.300 0.003 0.303 0.313 0.049 0.359 0.293 0.578 0.871 4.0 none
Member Initializer 0.035 0.001 0.036 0.039 0.027 0.066 0.041 0.518 0.559 3.5, 4.0 3.0, 4.0

These values will vary with the number of the objects being hydrated and the number of its properties, but the method using the Member Initializer will be the most performant.

Code samples for this series of posts (and the one about object dumping with expression trees) can be found on my MSDN Code Gallery: Dump And Hydrate Objects With Expression Trees

Hydrating Objects With Expression Trees – Part II

LINQ With C# (Portuguese)

In my previous post I showed how to hydrate objects by creating instances and setting properties in those instances.

But, if the intent is to hydrate the objects from data, why not having an expression that does just that? That’s what the member initialization expression is for.

To create such an expression we need the constructor expression and the property binding expressions:

var properties = objectType.GetProperties();
var bindings = new MemberBinding[properties.Length];
var valuesArrayExpression = Expression.Parameter(typeof(object[]), "v");

for (int p = 0; p < properties.Length; p++)
{
    var property = properties[p];

    bindings[p] = Expression.Bind(
        property,
        Expression.Convert(
            Expression.ArrayAccess(
                valuesArrayExpression,
                Expression.Constant(p, typeof(int))
            ),
            property.PropertyType
        )
    );
}

var memberInitExpression = Expression.MemberInit(
    Expression.New(objectType),
    bindings
);

var objectHidrationExpression = Expression.Lambda<Func<object[], object>>(memberInitExpression, valuesArrayExpression);

var compiledObjectHidrationExpression = objectHidrationExpression.Compile();

This might seem more complex than the previous solution, but using it is a lot more simple:

for (int o = 0; o < objects.Length; o++)
{
    newObjects[o] = compiledObjectHidrationExpression(objects[o]);
}

Hydrating Objects With Expression Trees – Part I

LINQ With C# (Portuguese)

After my post about dumping objects using expression trees, I’ve been asked if the same could be done for hydrating objects.

Sure it can, but it might not be that easy.

What we are looking for is a way to set properties on objects of an unknown type. For that, we need to generate methods to set each property of the objects.

Such methods would look like this expression:

Expression<Action<object, object>> expression = (o, v) => ((SomeType)o).Property1 = (PropertyType)v;

Unfortunately, we cannot use the .NET Reflector trick because, if you try to compile this, you’ll get this error:

error CS0832: An expression tree may not contain an assignment operator

Fortunately, that corresponds to a valid .NET expression tree. We just have to build it by hand.

So, for a given type, the set of property setters would be built this way:

var compiledExpressions = (from property in objectType.GetProperties()
                           let objectParameterExpression = Expression.Parameter(typeof(object), "o")
                           let convertedObjectParameteExpressionr = Expression.ConvertChecked(objectParameter, objectType)
                           let valueParameter = Expression.Parameter(propertyType, "v")
                           let convertedValueParameter = Expression.ConvertChecked(valueParameter, property.PropertyType)
                           let propertyExpression = Expression.Property(convertedObjectParameter, property)
                           select
                                Expression.Lambda<Action<object, object>>(
                                    Expression.Assign(
                                        propertyExpression,
                                        convertedValueParameter
                                    ),
                                    objectParameter,
                                    valueParameter
                                ).Compile()).ToArray();

And hydrating objects would be like this:

for (int o = 0; o < objects.Length; o++)
{
    var objectProperties = objects[o];

    var newObject = newObjects[o] = Activator.CreateInstance(objectType);

    for (int p = 0; p < compiledExpressions.Length; p++)
    {
        compiledExpressions[p](newObject, objectProperties[p]);
    }
}

Mastering Expression Trees With .NET Reflector

Following my last post, I received lots of enquiries about how got to master the creation of expression trees.

The answer is: .NET Reflector

On that post I needed to to generate an expression tree for this expression:

Expression<Func<object, object>> expression = o => ((object)((SomeType)o).Property1);

I just compiled that code in Visual Studio 2010, loaded the assembly in .NET Reflector, and disassembled it to C# without optimizations (View –> Options –> Disassembler –> Optimization: None).

The disassembled code looked like this:

Expression<Func<object, object>> expression;
ParameterExpression CS$0$0000;
ParameterExpression[] CS$0$0001;
expression = Expression.Lambda<Func<object, object>>(Expression.Convert(Expression.Property(Expression.Convert(CS$0$0000 = Expression.Parameter(typeof(object), "o"), typeof(SomeType)), (MethodInfo) methodof(SomeType.get_Property1)), typeof(object)), new ParameterExpression[] { CS$0$0000 });

After giving valid C# names to the variables and tidy up the code a bit, I came up with this:

ParameterExpression parameter = Expression.Parameter(typeof(object), "o");
Expression<Func<object, object>> expression =
    Expression.Lambda<Func<object, object>>(
        Expression.Convert(
            Expression.Property(
                Expression.Convert(
                    parameter,
                    typeof(SomeType)
                ),
                "Property1"
            ),
            typeof(object)
        ),
        parameter
    );

Easy! Isn’t it?

Dumping Objects Using Expression Trees

LINQ With C# (Portuguese)

No. I’m not proposing to get rid of objects.

A colleague of mine was asked if I knew a way to dump a list of objects of unknown type into a DataTable with better performance than the way he was using.

The objects being dumped usually have over a dozen of properties, but, for the sake of this post, let’s assume they look like this:

class SomeClass
{
    public int Property1 { get; set; }
    public long Property2 { get; set; }
    public string Property3 { get; set; }
    public object Property4 { get; set; }
    public DateTimeOffset Property5 { get; set; }
}

The code he was using was something like this:

var properties = objectType.GetProperties();

foreach (object obj in objects)
{
    foreach (var property in properties)
    {
        property.GetValue(obj, null);
    }
}

For a list of one million objects, this is takes a little over 6000 milliseconds on my machine.

I immediately thought: Expression Trees!

If the type of the objects was know at compile time, it would be something like this:

Expression<Func<SomeClass, int>> expression = o => o.Property1;
var compiled = expression.Compile();
var propertyValue = compiled.Invoke(obj);

But, at compile time, the type of the object and, consequently, the type of its properties, is unknown. So, we’ll need, for each property, an expression tree like this:

Expression<Func<object, object>> expression = o => ((SomeClass)o).Property1;

The previous expression gets the value of a property of the conversion of the parameter of type object to the type of the object. The result must also be converted to type object because the type of the result must match the type of the return value of the expression.

For the same type of objects, the collection of property accessors would be built this way:

var compiledExpressions = (from property in properties
                           let objectParameter = Expression.Parameter(typeof(object), "o")
                           select
                             Expression.Lambda<Func<object, object>>(
                                 Expression.Convert(
                                     Expression.Property(
                                         Expression.Convert(
                                             objectParameter,
                                             objectType
                                         ),
                                         property
                                     ),
                                     typeof(object)
                                 ),
                                 objectParameter
                             ).Compile()).ToArray();

Looks bit overcomplicated, but reading all properties of all objects for the same object set with this code:

foreach (object obj in objects)
{
    foreach (var compiledExpression in compiledExpressions)
    {
        compiledExpression (obj);
    }
}

takes a little over 150 milliseconds on my machine.

That’s right. 2.5% of the previous time.

C#: More On Array Variance

In a previous post, I went through how arrays have are covariant in relation to the type of its elements, but not safely covariant.

In the following example, the second assignment is invalid at run time because, although the type of the objectArray variable is array of object, the real type of the array is array of string and an object cannot be assigned to a string.

object[] objectArray = new string[] { "string 1", "string 2" };
objectArray[0] = "string 3";
objectArray[1] = new object();

On the other hand, because arrays are not contravariant in relation to the type of its elements, in the following code, the second line will fail at run time because string[] ≤ object[].

object[] objectArray = new object[] { "string 1", "string 2" };
string[] stringArray = (string[])objectArray;

The fact that all elements in the object array are strings doesn’t make it convertible to an array of string. To convert this object array of strings into a string array, you’ll need to create a new string array and copy each converted element.

The conversion can be as easily as this:

string[] stringArray = objectArray.Cast<string>().ToArray();

The above code is just shorthand to traversing the whole array while converting its elements to string and creating an string array with all elements.

Arrays are a good storage structure because they are the only structure provided by the runtime to store groups of items of the same type. However, because of limitations like the above, its use in APIs should be very carefully considered.

If all you need is to traverse all elements of some collection, you should use an IEnumerable<T> (IEnumerable<out T> in .NET 4.0). This way, the cost of using Enumerable.Cast<T>() is minimal.

TechDays 2010: What’s New On C# 4.0

I would like to thank those that attended my session at TechDays 2010 and I hope that I was able to pass the message of what’s new on C#.

For those that didn’t attend (or did and want to review it), the presentation can be downloaded from here.

Code samples can be downlaoded from here.

Here’s a list of resources mentioned on the session:

C# 4.0: COM Interop Improvements

Dynamic resolution as well as named and optional arguments greatly improve the experience of interoperating with COM APIs such as Office Automation Primary Interop Assemblies (PIAs). But, in order to alleviate even more COM Interop development, a few COM-specific features were also added to C# 4.0.

Ommiting ref

Because of a different programming model, many COM APIs contain a lot of reference parameters. These parameters are typically not meant to mutate a passed-in argument, but are simply another way of passing value parameters.

Specifically for COM methods, the compiler allows to declare the method call passing the arguments by value and will automatically generate the necessary temporary variables to hold the values in order to pass them by reference and will discard their values after the call returns. From the point of view of the programmer, the arguments are being passed by value.

This method call:

object fileName = "Test.docx";
object missing = Missing.Value;

document.SaveAs(ref fileName,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing);

can now be written like this:

document.SaveAs("Test.docx",
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value);

And because all parameters that are receiving the Missing.Value value have that value as its default value, the declaration of the method call can even be reduced to this:

document.SaveAs("Test.docx");

Dynamic Import

Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object form the context of the call, but has to explicitly perform a cast on the returned values to make use of that knowledge. These casts are so common that they constitute a major nuisance.

To make the developer’s life easier, it is now possible to import the COM APIs in such a way that variants are instead represented using the type dynamic which means that COM signatures have now occurrences of dynamic instead of object.

This means that members of a returned object can now be easily accessed or assigned into a strongly typed variable without having to cast.

Instead of this code:

((Excel.Range)(excel.Cells[1, 1])).Value2 = "Hello World!";

this code can now be used:

excel.Cells[1, 1] = "Hello World!";

And instead of this:

Excel.Range range = (Excel.Range)(excel.Cells[1, 1]);

this can be used:

Excel.Range range = excel.Cells[1, 1];

Indexed And Default Properties

A few COM interface features are still not available in C#. On the top of the list are indexed properties and default properties. As mentioned above, these will be possible if the COM interface is accessed dynamically, but will not be recognized by statically typed C# code.

No PIAs – Type Equivalence And Type Embedding

For assemblies indentified with PrimaryInteropAssemblyAttribute, the compiler will create equivalent types (interfaces, structs, enumerations and delegates) and embed them in the generated assembly.

To reduce the final size of the generated assembly, only the used types and their used members will be generated and embedded.

Although this makes development and deployment of applications using the COM components easier because there’s no need to deploy the PIAs, COM component developers are still required to build the PIAs.