.NET4.0

LINQ: Implementing The TakeLast Operator

LINQ With C# (Portuguese)

Following my last post, in this post I’ll introduce the implementation of the TakeLast operator.

The TakeLast operator returns a specified number of contiguous elements from the end of a sequence and is implemented as the TakeLast extension method:

public static IEnumerable<TSource> TakeLast<TSource>(this IEnumerable<TSource> source, int count)

To implement this operator, first we start by buffering, at most, a count number of items from the source sequence in an array that acts as a circular buffer:

var sourceEnumerator = source.GetEnumerator();
var buffer = new TSource[count];
var numOfItems = 0;
int idx;

for (idx = 0; (idx < count) && sourceEnumerator.MoveNext(); idx++, numOfItems++)
{
    buffer[idx] = sourceEnumerator.Current;
}

If the number of buffered items (numOfItems) is less than the requested number of items (count), we just yield all the buffered items:

if (numOfItems < count)
{
    for (idx = 0; idx < numOfItems; idx++)
    {
        yield return buffer[idx];
    }

    yield break;
}

Next, we iterate over the rest of the items circularly buffering them:

for (idx = 0; sourceEnumerator.MoveNext(); idx = (idx + 1) % count)
{
    buffer[idx] = sourceEnumerator.Current;
}

And finally, we just iterate over the buffered items and yield them:

for (; numOfItems > 0; idx = (idx + 1) % count, numOfItems--)
{
    yield return buffer[idx];
}

There are two optimizations you can make here.

The first is obvious, if the requested number of items is 0 (zero), we just return an empty sequence:

if (count <= 0)
{
    return System.Linq.Enumerable.Empty<TSource>();
}

The second is if the source sequence is known to implement the IList<T> interface. Objects implementing this interface have a Count property and indexed access to its items which allows us to optimize the production of the final sequence.

Producing the final sequence consists of yielding the required number of items from the end of the list (or all of them if the list contains less items than required):

int listCount = list.Count;

for (int idx = listCount - ((count < listCount) ? count : listCount); idx < listCount; idx++)
{
    yield return list[idx];
}

You can find the complete implementation of this operator (and more) CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

LINQ: Introducing The Take Last Operators

LINQ With C# (Portuguese)

Some time ago I needed to retrieve the last items of a sequence that satisfied some criteria and, looking at the operators available in the Enumerable class, I noticed that there wasn’t such operator.

The only way to achieve this was to reverse the sequence, take the items that satisfied the criteria and reverse the resulting sequence. Something like this:

sequence.Reverse().TakeWhile(criteria).Reverse();

Looks quite simple, right? First we call the Reverse method to produce a new sequence with the same items as the original sequence but in the reverse order, then we call the TakeWhile method to take the first items that satisfy the criteria and then call the Reverse method again to restore the original order of the items.

The problem with this approach is that the Reverse method buffers the entire sequence before iterating through its items in the reverse order - and the above code uses it twice. This means iterating over all items in the original sequence and buffer them all, iterating over first items of the resulting sequence that satisfy the criteria and buffer them all and, finally, iterate over that result to produce the final sequence.

If you’re counting, you’ve come to the conclusion that all items in the original sequence will be iterated over once and the ones in the resulting sequence will be iterated again three times. If the original sequence is large, this can take lots of memory and time.

There’s another issue if you’re using the variant the uses the index of the item in the original sequence in the evaluation of the selection criteria (>). When we reverse the order of the items, the indexes will be reversed and the predicate must take that in account, which might not be possible if you don’t know the number of items in the original sequence.

There must be a better way, and that’s why I implemented the Take Last Operators:

Name Description Example
TakeLast<TSource>(IEnumerable<TSource>)

Returns a specified number of contiguous elements from the end of a sequence.

int[] grades = { 59, 82, 70, 56, 92, 98, 85 };

var topThreeGrades = grades
                     .OrderBy(grade => grade)
                     .TakeLast(3);

Console.WriteLine("The top three grades are:");
foreach (int grade in topThreeGrades)
{
    Console.WriteLine(grade);
}
/*
This code produces the following output:

The top three grades are:
98
92
85
*/
TakeLastWhile<TSource>(IEnumerable<TSource>, Func<TSource, Boolean>)

Returns the elements from the end of a sequence as long as the specified condition is true.

string[] fruits =
    {
        "apple",
        "passionfruit",
        "banana",
        "mango",
        "orange",
        "blueberry",
        "grape",
        "strawberry"
    };

var query = fruits
            .TakeLastWhile(fruit => string.Compare("orange", fruit, true) != 0);

foreach (string fruit in query)
{
    Console.WriteLine(fruit);
}

/*
This code produces the following output:
blueberry
grape
strawberry
*/
TakeLastWhile<TSource>(IEnumerable<TSource>, Func<TSource, Int32, Boolean>)

Returns the elements from the end of a sequence as long as the specified condition is true.

string[] fruits =
    {
        "apple",
        "passionfruit",
        "banana",
        "mango",
        "orange",
        "blueberry",
        "grape",
        "strawberry"
    };

var query = fruits
            .TakeLastWhile((fruit, index) => fruit.Length >= index);

foreach (string fruit in query)
{
    Console.WriteLine(fruit);
}

/*
This code produces the following output:

strawberry
*/

You can find these (and more) operators in my CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

PauloMorgado.Linq Project On CodePlex

LINQ With C# (Portuguese)

I’ve created a project on CodePlex to share my LINQ utilities and operators: PauloMorgado.Linq

In this project you can find the code from my previous posts about the Distinct operator:

In future posts, I’ll introduce other operators.

Hydrating Objects With Expression Trees – Part III

LINQ With C# (Portuguese)

To finalize this series on object hydration, I’ll show some performance comparisons between the different methods of hydrating objects.

For the purpose of this exercise, I’ll use this class:

class SomeType
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTimeOffset CreationTime { get; set; }
    public Guid UniqueId { get; set; }
}

and this set of data:

var data = (
    from i in Enumerable.Range(1, ObjectCount)
    select new object[] { i, i.ToString(), DateTimeOffset.Now, Guid.NewGuid() }
).ToArray();

The data bellow shows the time (in seconds) for different runs for different values of ObjectCount (in the same machine with approximately the same load) as well as it’s availability for different version of the .NET Framework and the C# programming language:

10000 100000 1000000 Valid for
Setup Hydrate Total Setup Hydrate Total Setup Hydrate Total Framework version C# Version
Activation and Reflection setter 0.060 0.101 0.161 0.055 0.736 0.791 0.054 6.822 6.876 1.0, 1.1, 2.0, 3.5, 4.0 1.0, 2.0, 3.0, 4.0
Activation and Expression Tree setter 0.300 0.003 0.303 0.313 0.049 0.359 0.293 0.578 0.871 4.0 none
Member Initializer 0.035 0.001 0.036 0.039 0.027 0.066 0.041 0.518 0.559 3.5, 4.0 3.0, 4.0

These values will vary with the number of the objects being hydrated and the number of its properties, but the method using the Member Initializer will be the most performant.

Code samples for this series of posts (and the one about object dumping with expression trees) can be found on my MSDN Code Gallery: Dump And Hydrate Objects With Expression Trees

Hydrating Objects With Expression Trees – Part II

LINQ With C# (Portuguese)

In my previous post I showed how to hydrate objects by creating instances and setting properties in those instances.

But, if the intent is to hydrate the objects from data, why not having an expression that does just that? That’s what the member initialization expression is for.

To create such an expression we need the constructor expression and the property binding expressions:

var properties = objectType.GetProperties();
var bindings = new MemberBinding[properties.Length];
var valuesArrayExpression = Expression.Parameter(typeof(object[]), "v");

for (int p = 0; p < properties.Length; p++)
{
    var property = properties[p];

    bindings[p] = Expression.Bind(
        property,
        Expression.Convert(
            Expression.ArrayAccess(
                valuesArrayExpression,
                Expression.Constant(p, typeof(int))
            ),
            property.PropertyType
        )
    );
}

var memberInitExpression = Expression.MemberInit(
    Expression.New(objectType),
    bindings
);

var objectHidrationExpression = Expression.Lambda<Func<object[], object>>(memberInitExpression, valuesArrayExpression);

var compiledObjectHidrationExpression = objectHidrationExpression.Compile();

This might seem more complex than the previous solution, but using it is a lot more simple:

for (int o = 0; o < objects.Length; o++)
{
    newObjects[o] = compiledObjectHidrationExpression(objects[o]);
}

Hydrating Objects With Expression Trees – Part I

LINQ With C# (Portuguese)

After my post about dumping objects using expression trees, I’ve been asked if the same could be done for hydrating objects.

Sure it can, but it might not be that easy.

What we are looking for is a way to set properties on objects of an unknown type. For that, we need to generate methods to set each property of the objects.

Such methods would look like this expression:

Expression<Action<object, object>> expression = (o, v) => ((SomeType)o).Property1 = (PropertyType)v;

Unfortunately, we cannot use the .NET Reflector trick because, if you try to compile this, you’ll get this error:

error CS0832: An expression tree may not contain an assignment operator

Fortunately, that corresponds to a valid .NET expression tree. We just have to build it by hand.

So, for a given type, the set of property setters would be built this way:

var compiledExpressions = (from property in objectType.GetProperties()
                           let objectParameterExpression = Expression.Parameter(typeof(object), "o")
                           let convertedObjectParameteExpressionr = Expression.ConvertChecked(objectParameter, objectType)
                           let valueParameter = Expression.Parameter(propertyType, "v")
                           let convertedValueParameter = Expression.ConvertChecked(valueParameter, property.PropertyType)
                           let propertyExpression = Expression.Property(convertedObjectParameter, property)
                           select
                                Expression.Lambda<Action<object, object>>(
                                    Expression.Assign(
                                        propertyExpression,
                                        convertedValueParameter
                                    ),
                                    objectParameter,
                                    valueParameter
                                ).Compile()).ToArray();

And hydrating objects would be like this:

for (int o = 0; o < objects.Length; o++)
{
    var objectProperties = objects[o];

    var newObject = newObjects[o] = Activator.CreateInstance(objectType);

    for (int p = 0; p < compiledExpressions.Length; p++)
    {
        compiledExpressions[p](newObject, objectProperties[p]);
    }
}

Mastering Expression Trees With .NET Reflector

Following my last post, I received lots of enquiries about how got to master the creation of expression trees.

The answer is: .NET Reflector

On that post I needed to to generate an expression tree for this expression:

Expression<Func<object, object>> expression = o => ((object)((SomeType)o).Property1);

I just compiled that code in Visual Studio 2010, loaded the assembly in .NET Reflector, and disassembled it to C# without optimizations (View –> Options –> Disassembler –> Optimization: None).

The disassembled code looked like this:

Expression<Func<object, object>> expression;
ParameterExpression CS$0$0000;
ParameterExpression[] CS$0$0001;
expression = Expression.Lambda<Func<object, object>>(Expression.Convert(Expression.Property(Expression.Convert(CS$0$0000 = Expression.Parameter(typeof(object), "o"), typeof(SomeType)), (MethodInfo) methodof(SomeType.get_Property1)), typeof(object)), new ParameterExpression[] { CS$0$0000 });

After giving valid C# names to the variables and tidy up the code a bit, I came up with this:

ParameterExpression parameter = Expression.Parameter(typeof(object), "o");
Expression<Func<object, object>> expression =
    Expression.Lambda<Func<object, object>>(
        Expression.Convert(
            Expression.Property(
                Expression.Convert(
                    parameter,
                    typeof(SomeType)
                ),
                "Property1"
            ),
            typeof(object)
        ),
        parameter
    );

Easy! Isn’t it?

Dumping Objects Using Expression Trees

LINQ With C# (Portuguese)

No. I’m not proposing to get rid of objects.

A colleague of mine was asked if I knew a way to dump a list of objects of unknown type into a DataTable with better performance than the way he was using.

The objects being dumped usually have over a dozen of properties, but, for the sake of this post, let’s assume they look like this:

class SomeClass
{
    public int Property1 { get; set; }
    public long Property2 { get; set; }
    public string Property3 { get; set; }
    public object Property4 { get; set; }
    public DateTimeOffset Property5 { get; set; }
}

The code he was using was something like this:

var properties = objectType.GetProperties();

foreach (object obj in objects)
{
    foreach (var property in properties)
    {
        property.GetValue(obj, null);
    }
}

For a list of one million objects, this is takes a little over 6000 milliseconds on my machine.

I immediately thought: Expression Trees!

If the type of the objects was know at compile time, it would be something like this:

Expression<Func<SomeClass, int>> expression = o => o.Property1;
var compiled = expression.Compile();
var propertyValue = compiled.Invoke(obj);

But, at compile time, the type of the object and, consequently, the type of its properties, is unknown. So, we’ll need, for each property, an expression tree like this:

Expression<Func<object, object>> expression = o => ((SomeClass)o).Property1;

The previous expression gets the value of a property of the conversion of the parameter of type object to the type of the object. The result must also be converted to type object because the type of the result must match the type of the return value of the expression.

For the same type of objects, the collection of property accessors would be built this way:

var compiledExpressions = (from property in properties
                           let objectParameter = Expression.Parameter(typeof(object), "o")
                           select
                             Expression.Lambda<Func<object, object>>(
                                 Expression.Convert(
                                     Expression.Property(
                                         Expression.Convert(
                                             objectParameter,
                                             objectType
                                         ),
                                         property
                                     ),
                                     typeof(object)
                                 ),
                                 objectParameter
                             ).Compile()).ToArray();

Looks bit overcomplicated, but reading all properties of all objects for the same object set with this code:

foreach (object obj in objects)
{
    foreach (var compiledExpression in compiledExpressions)
    {
        compiledExpression (obj);
    }
}

takes a little over 150 milliseconds on my machine.

That’s right. 2.5% of the previous time.

TechDays 2010: What’s New On C# 4.0

I would like to thank those that attended my session at TechDays 2010 and I hope that I was able to pass the message of what’s new on C#.

For those that didn’t attend (or did and want to review it), the presentation can be downloaded from here.

Code samples can be downlaoded from here.

Here’s a list of resources mentioned on the session:

C# 4.0: COM Interop Improvements

Dynamic resolution as well as named and optional arguments greatly improve the experience of interoperating with COM APIs such as Office Automation Primary Interop Assemblies (PIAs). But, in order to alleviate even more COM Interop development, a few COM-specific features were also added to C# 4.0.

Ommiting ref

Because of a different programming model, many COM APIs contain a lot of reference parameters. These parameters are typically not meant to mutate a passed-in argument, but are simply another way of passing value parameters.

Specifically for COM methods, the compiler allows to declare the method call passing the arguments by value and will automatically generate the necessary temporary variables to hold the values in order to pass them by reference and will discard their values after the call returns. From the point of view of the programmer, the arguments are being passed by value.

This method call:

object fileName = "Test.docx";
object missing = Missing.Value;

document.SaveAs(ref fileName,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing,
    ref missing, ref missing, ref missing);

can now be written like this:

document.SaveAs("Test.docx",
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value,
    Missing.Value, Missing.Value, Missing.Value);

And because all parameters that are receiving the Missing.Value value have that value as its default value, the declaration of the method call can even be reduced to this:

document.SaveAs("Test.docx");

Dynamic Import

Many COM methods accept and return variant types, which are represented in the PIAs as object. In the vast majority of cases, a programmer calling these methods already knows the static type of a returned object form the context of the call, but has to explicitly perform a cast on the returned values to make use of that knowledge. These casts are so common that they constitute a major nuisance.

To make the developer’s life easier, it is now possible to import the COM APIs in such a way that variants are instead represented using the type dynamic which means that COM signatures have now occurrences of dynamic instead of object.

This means that members of a returned object can now be easily accessed or assigned into a strongly typed variable without having to cast.

Instead of this code:

((Excel.Range)(excel.Cells[1, 1])).Value2 = "Hello World!";

this code can now be used:

excel.Cells[1, 1] = "Hello World!";

And instead of this:

Excel.Range range = (Excel.Range)(excel.Cells[1, 1]);

this can be used:

Excel.Range range = excel.Cells[1, 1];

Indexed And Default Properties

A few COM interface features are still not available in C#. On the top of the list are indexed properties and default properties. As mentioned above, these will be possible if the COM interface is accessed dynamically, but will not be recognized by statically typed C# code.

No PIAs – Type Equivalence And Type Embedding

For assemblies indentified with PrimaryInteropAssemblyAttribute, the compiler will create equivalent types (interfaces, structs, enumerations and delegates) and embed them in the generated assembly.

To reduce the final size of the generated assembly, only the used types and their used members will be generated and embedded.

Although this makes development and deployment of applications using the COM components easier because there’s no need to deploy the PIAs, COM component developers are still required to build the PIAs.