LINQ

Hash Function Added To The PredicateEqualityComparer

Sometime ago I wrote a predicate equality comparer to be used with LINQ’s Distinct operator.

The Distinct operator uses an instance of an internal Set class to maintain the collection of distinct elements in the source collection which in turn checks the hash code of each element (by calling the GetHashCode method of the equality comparer) and only if there’s already an element with the same hash code in the collection calls the Equals method of the comparer to disambiguate.

At the time I provided only the possibility to specify the comparison predicate, but, in some cases, comparing a hash code instead of calling the provided comparer predicate can be a significant performance improvement, I’ve added the possibility to had a hash function to the predicate equality comparer.

You can get the updated code from the PauloMorgado.Linq project on CodePlex,

Creating Property Set Expression Trees In A Developer Friendly Way

LINQ With C# (Portuguese)In a previous post I showed how to create expression trees to set properties on an object.

The way I did it was not very developer friendly. It involved explicitly creating the necessary expressions because the compiler won’t generate expression trees with property or field set expressions.

Recently someone contacted me the help develop some kind of command pattern framework that used developer friendly lambdas to generate property set expression trees.

Simply putting, given this entity class:

public class Person
{
    public string Name { get; set; }
}

The person in question wanted to write code like this:

var et = Set((Person p) => p.Name = "me");

Where et is the expression tree that represents the property assignment.

So, if we can’t do this, let’s try the next best thing that is splitting retrieving the property information from the retrieving the value to assign o the property:

var et = Set((Person p) => p.Name, () => "me");

And this is something that the compiler can handle.

The implementation of Set receives an expression to retrieve the property information from and another expression the retrieve the value to assign to the property:

public static Expression<Action<TEntity>> Set<TEntity, TValue>(
    Expression<Func<TEntity, TValue>> propertyGetExpression,
    Expression<Func<TValue>> valueExpression)

The implementation of this method gets the property information form the body of the property get expression (propertyGetExpression) and the value expression (valueExpression) to build an assign expression and builds a lambda expression using the same parameter of the property get expression as its parameter:

public static Expression<Action<TEntity>> Set<TEntity, TValue>(
    Expression<Func<TEntity, TValue>> propertyGetExpression,
    Expression<Func<TValue>> valueExpression)
{
    var entityParameterExpression = (ParameterExpression)(((MemberExpression)(propertyGetExpression.Body)).Expression);

    return Expression.Lambda<Action<TEntity>>(
        Expression.Assign(propertyGetExpression.Body, valueExpression.Body),
        entityParameterExpression);
}

And now we can use the expression to translate to another context or just compile and use it:

var et = Set((Person p) => p.Name, () => name);
Console.WriteLine(person.Name); // Prints: p => (p.Name = “me”)

var d = et.Compile();

d(person);
Console.WriteLine(person.Name); // Prints: me

It can even support closures:

var et = Set((Person p) => p.Name, () => name);
Console.WriteLine(person.Name); // Prints: p => (p.Name = value(<>c__DisplayClass0).name)

var d = et.Compile();

name = "me";
d(person);
Console.WriteLine(person.Name); // Prints: me

name = "you";
d(person);
Console.WriteLine(person.Name); // Prints: you

Not so useful in the intended scenario (but still possible) is building an expression tree that receives the value to assign to the property as a parameter:

public static Expression<Action<TEntity, TValue>> Set<TEntity, TValue>(Expression<Func<TEntity, TValue>> propertyGetExpression)
{
    var entityParameterExpression = (ParameterExpression)(((MemberExpression)(propertyGetExpression.Body)).Expression);
    var valueParameterExpression = Expression.Parameter(typeof(TValue));

    return Expression.Lambda<Action<TEntity, TValue>>(
        Expression.Assign(propertyGetExpression.Body, valueParameterExpression),
        entityParameterExpression,
        valueParameterExpression);
}

This new expression can be used like this:

var et = Set((Person p) => p.Name);
Console.WriteLine(person.Name); // Prints: (p, Param_0) => (p.Name = Param_0)

var d = et.Compile();

d(person, "me");
Console.WriteLine(person.Name); // Prints: me

d(person, "you");
Console.WriteLine(person.Name); // Prints: you

The only caveat is that we need to be able to write code to read the property in order to write to it.

LINQ: Implementing The SkipLastWhile Operator

LINQ Com C#

Following my last posts (>)(>), in this post I’ll introduce the implementation of the SkipLastWhile operator.

The SkipLastWhile returns all but the last contiguous elements from a a sequence that satisfy the specified criteria and is implemented as the SkipLastWhile extension methods:

public static IEnumerable<TSource> SkipLastWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
public static IEnumerable<TSource> SkipLastWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate)

The implementation of these methods is very simple. We start with an empty buffer and buffer every item that satisfies the criteria implemented by a predicate. Whenever an item doesn’t satisfy the criteria, all buffered items are yield, the buffer is cleared and the the item that doesn’t satisfy the criteria is yield:

var buffer = new List<TSource>();

foreach (var item in source)
{
    if (predicate(item))
    {
        buffer.Add(item);
    }
    else
    {
        if (buffer.Count > 0)
        {
            foreach (var bufferedItem in buffer)
            {
                yield return bufferedItem;
            }

            buffer.Clear();
        }

        yield return item;
    }
}

The overload that takes in account the index of the item only differs in the call the predicate that implements the criteria:

var buffer = new List<TSource>();
var idx = 0;

foreach (var item in source)
{
    if (predicate(item, idx++))
    {
        buffer.Add(item);
    }
    else
    {
        if (buffer.Count > 0)
        {
            foreach (var bufferedItem in buffer)
            {
                yield return bufferedItem;
            }

            buffer.Clear();
        }

        yield return item;
    }
}

You can find the complete implementation of this operator (and more) CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

LINQ: Implementing The SkipLast Operator

LINQ With C# (Portuguese)

Following my last post, in this post I’ll introduce the implementation of the SkipLast operator.

The SkipLast operator returns all but a specified number of contiguous elements from the end of a sequence and is implemented as the SkipLast extension method:

public static IEnumerable<TSource> SkipLast<TSource>(this IEnumerable<TSource> source, int count)

To implement this operator, first we start by buffering, at most, a count number of items from the source sequence in an array that acts as a circular buffer:

var sourceEnumerator = source.GetEnumerator();
var buffer = new TSource[count];
int idx;

for (idx = 0; (idx < count) && sourceEnumerator.MoveNext(); idx++)
{
    buffer[idx] = sourceEnumerator.Current;
}

Next, we iterate over the rest of the items of the source sequence circularly buffering them and yielding the previously buffered item at the same postition:

idx = 0;

while (sourceEnumerator.MoveNext())
{
    var item = buffer[idx];

    buffer[idx] = sourceEnumerator.Current;

    idx = (idx + 1) % count;

    yield return item;
}

If the number of items to skip is greater or equal to the number of items in the source sequence, sourceEnumerator.MoveNext() will return false on the first iteration of the while loop and an empty sequence will be produced.

As with the TakeLast operator, a few optimizations can be made here.

The first is obvious, if the requested number of items is 0 (zero) or lower, we just return an equivalent sequence:

if (count <= 0)
{
    return source.Select(i => i);
}

The second is if the source sequence is known to implement the IList<T> interface. Objects implementing this interface have a Count property and indexed access to its items which allows us to optimize the production of the final sequence.

If the number of items to skip is equal to or greater than the number of items in the source sequence, we just return an empty sequence:

var list = source as IList<TSource>;

if (list != null)
{
    if (count >= list.Count)
    {
        return System.Linq.Enumerable.Empty<TSource>();
    }

    // ...
}

If the number of items in the source sequence is greater than the number of items to skip, producing the final sequence consists of yielding all the items in the source sequence except the last count items:

int returnCount = list.Count - count;

for (int idx = 0; idx < returnCount; idx++)
{
    yield return list[idx];
}

You can find the complete implementation of this operator (and more) CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

LINQ: Introducing The Skip Last Operators

LINQ With C# (Portuguese)

After having introduced the TakeLast operators (>)(>)(>), it makes sense to introduce their duals: the SkipLast operators.

Name Description Example
SkipLast<TSource>(IEnumerable<TSource>)

Returns all but a specified number of contiguous elements from the end of a sequence.

int[] grades = { 59, 82, 70, 56, 92, 98, 85 };

var lowerGrades = grades
                  .OrderBy(g => g)
                  .SkipLast(3);

Console.WriteLine("All grades except the top three are:");
foreach (int grade in lowerGrades)
{
    Console.WriteLine(grade);
}

/*
This code produces the following output:

All grades except the top three are:
56
59
70
82
*/
SkipLastWhile<TSource>(IEnumerable<TSource>, Func<TSource, Boolean>)

Returns all the elements from sequence skipping those at the end as long as the specified condition is true.

int[] grades = { 59, 82, 70, 56, 92, 98, 85 };

var lowerGrades = grades
                 .OrderBy(grade => grade)
                 .SkipLastWhile(grade => grade >= 80);

Console.WriteLine("All grades below 80:");
foreach (int grade in lowerGrades)
{
    Console.WriteLine(grade);
}

/*
This code produces the following output:

All grades below 80:
56
59
70
*/
SkipLastWhile<TSource>(IEnumerable<TSource>, Func<TSource, Int32, Boolean>)

Returns all the elements from sequence skipping those at the end as long as the specified condition is true.

int[] amounts =
    {
        5000,
        2500,
        5500,
        8000,
        6500,
        4000,
        1500,
        9000
    };

var query = amounts
            .SkipLastWhile((amount, index) => amount > index * 1000);

foreach (int amount in query)
{
    Console.WriteLine(amount);
}

/*
 This code produces the following output:

9000
*/

You can find these (and more) operators in my CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

LINQ: Implementing The TakeLastWhile Operator

LINQ With C# (Portuguese)

Following my last posts (>)(>), in this post I’ll introduce the implementation of the TakeLastWhile operator.

The TakeLastWhile operator returns last contiguous elements from a sequence that satisfy the specified criteria and is implemented as the TakeLastWhile extension methods:

public static IEnumerable<TSource> TakeLastWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
public static IEnumerable<TSource> TakeLastWhile<TSource>(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate)

The implementation of the first method is very simple. We start with an empty buffer and buffer every item that satisfies the criteria implemented by a predicate. Whenever an item doesn’t satisfy the criteria, the buffer is cleared:

var buffer = new List<TSource>();

foreach (var item in source)
{
    if (predicate(item))
    {
        buffer.Add(item);
    }
    else
    {
        buffer.Clear();
    }
}

After traversing the source sequence, we just yield all the items, if any, in the buffer:

foreach (var item in buffer)
{
    yield return item;
}

The overload that takes in account the index of the item only differs in the call the predicate that implements the criteria:

var buffer = new List<TSource>();
var idx = 0;

foreach (var item in source)
{
    if (predicate(item, idx++))
    {
        buffer.Add(item);
    }
    else
    {
        buffer.Clear();
    }
}

foreach (var item in buffer)
{
    yield return item;
}

You can find the complete implementation of this operator (and more) CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

LINQ: Implementing The TakeLast Operator

LINQ With C# (Portuguese)

Following my last post, in this post I’ll introduce the implementation of the TakeLast operator.

The TakeLast operator returns a specified number of contiguous elements from the end of a sequence and is implemented as the TakeLast extension method:

public static IEnumerable<TSource> TakeLast<TSource>(this IEnumerable<TSource> source, int count)

To implement this operator, first we start by buffering, at most, a count number of items from the source sequence in an array that acts as a circular buffer:

var sourceEnumerator = source.GetEnumerator();
var buffer = new TSource[count];
var numOfItems = 0;
int idx;

for (idx = 0; (idx < count) && sourceEnumerator.MoveNext(); idx++, numOfItems++)
{
    buffer[idx] = sourceEnumerator.Current;
}

If the number of buffered items (numOfItems) is less than the requested number of items (count), we just yield all the buffered items:

if (numOfItems < count)
{
    for (idx = 0; idx < numOfItems; idx++)
    {
        yield return buffer[idx];
    }

    yield break;
}

Next, we iterate over the rest of the items circularly buffering them:

for (idx = 0; sourceEnumerator.MoveNext(); idx = (idx + 1) % count)
{
    buffer[idx] = sourceEnumerator.Current;
}

And finally, we just iterate over the buffered items and yield them:

for (; numOfItems > 0; idx = (idx + 1) % count, numOfItems--)
{
    yield return buffer[idx];
}

There are two optimizations you can make here.

The first is obvious, if the requested number of items is 0 (zero), we just return an empty sequence:

if (count <= 0)
{
    return System.Linq.Enumerable.Empty<TSource>();
}

The second is if the source sequence is known to implement the IList<T> interface. Objects implementing this interface have a Count property and indexed access to its items which allows us to optimize the production of the final sequence.

Producing the final sequence consists of yielding the required number of items from the end of the list (or all of them if the list contains less items than required):

int listCount = list.Count;

for (int idx = listCount - ((count < listCount) ? count : listCount); idx < listCount; idx++)
{
    yield return list[idx];
}

You can find the complete implementation of this operator (and more) CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

LINQ: Introducing The Take Last Operators

LINQ With C# (Portuguese)

Some time ago I needed to retrieve the last items of a sequence that satisfied some criteria and, looking at the operators available in the Enumerable class, I noticed that there wasn’t such operator.

The only way to achieve this was to reverse the sequence, take the items that satisfied the criteria and reverse the resulting sequence. Something like this:

sequence.Reverse().TakeWhile(criteria).Reverse();

Looks quite simple, right? First we call the Reverse method to produce a new sequence with the same items as the original sequence but in the reverse order, then we call the TakeWhile method to take the first items that satisfy the criteria and then call the Reverse method again to restore the original order of the items.

The problem with this approach is that the Reverse method buffers the entire sequence before iterating through its items in the reverse order - and the above code uses it twice. This means iterating over all items in the original sequence and buffer them all, iterating over first items of the resulting sequence that satisfy the criteria and buffer them all and, finally, iterate over that result to produce the final sequence.

If you’re counting, you’ve come to the conclusion that all items in the original sequence will be iterated over once and the ones in the resulting sequence will be iterated again three times. If the original sequence is large, this can take lots of memory and time.

There’s another issue if you’re using the variant the uses the index of the item in the original sequence in the evaluation of the selection criteria (>). When we reverse the order of the items, the indexes will be reversed and the predicate must take that in account, which might not be possible if you don’t know the number of items in the original sequence.

There must be a better way, and that’s why I implemented the Take Last Operators:

Name Description Example
TakeLast<TSource>(IEnumerable<TSource>)

Returns a specified number of contiguous elements from the end of a sequence.

int[] grades = { 59, 82, 70, 56, 92, 98, 85 };

var topThreeGrades = grades
                     .OrderBy(grade => grade)
                     .TakeLast(3);

Console.WriteLine("The top three grades are:");
foreach (int grade in topThreeGrades)
{
    Console.WriteLine(grade);
}
/*
This code produces the following output:

The top three grades are:
98
92
85
*/
TakeLastWhile<TSource>(IEnumerable<TSource>, Func<TSource, Boolean>)

Returns the elements from the end of a sequence as long as the specified condition is true.

string[] fruits =
    {
        "apple",
        "passionfruit",
        "banana",
        "mango",
        "orange",
        "blueberry",
        "grape",
        "strawberry"
    };

var query = fruits
            .TakeLastWhile(fruit => string.Compare("orange", fruit, true) != 0);

foreach (string fruit in query)
{
    Console.WriteLine(fruit);
}

/*
This code produces the following output:
blueberry
grape
strawberry
*/
TakeLastWhile<TSource>(IEnumerable<TSource>, Func<TSource, Int32, Boolean>)

Returns the elements from the end of a sequence as long as the specified condition is true.

string[] fruits =
    {
        "apple",
        "passionfruit",
        "banana",
        "mango",
        "orange",
        "blueberry",
        "grape",
        "strawberry"
    };

var query = fruits
            .TakeLastWhile((fruit, index) => fruit.Length >= index);

foreach (string fruit in query)
{
    Console.WriteLine(fruit);
}

/*
This code produces the following output:

strawberry
*/

You can find these (and more) operators in my CodePlex project for LINQ utilities and operators: PauloMorgado.Linq

PauloMorgado.Linq Project On CodePlex

LINQ With C# (Portuguese)

I’ve created a project on CodePlex to share my LINQ utilities and operators: PauloMorgado.Linq

In this project you can find the code from my previous posts about the Distinct operator:

In future posts, I’ll introduce other operators.

Hydrating Objects With Expression Trees – Part III

LINQ With C# (Portuguese)

To finalize this series on object hydration, I’ll show some performance comparisons between the different methods of hydrating objects.

For the purpose of this exercise, I’ll use this class:

class SomeType
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTimeOffset CreationTime { get; set; }
    public Guid UniqueId { get; set; }
}

and this set of data:

var data = (
    from i in Enumerable.Range(1, ObjectCount)
    select new object[] { i, i.ToString(), DateTimeOffset.Now, Guid.NewGuid() }
).ToArray();

The data bellow shows the time (in seconds) for different runs for different values of ObjectCount (in the same machine with approximately the same load) as well as it’s availability for different version of the .NET Framework and the C# programming language:

10000 100000 1000000 Valid for
Setup Hydrate Total Setup Hydrate Total Setup Hydrate Total Framework version C# Version
Activation and Reflection setter 0.060 0.101 0.161 0.055 0.736 0.791 0.054 6.822 6.876 1.0, 1.1, 2.0, 3.5, 4.0 1.0, 2.0, 3.0, 4.0
Activation and Expression Tree setter 0.300 0.003 0.303 0.313 0.049 0.359 0.293 0.578 0.871 4.0 none
Member Initializer 0.035 0.001 0.036 0.039 0.027 0.066 0.041 0.518 0.559 3.5, 4.0 3.0, 4.0

These values will vary with the number of the objects being hydrated and the number of its properties, but the method using the Member Initializer will be the most performant.

Code samples for this series of posts (and the one about object dumping with expression trees) can be found on my MSDN Code Gallery: Dump And Hydrate Objects With Expression Trees