LINQ: Enhancing Distinct With The PredicateEqualityComparer

LINQ With C# (Portuguese)

Today I was writing a LINQ query and needed to select distinct values based on a comparison criterion.

Fortunately, LINQ’s Distinct method allows an equality comparer to be supplied. Unfortunately, this sometimes means having to write a custom equality comparer.

Because I was going to need more than one equality comparer for this set of tools I was building, I decided to build a generic equality comparer that would just take a custom predicate. Something like this:

public class PredicateEqualityComparer<T> : EqualityComparer<T>
{
    private Func<T, T, bool> predicate;

    public PredicateEqualityComparer(Func<T, T, bool> predicate)
        : base()
    {
        this.predicate = predicate;
    }

    public override bool Equals(T x, T y)
    {
        if (x != null)
        {
            return ((y != null) && this.predicate(x, y));
        }

        if (y != null)
        {
            return false;
        }

        return true;
    }

    public override int GetHashCode(T obj)
    {
        // Always return the same value to force the call to IEqualityComparer<T>.Equals
        return 0;
    }
}

Now I can write code like this:

.Distinct(new PredicateEqualityComparer<Item>((x, y) => x.Field == y.Field))
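To make the call above concrete, here is a minimal, self-contained sketch. The Item type and its Id and Field properties are hypothetical, introduced only for this illustration, and the comparer is repeated (with the same behaviour as the one above) so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, used only for this illustration.
public class Item
{
    public int Id { get; set; }
    public string Field { get; set; }
}

// Same behaviour as the comparer in the post, repeated so the sketch stands alone.
public class PredicateEqualityComparer<T> : EqualityComparer<T>
{
    private readonly Func<T, T, bool> predicate;

    public PredicateEqualityComparer(Func<T, T, bool> predicate)
    {
        this.predicate = predicate;
    }

    public override bool Equals(T x, T y)
    {
        // Two nulls are equal; a null never equals a non-null value.
        if (x == null || y == null)
        {
            return x == null && y == null;
        }

        return this.predicate(x, y);
    }

    public override int GetHashCode(T obj)
    {
        // Constant hash code, so Distinct always falls back to Equals.
        return 0;
    }
}

public static class ComparerDemo
{
    public static List<int> DistinctIds()
    {
        var items = new List<Item>
        {
            new Item { Id = 1, Field = "a" },
            new Item { Id = 2, Field = "b" },
            new Item { Id = 3, Field = "a" }, // same Field as the first item
        };

        return items
            .Distinct(new PredicateEqualityComparer<Item>((x, y) => x.Field == y.Field))
            .Select(i => i.Id)
            .ToList();
    }
}
```

Calling ComparerDemo.DistinctIds() keeps one item per distinct Field value, so two of the three items survive.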

But I felt that I’d lost all the conciseness and expressiveness of LINQ, and this approach doesn’t support anonymous types, because the type argument has to be named explicitly. So I came up with another Distinct extension method:

public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source, Func<TSource, TSource, bool> predicate)
{
    return source.Distinct(new PredicateEqualityComparer<TSource>(predicate));
}

And the query is now written like this:

.Distinct((x, y) => x.Field == y.Field)

Looks a lot better, doesn’t it? And it works with anonymous types.
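As a rough illustration of the anonymous-type case, the sketch below puts the comparer and the extension method together. The container class name PredicateDistinctExtensions and the sample data are assumptions made for this example; the element type is inferred by the compiler, so it never has to be named.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same behaviour as the comparer in the post, repeated so the sketch stands alone.
public class PredicateEqualityComparer<T> : EqualityComparer<T>
{
    private readonly Func<T, T, bool> predicate;

    public PredicateEqualityComparer(Func<T, T, bool> predicate)
    {
        this.predicate = predicate;
    }

    public override bool Equals(T x, T y)
    {
        if (x == null || y == null)
        {
            return x == null && y == null;
        }

        return this.predicate(x, y);
    }

    public override int GetHashCode(T obj)
    {
        return 0; // constant hash code forces the call to Equals
    }
}

// Hypothetical container class for the extension method.
public static class PredicateDistinctExtensions
{
    public static IEnumerable<TSource> Distinct<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, bool> predicate)
    {
        return source.Distinct(new PredicateEqualityComparer<TSource>(predicate));
    }
}

public static class AnonymousTypeDemo
{
    public static int CountDistinctNames()
    {
        // Anonymous type: it has no name we could pass as a type argument.
        var people = new[]
        {
            new { Name = "Ana", City = "Lisboa" },
            new { Name = "Rui", City = "Porto" },
            new { Name = "Ana", City = "Faro" },
        };

        // TSource is inferred from the array's element type.
        return people.Distinct((x, y) => x.Name == y.Name).Count();
    }
}
```

Two of the three entries share a Name, so AnonymousTypeDemo.CountDistinctNames() counts two elements.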

Update: I had accidentally published the wrong version of the IEqualityComparer<T>.Equals method.

10 Responses to LINQ: Enhancing Distinct With The PredicateEqualityComparer

  • paulo says:

    Using the predicate greatly improves the readability, conciseness and expressiveness of the queries, but it can be even better. Most of the time, we don’t want to provide a comparison method, just to extract the comparison key for the elements.

    So, I developed a SelectorEqualityComparer that takes a method that extracts the key value for each element.

    msmvps.com/…/linq-enhancing-distinct-with-the-selectorequalitycomparer.aspx

  • Diego Costa says:

    Nice extension. I had to do a distinct on 3 fields with different validations and still keep all the fields of my IQueryable in the result.

    A real lifesaver!

    Cheers

  • Diego Costa says:

    Idk why but I sent you my comment in Portuguese.. no prob I guess.

    Well it helped me a lot, I had to distinct all my IQueryable items by 3 fields, and the return had to be with all fields.

    With your extension I could just distinct on the fields I needed without having to create a class or a new list with only the fields to distinct.

    Thanks!

  • paulo says:

    Glad to have helped, Diego. 🙂

  • Yves says:

    Great Post!
    Can you post a small example of how to compare on two or three fields?

  • paulo says:

    Yves,

    Is this what you’re looking for?

    .Distinct((x, y) => (x.Field1 == y.Field1) && (x.Field2 == y.Field2))

  • Yves says:

    Yeah, exactly that.
    The point is that I couldn’t make it work, so I wanted to make sure I was calling the Distinct function appropriately. The query is taking too long (I stopped it after 5 min when it usually takes 10 seconds).
    Can you please help me out? I must be doing something really wrong and I can’t see it!
    Here is my problematic code:

    var topCustomers =
        (from c in customers
         from ad in addresses
         where ad.customer_id == c.customer_id
         select new
         {
             c.customer_id,
             c.customer_name,
             address = ad.address_street + " " + ad.address_city,
             c.customer_birthday
         })
        .Distinct((x, y) => (x.customer_id == y.customer_id) && (x.customer_name == y.customer_name) && (x.address == y.address)) // PredicateEqualityComparer version
        .OrderByDescending(c => c.customer_birthday)
        .Take(10);

    I didn’t repeat your class here, but I placed the static Distinct methods inside this static class:

    public static class Extensions
    {
        public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source, Func<TSource, TSource, bool> predicate)
        {
            return source.Distinct(new PredicateEqualityComparer<TSource>(predicate));
        }

        public static IEnumerable<TSource> Distinct<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
            where TKey : IEquatable<TKey>
        {
            return source.Distinct(new SelectorEqualityComparer<TSource, TKey>(selector));
        }
    }

    Thanks again!

  • paulo says:

    Yves,

    How many items does your enumerable have?

    You can always do something like this:

    .Distinct(
        (x, y) =>
        {
            Debug.Write("something meaningful");
            return (x.Field1 == y.Field1) && (x.Field2 == y.Field2);
        })

    If you can calculate a hash code for your comparison, you would probably be better off with a custom implementation of IEqualityComparer. There are some optimizations around the use of the hash code that a constant value defeats.
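The effect of the hash code can be sketched by counting how often the predicate actually runs. The HashedPredicateEqualityComparer below is a hypothetical variant written for this illustration (it is not the pmlinq implementation); it takes the hash function as a second delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant for illustration: the caller supplies the hash function.
public class HashedPredicateEqualityComparer<T> : EqualityComparer<T>
{
    private readonly Func<T, T, bool> predicate;
    private readonly Func<T, int> hash;

    public HashedPredicateEqualityComparer(Func<T, T, bool> predicate, Func<T, int> hash)
    {
        this.predicate = predicate;
        this.hash = hash;
    }

    public override bool Equals(T x, T y)
    {
        if (x == null || y == null)
        {
            return x == null && y == null;
        }

        return this.predicate(x, y);
    }

    public override int GetHashCode(T obj)
    {
        return obj == null ? 0 : this.hash(obj);
    }
}

public static class HashDemo
{
    // Runs Distinct over n different integers and returns how many times
    // the predicate was invoked along the way.
    public static int PredicateCalls(int n, Func<int, int> hash)
    {
        int calls = 0;
        var comparer = new HashedPredicateEqualityComparer<int>(
            (x, y) => { calls++; return x == y; },
            hash);

        // ToList forces the lazy Distinct to enumerate everything.
        Enumerable.Range(0, n).Distinct(comparer).ToList();
        return calls;
    }
}
```

Comparing HashDemo.PredicateCalls(1000, v => 0) with HashDemo.PredicateCalls(1000, v => v) shows the difference: with a constant hash every new element is compared against all the elements seen so far, while a hash function that spreads the values lets Distinct skip most of those comparisons.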

  • Yves says:

    Uff..
    Man, that definitely killed my VS debugger!
    Well, too bad I cannot use your solution. I love it because it’s pretty elegant and I wanted to use it in every Distinct in my applications, but Distinct itself is too unhealthy in my case.
    So, if you need to compare multiple fields in big tables (thousands of records), you’re better off using GroupBy than Distinct with an EqualityComparer!
    You can read why here:
    http://imar.spaanjaars.com/546/using-grouping-instead-of-distinct-in-entity-framework-to-optimize-performance
    Thanks for your help! It’s indeed such an elegant solution!

  • paulo says:

    Oh! You’re working with SQL Server.

    It’s not that grouping is better than distinct per se. It’s that distinct with an equality comparer cannot be translated to SQL Server.

    You need to be very careful when using these operators. Once you use an IEnumerable operator in a sequence of IQueryable operators, you’ll retrieve all the data from the queryable source and end up with an IEnumerable.

    Meanwhile, I’ve changed the implementation of the PredicateEqualityComparer to accept a hash function. You can get it here: http://pmlinq.codeplex.com/
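The point about IEnumerable operators versus IQueryable operators in this thread can be sketched with an in-memory stand-in for a database. The Customer type and the sample data are hypothetical, and AsQueryable over a list only imitates a real provider. Whether a given provider can translate the grouping query to SQL depends on the provider and its version, so treat this as an illustration of the static types involved rather than a performance recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, for illustration only.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
}

public static class QueryableDemo
{
    // AsQueryable over a list stands in for a database-backed source.
    private static IQueryable<Customer> Source()
    {
        return new List<Customer>
        {
            new Customer { Id = 1, Name = "Ana", City = "Lisboa" },
            new Customer { Id = 1, Name = "Ana", City = "Faro" },
            new Customer { Id = 2, Name = "Rui", City = "Porto" },
        }.AsQueryable();
    }

    // Only Queryable operators are used, so the result is still an
    // IQueryable<Customer> and the provider receives the whole expression.
    public static IQueryable<Customer> DistinctByGrouping()
    {
        return Source()
            .GroupBy(c => new { c.Id, c.Name })
            .Select(g => g.First());
    }

    // Once the sequence is typed as IEnumerable<Customer>, the Enumerable
    // operators bind instead and the remaining work happens in memory.
    public static IEnumerable<Customer> DistinctInMemory()
    {
        IEnumerable<Customer> local = Source();
        return local
            .GroupBy(c => new { c.Id, c.Name })
            .Select(g => g.First());
    }
}
```

Both methods return one customer per (Id, Name) pair. The difference is the return type: the first stays composable on the queryable source, while the second has already switched to in-memory processing, which is what happens when a comparer-based Distinct defined on IEnumerable is applied to an IQueryable.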