LA.NET [EN]

Oct 25

In the previous post, we’ve started looking at the Equals method and saw that its default implementation (inherit from Object) had some flaws. We’ve seen a better implementation for it and we’ve also talked about some strategies for overriding the method in new custom types. In this post, we’re going to talk about a somewhat related concept: hash codes.

You see, all objects inherit a method called GetHashCode from the base Object type. This method is virtual, returns an integer and is defined by the Object type because the designers of the framework though that it would be a good idea to allow any object to be used as a key in a hashtable. The current rules governing hash code generation are quite interesting:

  • if two objects *are* equal, then they should return the *same* hash code.
  • if two objects *aren’t* equal, they don’t have to generate *different* hash codes. Many are surprised by this at first…
  • you should use at least one instance field for calculating the hash code of an object. You should rely on immutable fields for this because these fields are initialized during construction and then remain constant during the object’s lifetime. This is important and the docs should have presented this a a *must* (not a *should*).
  • the returned result must be consistent and it should be the same as long as there is no modification to the object state that determines the return value of the Equals method.
  • your method should strive to have a good random distribution.

As you can see, the rules for hash code generation imply that you’ll have to override GetHashCode whenever you override the Equals method. The Object’s GetHashCode implementation inherited by all types will simply return a number which doesn’t change during the object’s lifetime and is guaranteed to uniquely identify it in the current application domain. As you might expect, ValueType does follow the previous rules while overriding the GetHashCode method. Unfortunately, you’ll have the same performance problem mentioned before because it will have to use reflection to ensure that all the fields of a type are used in the algorithm.

Building your own hash code method isn’t as easy as it might look at first. If you look at the previous rules, you’ll notice that there are several constraints which make it hard to implement. One of the things that isn’t mentioned in the previous list (and it should be!) is that hash codes shouldn’t be able to change. In fact, they *can’t* change because it might break your application. Unfortunately,this isn’t really mentioned in the docs nor followed by the framework code. A quick example is the best way of illustrating this point. Take a look at the following code:

struct Point {
    public Int32 X { get; set; }
    public Int32 Y { get; set; }
}

This code seems simple enough and harmless,right? Well, guess what? It’s not…one of the things you should keep in mind while creating new value types is that they should be immutable! For instance, take a look at the DateTime struct…you’ll quickly notice that it doesn’t have any write properties and none of the existing methods change the value of its internal fields (at best, you’ll get a new instance returned). In other words, DateTime is an immutable type: after creating one instance, you can’t really change its state!

Now, if you look at our Point type, you’ll notice that it reuses the base’s Equals and GetHashCode implementation. Yes, I’ve said we should always override those methods, but they should work fine for the purpose of this demo (though probably a bit slower than if we introduced our own overrides of these methods). So. let’s start simple:

var hashtable = new Dictionary<Point, String>( );
var a = new Point {X = 10, Y = 20};
hashtable.Add(a, "Hi"  );
Console.WriteLine(a.GetHashCode( ));
Console.WriteLine(hashtable[a]);

Nothing too fancy here…we’re creating a new instance of a Point and using it as the key of a Dictionary instance. Till now, everything works out perfectly! Now, suppose we do this:

a.X = 20;
Console.WriteLine(a.GetHashCode( ));

I guess that by now you’re seeing it, right? If you’re not hearing alarm bells all over the place, then you should probably make a pause and read the info on the Dictionary class. Specifically, the part where it says this:

Retrieving a value by using its key is very fast, close to O(1), because the Dictionary<TKey, TValue> class is implemented as a hash table.

oopss…if you’ve run the previous code, you’ll notice that a.GetHashCode no longer returns the same value you got in the previous snippet. In fact, go ahead and try to get the previous entry from the hashtable variable:

Console.WriteLine(a.GetHashCode( ));

And here’s the result in my machine (the 1 you see is the total number of entries in the hashtable variable):

post

It seems like you just can’t get the existing entry from the dictionary through the Point instance variable that was used as key. Not good, right? Well, let’s see how we can improve our code to solve this kind of problem. We’ve got several options, but my favorite is turning Point into an immutable instance:

struct Point {
    private readonly Int32 _x;
    private readonly Int32 _y;

    public Int32 X { get { return _x; } }
    public Int32 Y { get { return _y; } }
    public Point(Int32 x, Int32 y) {
        _x = x;
        _y = y;
        _hashcode = null;
    }
    public override bool Equals(object obj) {
        if (obj == null) return false;
        if (obj.GetType() != GetType()) return false;
        var other = ( Point )obj;//unbox
        return other.X == X && other.Y == Y;
    }
    private Int32? _hashcode;
    public override int GetHashCode() {
        if(!_hashcode.HasValue) {
            _hashcode = X.GetHashCode( ) ^ Y.GetHashCode( );
        }
        return _hashcode.Value;
    }
}

I didn’t really follow all the recommendations I’ve mentioned in the previous post (I’ll leave that to you :) ) but now we’ve solved the previous problems. Since point is immutable, you cannot change an instance after creating it and now the hash code stays constant along the instance lifetime.

Notice that I will only calculate the hash code once and only if someone asks for it. If you’re creating a new instance type, you can follow several of the principles presented in this sample. For instance, you should always strive to define which fields are immutable (if you don’t have one, then you can always add one!) and rely on them for calculating the hash code. Since this has become a rather large post, I wont be boring you with an example that shows how this can be done. Instead, I’ll simply redirect you to take a look at the S#arp project, which has some interesting base classes you can reuse to solve these problems.

And that’s it. Stay tuned for more.

1 comment so far

  1. Bruno
    7:24 pm - 10-26-2010

    Just one thing that is worth reminding.
    For best practicing is good to use other Hash Method:

    http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-systemobjectgethashcode

Leave a Reply

Your email address will not be published. Required fields are marked *


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>