May 07

Multithreading: immutability


As we’ve been seeing, properly isolating data is a good thing: it lets a system scale and it makes multithreaded programming easier. In the last post, we saw that we can use the stack to ensure that each thread gets its own space for saving items without worrying about concurrent reads and writes. At this point, it’s important to understand that most of the problems we’ll face in multithreaded apps don’t come from scenarios where several threads only read a shared value. In other words, if we’ve got an item that is concurrently read by several threads and nobody changes it, that’s ok. Problems happen when several threads read and write, or several threads write, to a shared memory location.
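To make the “concurrent reads are ok” point concrete, here is a minimal sketch. The SharedReadDemo class and its member names are made up for illustration: several threads read the same array without any lock, and each one writes its result only to its own slot.

```csharp
using System;
using System.Linq;
using System.Threading;

static class SharedReadDemo {
    // Shared data: filled once, never written to again (hypothetical example data).
    private static readonly int[] Values = Enumerable.Range(1, 1000).ToArray();

    // Starts readerCount threads that all read Values without synchronization
    // and returns the sum computed by each of them.
    public static long[] SumFromManyThreads(int readerCount) {
        var results = new long[readerCount];
        var threads = new Thread[readerCount];
        for (int i = 0; i < readerCount; i++) {
            int slot = i; // each thread writes only to its own slot
            threads[i] = new Thread(() => {
                long sum = 0;
                foreach (int v in Values) sum += v; // read-only access to shared data
                results[slot] = sum;
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join(); // wait before reading the results
        return results;
    }
}
```

Every reader ends up with the same sum because nobody writes to the shared array while it is being read. If one of those threads also wrote to Values, that guarantee would be gone and we’d need synchronization.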

So, if we build immutable types, everything should be a lot easier, since we don’t need to synchronize access to these items. Joe Duffy mentions two types of immutability in his book: shallow and deep. A type is shallow immutable if the fields of an object never change during its lifetime. It is deeply immutable if, in addition, the objects referenced by those fields don’t change over time either. The CLR supports immutability in a limited way through readonly fields: these fields can only be assigned from within the constructor (or a field initializer). Notice that if the object’s constructor publishes the object (for instance, by handing out this) before construction ends, you cannot rely on this modifier to ensure proper immutability, since other threads might see the fields before they are set. Notice also that there’s no support from the CLR for ensuring deep immutability.
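The difference between shallow and deep immutability is easier to see in code. The following is only a sketch: ShallowOrder and DeepOrder are hypothetical types, not something taken from the book.

```csharp
using System;
using System.Collections.Generic;

// Shallow immutable: the field can never be reassigned, but the list it
// references can still be changed by anyone who holds a reference to it.
class ShallowOrder {
    private readonly List<string> _items;
    public ShallowOrder(List<string> items) { _items = items; }
    public List<string> Items { get { return _items; } }
}

// Closer to deep immutability: the constructor takes a private copy and
// exposes it through a read-only wrapper (strings are immutable themselves).
class DeepOrder {
    private readonly IList<string> _items;
    public DeepOrder(IEnumerable<string> items) {
        _items = new List<string>(items).AsReadOnly();
    }
    public IList<string> Items { get { return _items; } }
}
```

With ShallowOrder, adding an element to the original list (or to Items) changes what every thread sees. With DeepOrder, later changes to the source list are not visible and calling Items.Add throws a NotSupportedException. Notice that this discipline comes from the way the type was written, not from the CLR.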

So, if you think about an immutable address, I guess that you could have something like this:

class Address {
        private readonly String _street;
        private readonly String _zipCode;
        public Address(String street, String zipCode) {
            _street = street;
            _zipCode = zipCode;
        }
        public String Street { get { return _street; } }
        public String ZipCode { get { return _zipCode; } }
}

Notice that after initialization, this object is really immutable. Unfortunately, this won’t cut it most of the time. For instance, suppose you want an immutable type but you’d also like to load Address instances from a database through an ORM. In that case, the previous code won’t do, because most ORMs use reflection to set the fields (and, of course, need a default constructor).
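The default constructor requirement can be illustrated with a small sketch. FakeOrm is a hypothetical stand-in for the “create an empty entity” step and it assumes that step goes through Activator.CreateInstance; real ORMs differ in the details. ReadonlyAddress has the same shape as the readonly-based Address type.

```csharp
using System;

// Same shape as the readonly-based Address: no parameterless constructor.
class ReadonlyAddress {
    private readonly String _street;
    private readonly String _zipCode;
    public ReadonlyAddress(String street, String zipCode) {
        _street = street;
        _zipCode = zipCode;
    }
    public String Street { get { return _street; } }
    public String ZipCode { get { return _zipCode; } }
}

static class FakeOrm {
    // Hypothetical helper: tries to create an empty instance the way a
    // reflection-based mapper might. Returns null when the type has no
    // parameterless constructor.
    public static object TryCreate(Type entityType) {
        try {
            // true = private parameterless constructors are accepted too
            return Activator.CreateInstance(entityType, true);
        } catch (MissingMethodException) {
            return null;
        }
    }
}
```

Under these assumptions, FakeOrm.TryCreate(typeof(ReadonlyAddress)) returns null because the type offers no parameterless constructor, public or private.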

In these cases, thinking about data ownership might help. For instance, if we can ensure that Address instances are correctly set up before being published to other threads, then we could probably relax the previous code and transform it into something like this:

class Address {
        public Address(String street, String zipCode) {
            Street = street;
            ZipCode = zipCode;
        }
        private Address() { } // here for the ORM
        public String Street { get; private set; }
        public String ZipCode { get; private set; }
}

As you can see, after the object has been correctly filled, you cannot change its state from the outside (unless you use reflection, of course). There’s still the problem of ensuring that the object stays “private” and is *only* shared among several threads after having been correctly created (where created means instantiated through the public constructor or rehydrated from the database). And, of course, there’s still that little gotcha: the CLR does not enforce deeply immutable types (if, for instance, one of the properties exposed an object which wasn’t immutable, things might go awry…). I guess there are still other details we need to worry about: for instance, the previous Address type exhibits value type behavior, so we’d really do well to override the GetHashCode and Equals methods!
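As a rough idea of what those overrides could look like, here is one possible sketch of the relaxed Address type with value-based equality. The IEquatable<Address> implementation and the 17/31 hash combination are my own choices for the example; other approaches work just as well.

```csharp
using System;

class Address : IEquatable<Address> {
    public Address(String street, String zipCode) {
        Street = street;
        ZipCode = zipCode;
    }
    private Address() { } // here for the ORM
    public String Street { get; private set; }
    public String ZipCode { get; private set; }

    // Two addresses are equal when all their values are equal.
    public bool Equals(Address other) {
        if (ReferenceEquals(other, null)) return false;
        return String.Equals(Street, other.Street)
            && String.Equals(ZipCode, other.ZipCode);
    }

    public override bool Equals(object obj) {
        return Equals(obj as Address);
    }

    // Must agree with Equals: equal values produce equal hash codes.
    public override int GetHashCode() {
        unchecked {
            int hash = 17;
            hash = hash * 31 + (Street == null ? 0 : Street.GetHashCode());
            hash = hash * 31 + (ZipCode == null ? 0 : ZipCode.GetHashCode());
            return hash;
        }
    }
}
```

Since the hash code depends on Street and ZipCode, this only behaves well in dictionaries and sets as long as instances really aren’t changed after being filled, which is precisely the discipline we’re assuming here.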

Currently, I’m liking this “create, then publish to the rest of the world” approach. I still need data synchronization in other places, but the fact is that having immutable types makes sharing between several threads easier. And what do you think?
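For reference, here is a minimal sketch of what I mean by create-then-publish. AddressCache and its members are hypothetical names, and the volatile field is just one way of marking the publication point (a lock or an Interlocked call would do too).

```csharp
using System;
using System.Threading;

class Address {
    public Address(String street, String zipCode) {
        Street = street;
        ZipCode = zipCode;
    }
    private Address() { } // here for the ORM
    public String Street { get; private set; }
    public String ZipCode { get; private set; }
}

static class AddressCache {
    // The only shared location; volatile so that readers see the reference
    // only after the writes that built the object.
    private static volatile Address _current;

    // Called by the owner thread: the instance is completely built while it
    // is still private, and only then made visible to everybody else.
    public static void Publish(String street, String zipCode) {
        var address = new Address(street, zipCode); // private to this thread
        _current = address;                         // publication point
    }

    // Any thread can read the published instance without locking.
    public static Address Read() {
        return _current;
    }
}
```

Readers either get null (nothing published yet) or a completely built Address. Replacing the address means publishing a new instance, never changing one that other threads might already be reading.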