Thread synchronization of non-atomic invariants in .NET 4.5

Now that we’ve seen how a singular x86-x64 focus might affect how we can synchronize atomic invariants, let’s look at non-atomic invariants.

While an atomic invariant really doesn’t need much in the way of guarding, non-atomic invariants often do.  The rules by which the invariant is correct are often much more complex.  Ensuring an atomic invariant like an int, for example, is pretty easy: you can’t set it to an invalid value; you just need to make sure the value is visible.  Non-atomic invariants involve data that can’t natively be modified atomically.  The typical case is more than one variable, but it can also include intrinsic types that are not guaranteed to be modified atomically (like long and decimal).  There is also the fact that not all operations on an atomic type are performed atomically.
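As a concrete illustration, here’s a minimal, hypothetical sketch (names are mine) of the “tearing” that can happen with a long; it assumes a 32-bit process, where the runtime makes no atomicity guarantee for 64-bit writes:

using System;
using System.Threading;

public static class TornReadDemo {
    // In a 32-bit process, writes to a long are not atomic: they happen
    // in two 32-bit halves.
    private static long _value;

    public static void Main() {
        var writer = new Thread(() => {
            while (true) {
                _value = 0L;  // all bits clear
                _value = -1L; // all bits set
            }
        }) { IsBackground = true };
        writer.Start();

        while (true) {
            long observed = _value; // plain (non-Interlocked) read; it can tear
            if (observed != 0L && observed != -1L) {
                // Half of one write and half of the other: a value never assigned.
                Console.WriteLine("Torn read: 0x{0:X16}", observed);
                return;
            }
        }
    }
}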

For example, let’s say I want to deal with a latitude/longitude pair.  That pair of floating-point values is an invariant; I need to model accesses to the pair as atomic operations.  If I write to latitude, that value shouldn’t be “seen” until I also write to longitude.  The following code does not guard that invariant in a concurrent context:

latitude = 39.73;
longitude = -86.27;

If somewhere else I change these values, for example to move from the location of Indianapolis, IN to that of Ottawa, ON:

   1: latitude = 45.4112;
   2: longitude = -75.6981;

Another thread reading latitude/longitude while the thread executing the above code was between lines 1 and 2 would read a lat/long pair that is neither Indianapolis nor Ottawa (Ottawa’s latitude paired with Indianapolis’s longitude).  Making these write operations volatile does nothing to make the operation atomic and thread-safe.  For example, the following is still not thread-safe:

   1: Thread.VolatileWrite(ref latitude, 45.4112);
   2: Thread.VolatileWrite(ref longitude, -75.6981);

A thread can still read latitude or longitude after line 1 executes on another thread and before line 2 does.  Given two variables that are publicly visible, the only way to make an operation on both “atomic” is to use lock or to use a synchronization class like Monitor, Semaphore, Mutex, etc.  For example:

lock(latLongLock)
{
    latitude = 45.4112;
    longitude = -75.6981;
}

Considering latitude and longitude “volatile” doesn’t help us at all in this situation; we have to use lock.  And once we use lock, there’s no need to consider the variables volatile: no two threads can be in the same critical region at the same time, any side-effects of executing that critical region are guaranteed to be visible as soon as the lock is released, and any potentially visible side-effects from other threads are guaranteed to be visible as soon as the lock is acquired.
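For completeness, the read side has to take the same lock; a sketch (assuming latLongLock is the same lock object the writes use):

double lat, lon;
lock(latLongLock)
{
    lat = latitude;
    lon = longitude;
}
// lat/lon now hold a consistent snapshot: either the old pair or the
// new pair, never a mix of old and new values.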

There are circumstances where loads/stores to different addresses can be reordered in relation to each other (a load can be reordered with older stores to a different memory address).  So, conceptually, given two threads executing the following code at the same time on different cores/CPUs:

x = 1;    |    y = 1;
r1 = y;   |    r2 = x;

This could result in r1 == 0 and r2 == 0 (as described in section 8.2.3.4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A), assuming access to r1 and r2 was optimized by the compiler into register accesses.  The only way to avoid this is to force a memory barrier.  The use of volatile, as we’ve seen in the prior post, is not enough to ensure a memory fence is emitted under all circumstances.  A barrier can be forced manually through the use of Thread.MemoryBarrier, or through the use of lock.  Thread.MemoryBarrier is less well understood by many developers, so lock is almost always what should be used prior to any micro-optimizations.  For example:

lock(lockObject)
{
  x = 1;
  r1 = y;
}

and

 

lock(lockObject)
{
  y = 1;
  r2 = x;
}

This basically assumes x and y are involved in a particular invariant, and that invariant needs to be guaranteed through atomic access to the pair of variables—which is done by creating critical regions of code where only one region can be executing at a time across threads.

Revisiting the volatile keyword

The first post in this series could have come off as suggesting that volatile is always a good thing.  As we’ve seen above, that’s not true.  Let me be clear: using volatile in the way I described previously is an optimization.  It should be a micro-optimization, used very, very carefully.  What is and isn’t an atomic invariant isn’t always cut and dried; not every operation on an atomic type is an atomic operation.

Let’s look at some of the problems of volatile:

The first, and arguably the most discussed, problem is that volatile decorates a variable, not the uses of that variable.  With non-atomic operations on an atomic variable, volatile can give you a false sense of security: you may think volatile gives you thread-safe code in all accesses to that variable, but it does not.  For example:

private volatile int counter;

private void DoSomething()
{
    //...
    counter++;
    //...
}

Although many processors have a single instruction to increment an integer, “there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement” [1].  Despite counter being volatile, there’s no guarantee this operation will be atomic, and thus no guarantee that it will be thread-safe.  In the general case, not every type you can use operator++ on is atomic; looking strictly at “counter++;”, you can’t tell whether it’s thread-safe.  If counter were of type long, access to counter would no longer be atomic, and a single instruction to increment it is only possible on some processors (with no guarantee that such an instruction is used).  And even if counter is an atomic type, you’d have to check the declaration of the variable to see whether it was volatile before deciding whether it’s even potentially thread-safe.  To make incrementing a variable thread-safe, the Interlocked class should be used for supported types:

private int counter;

private void DoSomething()
{
    //...
    System.Threading.Interlocked.Increment(ref counter);
    //...
}

Types not supported by volatile, like long and ulong, are supported by Interlocked.  For non-atomic types not supported by Interlocked, lock is recommended until you’ve verified that another method is “better” and actually works:

private decimal counter;
private readonly object lockObject = new object();

private void DoSomething()
{
    //...
    lock(lockObject)
    {
        counter++;
    }
    //...
}

That is, volatile is problematic because it can only be applied to member fields, and only to certain types of member fields. 

The general consensus is that volatile operations should be made explicit, through the use of Interlocked, Thread.VolatileRead, Thread.VolatileWrite, or lock, rather than through the volatile keyword: volatile doesn’t decorate the operations that are potentially performed in a concurrent context, it doesn’t consistently lead to more efficient code, passing a volatile field by ref circumvents the field’s volatility, and it fails outright with non-atomic invariants.
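To illustrate what that consensus looks like in code, here’s a minimal sketch (the field and method names are mine) where the field itself is not declared volatile and only the accesses that need volatile semantics pay for them:

private int flag; // deliberately not declared volatile

private void Publish()
{
    // Explicitly volatile write: made visible to other threads.
    Thread.VolatileWrite(ref flag, 1);
}

private bool IsPublished()
{
    // Explicitly volatile read: not cached in a register.
    return Thread.VolatileRead(ref flag) == 1;
}

private bool IsPublishedNoFence()
{
    // A deliberate plain read for call sites that don't need volatility.
    return flag == 1;
}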

Conclusion

Concurrent and multithreaded programming is not trivial.  It involves dealing with non-sequential operations through the writing of sequential code.  It’s prone to error, and you really have to know the intent of your code in order to decide not only what might be used in a concurrent context but also what is thread-safe.  I.e. “thread-safe” is application-specific. 

Despite .NET 4.5 (i.e. Visual Studio 2012) only really having support for x86/x64 “out of the box”, the potential side-effects of assuming an x86/x64 memory model just muddy the waters.  I don’t think there is any benefit to writing to an x86/x64 memory model over writing to the .NET memory model.  Nothing I’ve shown really affects existing guidance on writing thread-safe and concurrent code—some of which is detailed in Visual Studio 2010 Best Practices.

Knowing what’s going on at lower levels in any particular situation is good; but anything you do in light of such side-effects should be considered a micro-optimization and well scrutinized.

[1] C# Language Specification § 5.5 Atomicity of variable references


Thread synchronization of atomic invariants in .NET 4.5 clarifications

In Thread synchronization of atomic invariants in .NET 4.5 I presented my observations of what the compiler does in the very narrow context of Intel x86 and x64, with a particular version of .NET.  You can install SDKs that give you access to compilers for other processors.  For example, if you write something for Windows Phone or Windows Store, you’ll get compilers for other processors (e.g. ARM) with memory models looser than x86 and x64.  That post contains only observations in the context of x86 and x64. 

I believe more knowledge is always better; but you have to use that knowledge responsibly.  If you know you’re only ever going to target x86 or x64 (and you don’t know that if you use AnyCPU, even in VS 2012, because some yet-to-be-created processor might be supported in a future version of or update to .NET) and you do want to micro-optimize your code, then that post might give you enough knowledge to do that.  Otherwise, take it with a grain of salt.  I’ll get into a bit more detail in part 2, Thread synchronization of non-atomic invariants in .NET 4.5, at a future date—which will include more specific guidance and recommendations.

In the case where I used a really awkwardly placed lock:

var lockObject = new object();
while (!complete)
{
    lock(lockObject)
    {
        toggle = !toggle;
    }
}

It’s important to point out the degree of implicit side-effects this code depends on.  For one, it assumes that the compiler is smart enough to know that a while loop is the equivalent of a series of sequential statements; e.g. this is effectively equivalent to:

var lockObject = new object();
if (complete) return;
lock (lockObject)
{
    toggle = !toggle;
}
if (complete) return;
lock (lockObject)
{
    toggle = !toggle;
}
//...

That is, there is an implicit volatile read (e.g. a memory fence, from the Monitor.Enter implementation detail) at the start of the lock block and an implicit volatile write (e.g. a memory fence, from the Monitor.Exit implementation detail) at the end of it.
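For reference, in C# 4 and later the lock block above expands to roughly the following (a sketch of the compiler’s expansion), which is where those fences come from:

bool lockTaken = false;
try
{
    Monitor.Enter(lockObject, ref lockTaken); // acquire semantics (volatile read)
    toggle = !toggle;
}
finally
{
    if (lockTaken) Monitor.Exit(lockObject);  // release semantics (volatile write)
}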

In case it wasn’t obvious: you should never write code like this; it’s simply an example.  As I pointed out in the original post, it’s confusing to anyone else reading it: lockObject can’t be shared amongst threads, the lock block really isn’t protecting toggle, and the code can (and likely will) get “maintained” into something that no longer works.

In the same vein, the same can be said for the original example of this code:

static void Main()
{
  bool complete = false;
  var t = new Thread (() =>
  {
    bool toggle = false;
    while (!complete)
    {
      Thread.MemoryBarrier();
      toggle = !toggle;
    }
  });
  t.Start();
  Thread.Sleep (1000);
  complete = true;
  t.Join();
}

While this code works, it’s not readily apparent that the Thread.MemoryBarrier() is there so that our read of complete (and not toggle) isn’t optimized into a register read.  The degree to which you can depend on the compiler continuing to do this is up to you.  The code is equally valid and clearer if written to use Thread.VolatileRead, except for the fact that Thread.VolatileRead does not support the Boolean type.  It can be re-written using Int32 instead.  For example:

static void Main(string[] args)
{
  int complete = 0;
  var t = new Thread (() =>
  {
    bool toggle = false;
    while (Thread.VolatileRead(ref complete) == 0)
    {
      toggle = !toggle;
    }
  });
  t.Start();
  Thread.Sleep (1000);
  complete = 1;
  t.Join();
}

This is clearer and shows your intent more explicitly.

Thread synchronization of atomic invariants in .NET 4.5

I’ve written before about multi-threaded programming in .NET (C#).  Spinning up threads and executing code on another thread isn’t really the hard part.  The hard part is synchronization of data between threads.

Most of what I’ve written about is from a processor agnostic point of view.  It’s written from the historical point of view: that .NET supports many processors with varying memory models.  The stance has generally been that you’re programming for the .NET memory model and not a particular processor memory model.

But that’s no longer entirely true.  In 2010, Microsoft basically dropped support for Itanium in both Windows Server and Visual Studio (http://blogs.technet.com/b/windowsserver/archive/2010/04/02/windows-server-2008-r2-to-phase-out-itanium.aspx).  In VS 2012 there is no “Itanium” choice in the project Build options.  As far as I can tell, Windows Server 2008 R2 is the only Windows operating system still in support that supports Itanium; and even Windows Server 2008 R2 for Itanium is not supported for .NET 4.5 (http://msdn.microsoft.com/en-us/library/8z6watww.aspx).

So, what does it mean to really only have the context of running on x86/x64?  Well, if you really read the documentation and research the Intel x86 and x64 memory models, this could have an impact on how you write multi-threaded code with regard to shared-data synchronization.  The x86 and x64 memory models include guarantees like “In a multiple-processor system…Writes by a single processor are observed in the same order by all processors.” but also include caveats like “Loads May Be Reordered with Earlier Stores to Different Locations”.  What this really means is that a store or a load to a single location won’t be reordered with regard to a load or a store to the same location across processors.  That is, we don’t need fences to ensure a store to a single memory location is “seen” by all threads, or that a load from memory loads the “most recent” value stored.  But it does mean that in order for multiple stores to multiple locations to be viewed by other threads in the same order, a fence is necessary (or the group of store operations must be invoked as an atomic action through the use of synchronization primitives like Monitor.Enter/Exit, lock, Semaphore, etc.).  (See section 8.2 Memory Ordering of the Intel Software Developer’s Manual Volume 3A.)  But that deals with non-atomic invariants, which I’ll detail in another post.

To be clear, you could develop to just x86 or just x64 prior to .NET 4.5 and have all the issues I’m about to detail.

Prior to .NET 4.5 you really programmed to the .NET memory model.  That model has changed over time since ECMA defined it around .NET 2.0; but it was meant to be a “supermodel” to deal with the fact that .NET could be deployed to different CPUs with disparate memory models.  Most notable was the Itanium memory model.  That model is much looser than the Intel x86 memory model and allowed things like a store without a release fence and a load without an acquire fence.  This meant that a load or a store might be done only in one CPU’s memory cache and wouldn’t be flushed to memory until a fence.  It also meant that other CPUs (e.g. other threads) might not see the store, or might not get the “latest” value with a load.  You can explicitly cause release and acquire fences in .NET with things like Monitor.Enter/Exit (lock), the Interlocked methods, Thread.MemoryBarrier, Thread.VolatileRead/VolatileWrite, etc.  So, it wasn’t a big issue for .NET programmers to write code that would work on an Itanium.  For the most part, if you simply guarded all your shared data with a lock, you were fine.  lock is expensive, though, so you could optimize things with Thread.VolatileRead/VolatileWrite if your shared data was inherently atomic (like a single int, a single Object, etc.), or you could use the volatile keyword (in C#).  The conventional wisdom has been to use Thread.VolatileRead/VolatileWrite rather than decorate a field with volatile, because you may not need every access to be volatile and you don’t want to take the performance hit when an access doesn’t need to be volatile.

The following example (borrowed from Jeffrey Richter, but slightly modified) shows synchronizing a static member variable with Thread.VolatileRead/VolatileWrite:

public static class Program {
  private static int s_stopworker;
  public static void Main() {
    Console.WriteLine("Main: letting worker run for 5 seconds");
    Thread t = new Thread(Worker);
    t.Start();
    Thread.Sleep(5000);
    Thread.VolatileWrite(ref s_stopworker, 1);
    Console.WriteLine("Main: waiting for worker to stop");
    t.Join();
  }

  public static void Worker(object o) {
    Int32 x = 0;
    while(Thread.VolatileRead(ref s_stopworker) == 0)
    {
      x++;
    }
  }
}

 
Without the call to Thread.VolatileWrite, the processor could reorder the write of 1 to s_stopworker to after the read (assuming we’re not developing to one particular processor memory model and we’re including Itanium).  In terms of the compiler, without Thread.VolatileRead it could cache the value being read from s_stopworker in a register.  For example, removing the Thread.VolatileRead, the compiler optimizes the comparison of s_stopworker to 0 in the while loop into a single register access (on x86):
 
00000000  push        ebp
00000001  mov         ebp,esp
00000003  mov         eax,dword ptr ds:[00213360h]
00000008  test        eax,eax
0000000a  jne         00000010
0000000c  test        eax,eax
0000000e  je          0000000C
00000010  pop         ebp
00000011  ret

The loop is 0000000c to 0000000e (really just testing that the eax register is 0). Using Thread.VolatileRead, we’d always get a value from a physical memory location:

00000000  push        ebp
00000001  mov         ebp,esp
00000003  lea         ecx,ds:[00193360h]
00000009  call        71070480
0000000e  test        eax,eax
00000010  jne         00000021
00000012  lea         ecx,ds:[00193360h]
00000018  call        71070480
0000001d  test        eax,eax
0000001f  je          00000012
00000021  pop         ebp
00000022  ret

The loop is now 00000012 to 0000001f, which shows Thread.VolatileRead being called each iteration (location 00000018). But, as we’ve seen from the Intel documentation and guidance, we don’t really need to call VolatileRead; we just don’t want the compiler to optimize the memory access away into a register access. This code works, but we take the hit of calling VolatileRead, which forces a memory fence through a call to Thread.MemoryBarrier after reading the value.  For example, the following code is equivalent:

while(s_stopworker == 0)
{
  Thread.MemoryBarrier();
  x++;
}

And this works equally as well as using Thread.VolatileRead, and compiles down to:

00000000  push        ebp
00000001  mov         ebp,esp
00000003  cmp         dword ptr ds:[002A3360h],0
0000000a  jne         0000001A
0000000c  lock or     dword ptr [esp],0
00000011  cmp         dword ptr ds:[002A3360h],0
00000018  je          0000000C
0000001a  pop         ebp
0000001b  ret

The loop is now 0000000c to 00000018. As we can see, at 0000000c we have an extra “lock or” instruction—which is what the compiler optimizes a call to Thread.MemoryBarrier into. This instruction really just or’s 0 with what esp points to (i.e. “nothing”: zero or’ed with something else does not change the value), but the lock prefix forces a fence and is less expensive than instructions like mfence. Based on what we know of the x86/x64 memory model, though, we’re only dealing with a single memory location and we don’t need that lock prefix—the inherent memory guarantees of the processor mean that our thread can see any and all writes to that memory location without this extra fence. So, what can we do to get rid of it? Well, using volatile actually results in code that doesn’t generate that lock or instruction. For example, if we change our code to make s_stopworker volatile:

public static class Program {
  private static volatile int s_stopworker;
  public static void Main() {
    Console.WriteLine("Main: letting worker run for 5 seconds");
    Thread t = new Thread(Worker);
    t.Start();
    Thread.Sleep(5000);
    s_stopworker = 1;
    Console.WriteLine("Main: waiting for worker to stop");
    t.Join();
  }

  public static void Worker(object o) {
    Int32 x = 0;
    while(s_stopworker == 0)
    {
      x++;
    }
  }
}

We tell the compiler that we don’t want accesses to s_stopworker optimized.  This then compiles down to:

00000000  push        ebp
00000001  mov         ebp,esp
00000003  cmp         dword ptr ds:[00163360h],0
0000000a  jne         00000015
0000000c  cmp         dword ptr ds:[00163360h],0
00000013  je          0000000C
00000015  pop         ebp
00000016  ret

The loop is now 0000000c to 00000013. Notice that we’re simply getting the value from memory on each iteration and comparing it to 0. There’s no lock or: one less instruction and no extra memory fence. In many cases it doesn’t matter (i.e. you might only do this once, in which case an extra few milliseconds won’t hurt and this might be a premature optimization), but using lock or with the register optimization is about 992% slower when measured on my computer (or: volatile is 91% faster than using Thread.MemoryBarrier, and probably a bit faster still than using Thread.VolatileRead).  This is actually contradictory to conventional wisdom with respect to a .NET memory model that supports Itanium.  If you want to support Itanium, every access to a volatile field is tantamount to a Thread.VolatileRead or Thread.VolatileWrite, in which case, yes, in scenarios where you don’t really need the field to be volatile, you take a performance hit.
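For the curious, here’s a minimal sketch of how one might measure the difference (the field names and iteration count are mine, and it assumes System.Diagnostics and System.Threading are imported; run a Release build outside the debugger, and expect different numbers on different machines):

private static volatile int s_volatileField;
private static int s_plainField;

private static void Measure()
{
    const int iterations = 100000000;
    int x = 0;

    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
    {
        if (s_volatileField == 0) x++; // plain mov on x86/x64: no lock prefix
    }
    sw.Stop();
    Console.WriteLine("volatile field read:  {0}ms", sw.ElapsedMilliseconds);

    sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; ++i)
    {
        Thread.MemoryBarrier();        // compiles down to "lock or"
        if (s_plainField == 0) x++;
    }
    sw.Stop();
    Console.WriteLine("MemoryBarrier + read: {0}ms", sw.ElapsedMilliseconds);

    Console.WriteLine(x); // consume x so the loops aren't optimized away
}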

In .NET 4.5, where Itanium is out of the picture, you might be thinking “volatile all the time, then!”.  But hold on a minute; let’s look at another example:

 

   1: static void Main()
   2: {
   3:   bool complete = false; 
   4:   var t = new Thread (() =>
   5:   {
   6:     bool toggle = false;
   7:     while (!complete)
   8:     {
   9:         Thread.MemoryBarrier();
  10:         toggle = !toggle;
  11:     }
  12:   });
  13:   t.Start();
  14:   Thread.Sleep (1000);
  15:   complete = true;
  16:   t.Join();
  17: }

This code (borrowed from Joe Albahari) will block indefinitely at the call to Thread.Join (line 16) without the call to Thread.MemoryBarrier() (at line 9). 

This code blocks indefinitely without Thread.MemoryBarrier() on both x86 and x64; but this is due to compiler optimizations, not the processor’s memory model. We can see this in the disassembly of what the JIT produces for the thread lambda (here shown for x86):

00000000  push        ebp
00000001  mov         ebp,esp
00000003  movzx       eax,byte ptr [ecx+4]
00000007  test        eax,eax
00000009  jne         0000000F
0000000b  test        eax,eax
0000000d  je          0000000B
0000000f  pop         ebp
00000010  ret

Notice the loop (0000000b to 0000000d): the compiler has optimized access to the variable complete into a register and doesn’t update that register from memory—identical to what we saw with the member field above. If we look at the disassembly when using MemoryBarrier (this one from x64):

00000000  movzx       eax,byte ptr [rcx+8]
00000004  test        eax,eax
00000006  jne         0000000000000020
00000008  nop         dword ptr [rax+rax+00000000h]
00000010  lock or     dword ptr [rsp],0
00000015  movzx       eax,byte ptr [rcx+8]
00000019  test        eax,eax
0000001b  je          0000000000000010
0000001d  nop         dword ptr [rax]
00000020  rep ret

We see that the loop testing complete (instructions from 00000010 to 0000001b) grabs the memory value into eax, then tests eax until it’s non-zero. MemoryBarrier has been optimized to “lock or” here as well.

What we’re dealing with here is a local variable, so we can’t use the volatile keyword.  We could use the lock keyword to get a fence; but it couldn’t be around the comparison (the while), because that would enclose the entire while block: we would never exit the lock to get the memory fence, and the compiler would believe reads of complete aren’t guarded by lock’s implicit fences.  We’d have to wrap the assignment to toggle to get the acquire fence before and the release fence after, ala:

var lockObject = new object();
while (!complete)
{
    lock(lockObject)
    {
        toggle = !toggle;
    }
}

Clearly this lock block isn’t really a critical section, because the lockObject instance can’t be shared amongst threads.  Anyone reading this code is likely going to think “WTF?”. But we do get our fences, the compiler will not optimize access to complete into only a register, and our code will no longer block at the call to Thread.Join.  It’s apparent that Thread.MemoryBarrier is the better choice in this scenario; it’s just more readable and doesn’t appear to be poorly written code (i.e. code that depends only on side-effects).

But you still take the performance hit of “lock or”.  If you want to avoid that, refactor the captured complete variable into a field and decorate it with volatile, as sketched below.
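A sketch of that refactoring (mirroring the earlier s_stopworker example; the field name is mine):

private static volatile bool s_complete;

static void Main()
{
    var t = new Thread(() =>
    {
        bool toggle = false;
        while (!s_complete) // volatile read: re-read from memory each iteration
        {
            toggle = !toggle;
        }
    });
    t.Start();
    Thread.Sleep(1000);
    s_complete = true;      // volatile write
    t.Join();
}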

Although some of this seems like micro-optimization, it’s not.  You have to be careful to “synchronize” shared atomic data with respect to compiler optimizations, so you might as well pick the best way that works.

 

In the next post I’ll get into synchronizing non-atomic invariants shared amongst threads.

 

Visual Studio 2010 Best Practices published

Most of my spare time in the last few months has been taken up by writing Visual Studio 2010 Best Practices.  This has now been published and is available through the publisher (no longer pre-order) at http://bit.ly/Px43Pw.  The pre-order price is still available for a limited time.  Amazon still has it out of stock, but it’s $39.99 at http://amzn.to/QDDmF7.

The title of the book really doesn’t do the content justice, not least the term “Best Practices”.  Anyone who knows me should know I don’t really like that term; but hopefully those looking for best practices will read the book and learn from chapter one why “best practice” has problems.

While it’s called “Visual Studio 2010 Best Practices”, it isn’t limited to the UI of Visual Studio (or to Visual Studio 2010, really, for that matter).  It’s really a set of generally accepted recommended practices, based on expertise and experience, for any and all .NET developers (it assumes they use Visual Studio, but many practices deal with general design/development that can be applied almost anywhere).  There are some specifics in there about the Visual Studio UI, like optimizing Visual Studio settings/configuration, useful features, the correct way to use certain features, etc.; but that’s mostly limited to one chapter.  Other chapters include recommended practices regarding C#, SCC, deployment, testing, parallelization/multithreading, distributed applications, and web services.  From the book overview:

  • Learning source code control
  • Practices for advanced C# syntax
  • Asynchronous programming in C#
  • Learn tips for architecting decoupled systems
  • Practices for designing multi-threaded and parallel systems
  • Practices for designing distributed systems
  • Learn better ways of developing web services with WCF
  • Learn faster ways to design automated tests
  • Tips and tricks to test complex systems
  • Understanding proven ways of deploying software systems in Windows

Kind of a mixed bag; but, you have to work within the bounds you’ve been given :).  It was limited to about 200 pages; so, of course, I couldn’t go into every recommended practice or every useful tidbit that everyone could use…

I’d like to thank a few people for helping-out outside of the publisher’s review channel (i.e. they’re not mentioned in the book):  Joe Miller, Amir Barylko, and of course all those that offered…

Automated Testing Isn’t Just for Business Logic

I had a conversation with Kelly Sommers the other day that was partially a short support group session on the annoying tendencies of development teams to completely lose focus on the architecture and design principles of a system and let the code base devolve into a ball of muddy spaghetti.

One particular area that we discussed, and one I’ve detailed elsewhere, has to do with layers.  Our gripe was that developers seem to completely ignore layering principles once they start coding: they introduce cycles, put things in the wrong layer, etc.  A brief recap of layering principles: types in one layer can only access types in the adjacent lower layer.  That’s it.  Types that access types in a layer above are violating layering (or aren’t layers), and types that access types in a layer lower than the adjacent lower layer (e.g. two layers down) are violating layering.

I’ve blogged about Visual Studio and layers (and validation) before; but not everyone uses that part of Visual Studio, or has the edition of Visual Studio that includes it.  I mentioned in our conversation that it’s fairly easy to write unit tests to make these verifications.  I’ve written tests like this before, but the assumption was that “layers” were in different assemblies.  The verification for that scenario is quite a bit simpler; so I thought I’d tackle a test that verifies layering within a single assembly, where namespaces are the scope of layers.

My initial code used Enumerable.Any to see whether any types from a lower layer not adjacent to the current layer were used in this layer, or whether any types from any layers above the current layer were used in this layer.  This did the validation, but basically left the dev with a “test failed and I’m not giving you any details” message, because we couldn’t tell where the violation was and what violated it—which isn’t too friendly.  So, I expanded it out to detail all the violations.  I came up with a utility method ValidateLayerRelationships that would be used as follows:


public enum Layer {
    // Order is important!
    Data,
    Domain,
    UI
}
 
[TestMethod]
public void ValidateLayerUsage()
{
    var relatedNamespaces = new[] { "PRI.Data", "PRI.Domain", "PRI.FrontEnd", "PRI.ViewModels" };
 
    var levelMap = new Dictionary<string, Layer> {
                    {relatedNamespaces[0], Layer.Data},
                    {relatedNamespaces[1], Layer.Domain},
                    {relatedNamespaces[2], Layer.UI},
                    {relatedNamespaces[3], Layer.UI},
                    };
 
    var assemblyFileName = "ClassLibrary.dll";
    ValidateLayerRelationships(levelMap, assemblyFileName);
}

In this example I have two namespaces in one layer (the UI layer), FrontEnd and ViewModels, and two other layers with just one namespace each (Data in the Data layer and Domain in the Domain layer), mostly to show you can have more than one namespace per layer.  We define a layer map and the filename of the assembly we want to validate, and call ValidateLayerRelationships.  ValidateLayerRelationships is as follows:


private static void ValidateLayerRelationships(Dictionary<string, Layer> levelMap, string assemblyFileName) {
    // can't use ReflectionOnlyLoadFrom because we want to peek at attributes
    var groups = from t in Assembly.LoadFrom(assemblyFileName).GetTypes()
                    where levelMap.Keys.Contains(t.Namespace)
                    group t by t.Namespace
                    into g
                    orderby levelMap[g.Key]
                    select g;
 
    var levelsWithClasses = groups.Count();
    Assert.IsTrue(levelsWithClasses > 1, "Need at least two layers to validate relationships.");
 
    var errors = new List<string>();
    foreach (var g in groups){
        var layer = levelMap[g.Key];
        // verify this level only accesses things from the adjacent lower layer (or layers)
        var offLimitSubsets = from g1 in groups where !new[] {layer - 1, layer}.Contains(levelMap[g1.Key]) select g1;
        var offLimitTypes = offLimitSubsets.SelectMany(x => x).ToList();
        foreach (Type t in g){
            foreach (MethodInfo m in t.GetAllMethods()){
                var methodBody = m.GetMethodBody();
                if (methodBody != null)
                    foreach (LocalVariableInfo v in methodBody
                        .LocalVariables
                        .Where(v => offLimitTypes
                                        .Contains(v.LocalType)))
                    {
                        errors.Add(
                            string.Format(
                                "Method \"{0}\" has local variable of type {1} from a layer it shouldn't.",
                                m.Name,
                                v.LocalType.FullName));
                    }
                foreach (ParameterInfo p in m
                    .GetParameters()
                    .Where(p => offLimitTypes
                                    .Contains(p.ParameterType)))
                {
                    errors.Add(
                        string.Format(
                            "Method \"{0}\" parameter {2} uses parameter type {1} from a layer it shouldn't.",
                            m.Name,
                            p.ParameterType.FullName,
                            p.Name));
                }
                if (offLimitTypes.Contains(m.ReturnType)){
                    errors.Add(
                        string.Format(
                            "Method \"{0}\" uses return type {1} from a layer it shouldn't.",
                            m.Name,
                            m.ReturnType.FullName));
                }
            }
            foreach (PropertyInfo p in t
                .GetAllProperties()
                .Where(p => offLimitTypes.Contains(p.PropertyType)))
            {
                errors.Add(
                    string.Format(
                        "Type \"{0}\" has a property \"{1}\" of type {2} from a layer it shouldn't.",
                        t.FullName,
                        p.Name,
                        p.PropertyType.FullName));
            }
            foreach(FieldInfo f in t.GetAllFields().Where(f=>offLimitTypes.Contains(f.FieldType)))
            {
                errors.Add(
                    string.Format(
                        "Type \"{0}\" has a field \"{1}\" of type {2} from a layer it shouldn't.",
                        t.FullName,
                        f.Name,
                        f.FieldType.FullName));
            }
        }
    }
    if (errors.Count > 0)
        Assert.Fail(String.Join(Environment.NewLine, new[] {"Layering violation."}.Concat(errors)));
}

This method groups the types within each layer, then works out which layers the current layer shouldn’t have access to (i.e. any layer that isn’t the layer itself or the adjacent lower layer, the “layer - 1, layer” used to create offLimitSubsets).  For each type we look at return types, parameter types, local variables, fields, and properties for any types they use.  If any of those types is one of the off-limits types, we add an error to our error collection.  At the end, if there are any errors, we assert and format a nice message with all the violations.

This is a helper method that you’d use somewhere (maybe a helper static class, within the existing test class, whatever).

This uses some extension methods to make it a bit more readable, which are here:


public static class TypeExceptions {
    public static IEnumerable<MethodInfo> GetAllMethods(this Type type) {
        if (type == null) throw new ArgumentNullException("type");
        return
            type.GetMethods(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.Public).Where(
                m => !m.GetCustomAttributes(true).Any(a => a is CompilerGeneratedAttribute));
    }
    public static IEnumerable<FieldInfo> GetAllFields(this Type type) {
        if (type == null) throw new ArgumentNullException("type");
        return type.GetFields(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.Public)
            .Where(f => !f.GetCustomAttributes(true).Any(a => a is CompilerGeneratedAttribute));
    }
    public static IEnumerable<PropertyInfo> GetAllProperties(this Type type) {
        if (type == null) throw new ArgumentNullException("type");
        return
            type.GetProperties(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.Public).Where
                (p => !p.GetCustomAttributes(true).Any(a => a is CompilerGeneratedAttribute));
    }
}

Because the compiler generates fields for auto-properties and methods for properties, we want to filter out any compiler-generated members so we don’t get duplicate violations (and violations the user can’t do anything about); whatever caused the compiler to generate the member will raise a violation on its own.  This is what the call to GetCustomAttributes is doing.

I wasn’t expecting this to be that long; so, in future blog entries I’ll try to detail some other unit tests that validate or verify specific infrastructural details.  If you have any specific details you’re interested in, leave a comment.

Dispose Pattern and “Set large fields to null”

I was involved in a short side discussion about whether fields “should” be set to null in the Dispose method(s).  I’m not sure what the impetus of the question was; but if you read through the dispose pattern MSDN documentation (in most versions, I believe) there’s a comment // Set large fields to null. in the implementation of the virtual Dispose method, within the if(!disposed) block and after the if(disposing) block.  But that’s the only reference to setting fields to null during dispose; there’s nothing else that I’ve been able to find in MSDN with regard to setting fields to null.
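For reference, the pattern in question looks roughly like this (paraphrased from the documentation; the class name is mine, and the comment is the one I’m talking about):

public class Resource : IDisposable
{
	private bool disposed;

	public void Dispose()
	{
		Dispose(true);
		GC.SuppressFinalize(this);
	}

	protected virtual void Dispose(bool disposing)
	{
		if (!disposed)
		{
			if (disposing)
			{
				// Free other managed objects that implement IDisposable.
			}
			// Release unmanaged resources.
			// Set large fields to null.
			disposed = true;
		}
	}
}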

At face value, setting a field to null means that the referenced object is now unrooted from the object that owns the field; and, if that was the last root of the reference, the Garbage Collector (GC) is then free to release the memory used by the object the field referenced.  This all seems very academic, though, because the amount of time between unrooting the reference and the return from Dispose (and thus the unrooting of the parent object) would seem very short.  And even if the amount of time between those two actions is small, setting a single field to null (i.e. a single assignment) seems like such a minor bit of code that it should have no adverse effects.  The prevalent opinion seems to be that the GC “handles” this case and does what is best for you without setting the field to null.

The GC is pretty smart.  A lot of bright people have worked on the GC over the years, and it improves with every release of .NET.  But that doesn’t answer the question: is there a benefit to setting a field to null in the Dispose method?  Considering there isn’t much guidance on the topic, I thought I’d set aside whatever faith I have in the GC and throw some science at the problem: take my theory, create some experiments, make observations, and collect some evidence.

What I did was create two classes, identical except that one Dispose method sets a reference field to null and the other doesn’t.  The classes contain a field that could reference a “large” or a “small” object; I would experiment with large objects and small objects and observe the differences.  The following are the classes:

	public class First : IDisposable {
		int[] arr = new int[Constants.ArraySize];
		public int[] GetArray() {
			return arr;
		}
		public void Dispose() {
			arr = null;
		}
	}
 
	public class Second : IDisposable {
		int[] arr = new int[Constants.ArraySize];
		public int[] GetArray() {
			return arr;
		}
 
		public void Dispose() {
		}
	}

I varied the Constants.ArraySize constant to make arr reference a “large” or a “small” object.  I then created a loop that created several thousand instances of one of these classes, forcing a garbage collection at the end and keeping track of the start time and the end time via Stopwatch:

	public class Program {
		private const int Iterations = 10000;
 
		static void Main(string[] args)
		{
			var stopwatch = Stopwatch.StartNew();
			for (int i = 0; i < Iterations; ++i)
			{
				using (var f = new First())
				{
					ConsumeValue(f.GetArray().Length);
				}
			}
			GC.Collect();
			stopwatch.Stop();
			Trace.WriteLine(String.Format("{0} {1}", stopwatch.Elapsed, stopwatch.ElapsedTicks));
			stopwatch = Stopwatch.StartNew();
			for (int i = 0; i < Iterations; ++i)
			{
				using (var s = new Second())
				{
					ConsumeValue(s.GetArray().Length);
				}
			}
			GC.Collect();
			stopwatch.Stop();
			Trace.WriteLine(String.Format("{0} {1}", stopwatch.Elapsed, stopwatch.ElapsedTicks));
		}
 
		static void ConsumeValue(int x) {
		}
	}

I wanted to make sure instances didn’t get optimized away: the GetArray method makes sure the arr field sticks around, and ConsumeValue makes sure the First/Second instances stick around (more a nit-picker circumvention measure :).  The results shown are the second result from running the application two times.

As it turns out, the results were very interesting (at least to me :).  The results are as follows:

In each pair of lines, the first measurement is class First (whose Dispose sets arr to null) and the second is class Second (which doesn’t):

Iterations  ArraySize  Debug  Null in Dispose  Elapsed time      Ticks
10000       85000      yes    yes              00:00:00.0759408  170186
10000       85000      yes    no               00:00:00.7449450  1669448
10000       85000      no     yes              00:00:00.0714526  160128
10000       85000      no     no               00:00:00.0753187  168792
10000       1          yes    yes              00:00:00.0009410  2109
10000       1          yes    no               00:00:00.0007179  1609
10000       1          no     yes              00:00:00.0005225  1171
10000       1          no     no               00:00:00.0003908  876
10000       1000       yes    yes              00:00:00.0088454  19823
10000       1000       yes    no               00:00:00.0062082  13913
10000       1000       no     yes              00:00:00.0096442  21613
10000       1000       no     no               00:00:00.0058977  13217
10000       10000      yes    yes              00:00:00.0527439  118201
10000       10000      yes    no               00:00:00.0528719  118488
10000       10000      no     yes              00:00:00.0478136  107152
10000       10000      no     no               00:00:00.0524012  117433
10000       40000      yes    yes              00:00:00.0491652  110181
10000       40000      yes    no               00:00:00.3580011  802293
10000       40000      no     yes              00:00:00.0467649  104802
10000       40000      no     no               00:00:00.0487685  109292
10000       30000      yes    yes              00:00:00.0446106  99974
10000       30000      yes    no               00:00:00.2748007  615838
10000       30000      no     yes              00:00:00.0411109  92131
10000       30000      no     no               00:00:00.0381225  85434

For the most part, results in debug mode are meaningless.  There’s no point in making design/coding decisions based on perceived benefits in debug mode; so I don’t discuss those results other than to document them above.

The numbers could go either way.  Looking at percentages in release mode, setting the field to null is slower 50% of the time and faster 50% of the time.  When setting the field to null is faster, it’s insignificantly faster (5.41%, 9.59%, and 4.28% faster); when it’s slower, it’s insignificantly slower, but slower by more than it is ever faster (taking 133.68%, 163.52%, and 107.84% of the not-setting-to-null time).  Neither seems to make a whole lot of difference: the biggest difference is 10281 ticks over 10000 iterations, about 1 tick per iteration, for the 10000-element array.  If we look at just the time values, setting the field starts to approach being faster overall (when it’s slower, it’s slower by 295, 8396, and 6697 ticks; when it’s faster, it’s faster by 8664, 10281, and 4490 ticks).  Oddly, though, the largest array isn’t where setting the field to null shows its biggest win.  But then, I don’t know what the documentation means by “large”; it could mean large-object-heap objects or some other arbitrary size.

Of course, there are other variables that could affect things here that I haven’t accounted for (server GC vs. client GC, the GC not occurring at a specific time, better sample size, better sample range, etc.); so take the results with a grain of salt.

What should you do with this evidence?  It’s up to you.  I suggest not taking it as gospel, and making the decision that is best for your own code based on experimentation and metrics gathered in the circumstances unique to your application and its usage.  I.e. setting a field to null in Dispose is neither bad nor good in the general case.


“Virtual method call from constructor” What Could Go Wrong?

If you’ve used any sort of static analysis on source code, you may have seen a message like “Virtual method call from constructor”.  In FxCop/Visual Studio Code Analysis it’s CA2214, “Do not call overridable methods in constructors”.  It’s “syntactically correct”; some devs have said “what could go wrong with that?”.  I’ve seen this problem in so many places that I’m compelled to write this post.

I won’t get into one of my many pet peeves (ignoring messages like that instead of educating yourself about ticking time bombs, and continuing in ignorant bliss); but I will try to make this one clearer and hopefully shine some light on a particular class of warning that describes something that arguably should never have made it into object-oriented languages.

Let’s have a look at a simple, but safe, example of virtual overrides:

public class BaseClass {
	public BaseClass() {
	}
 
	protected virtual void ChangeState() {
		// do nothing in base TODO: consider abstract
	}
 
	public void DoSomething() {
		ChangeState();
	}
}
 
public class DerivedClass : BaseClass {
	private int value = 42;
	private readonly int seed = 13;
 
	public DerivedClass() {
	}
 
	public int Value { get { return value; } }
 
	protected override void ChangeState() {
		value = new Random(seed).Next();
	}
}

With a unit test like this:

[TestMethod]
public void ChangeStateTest() {
	DerivedClass target = new DerivedClass();
 
	target.DoSomething();
	Assert.AreEqual(1111907664, target.Value);
}

A silly example that has a virtual method that is used within a public method of the base class.  Let’s look at how we might evolve this code into something that causes a problem.

Let’s say that given what we have now, we wanted our derived class to be “initialized” with what ChangeState does (naïvely: it’s there, it does what we want, and we want to “reuse” it in the constructor); so, we modify BaseClass to do this:

public class BaseClass {
	public BaseClass() {
		DoSomething();
	}
 
	protected virtual void ChangeState() {
		// do nothing in base TODO: consider abstract
	}
 
	private void DoSomething() {
		ChangeState();
	}
}
 
public class DerivedClass : BaseClass {
	private int value = 42;
	private readonly int seed = 13;
 
	public DerivedClass() {
	}
 
	public int Value { get { return value; } }
 
	protected override void ChangeState() {
		value = new Random(seed).Next();
	}
}

and we modify the tests to remove the call to DoSomething, as follows:

[TestMethod]
public void ConstructionTest() {
	DerivedClass target = new DerivedClass();
 
	Assert.AreEqual(1111907664, target.Value);
}

…tests still pass, all is good.

But now we want to refactor our derived class.  We realize that seed is really a constant and that we can get rid of the value field if we use an auto property; so we go ahead and modify DerivedClass as follows:

public class DerivedClass : BaseClass {
	private const int seed = 13;
 
	public DerivedClass() {
		Value = 42;
	}
 
	public int Value { get; private set; }
 
	protected override void ChangeState() {
		Value = new Random(seed).Next();
	}
}

Looks good; but now we have a failing test: Assert.AreEqual failed. Expected:<1111907664>. Actual:<42>.

“Wait, what?” you might be thinking…

What’s happening here is that field initializers are executed before the base class constructor is called, which, in turn, is called before the derived class constructor body is executed.  Since we’ve effectively changed the initialization of the “field” (now a hidden backing field for the auto-prop) from a field initializer to a line in the derived constructor body, we’re trampling all over what the base class constructor did when it called the virtual method.  Similar things happen in other OO languages, though the particular order might be different.
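Here’s a small sketch (separate from the example above; the names are mine) that makes the ordering visible:

public class Base {
	public Base() {
		Console.WriteLine("Base ctor");
		Report(); // virtual call from constructor: runs before Derived's ctor body
	}
	protected virtual void Report() { }
}
 
public class Derived : Base {
	private int value = Init("field initializer"); // runs first
 
	public Derived() {
		Console.WriteLine("Derived ctor body"); // runs last
	}
 
	private static int Init(string stage) {
		Console.WriteLine(stage);
		return 42;
	}
 
	protected override void Report() {
		Console.WriteLine("Report sees value = {0}", value);
	}
}
// new Derived() prints:
//   field initializer
//   Base ctor
//   Report sees value = 42
//   Derived ctor body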

Now, imagine if we didn’t have a unit test to catch this; you’d have to run the application through some set of specific scenarios to find this error.  Not so much fun.

Unfortunately, the only real solution is to not make virtual method calls from your base constructor; separate the invocation of ChangeState from the invocation of the constructor.  One way is basically reverting to what we started with and adding a call to ChangeState, via DoSomething, in the same code that invokes the constructor.  Without reverting our refactoring of DerivedClass, we can change BaseClass back to what it was before and invoke the DoSomething method in the test, resulting in the following code:

public class BaseClass {
	public BaseClass() {
	}
 
	protected virtual void ChangeState() {
		// do nothing in base TODO: consider abstract
	}
 
	public void DoSomething() {
		ChangeState();
	}
}
 
public class DerivedClass : BaseClass {
	private const int seed = 13;
 
	public DerivedClass() {
		Value = 42;
	}
 
	public int Value { get; private set; }
 
	protected override void ChangeState() {
		Value = new Random(seed).Next();
	}
}
[TestMethod]
public void ChangeStateTest() {
	DerivedClass target = new DerivedClass();
 
	target.DoSomething();
	Assert.AreEqual(1111907664, target.Value);
}

Issues with virtual member invocations from a constructor are very subtle.  If you’re using Code Analysis, I recommend not disabling CA2214 and, further, promoting it to an error.  Oh, and write unit tests so you can catch these things as quickly as possible.

Software Design and First Principles–Part 0: Concepts of Object Orientation

I often compare software development with building houses or woodworking.  I sometimes even compare software development with the vocation of electrician.  In each of these other vocations, craftspeople need to go through a period of apprenticeship and mentoring before being “allowed” to practice their craft.  In each of these vocations there is a series of rules that applies to a lot of the basics of what they do.  With building houses, there are techniques and principles that are regulated by building codes; with electricians, there are techniques and fundamentals that are effectively regulated by electrical codes and standards.  It’s one thing to learn the techniques, principles, and fundamental laws of physics; it’s another thing to be able to call yourself an electrician or a carpenter.

Now, don’t get me wrong; I’m not advocating that software development be a licensed trade—that’s an entirely different conversation.  But I do believe that many of the techniques and principles around software development take a lot of mentorship to get right.  Just like electricity, they’re not the most intuitive of techniques and principles.  But, just like electricity, it’s really good to know why you’re doing something so you can know its limits and better judge “correctness” in different scenarios.

To that effect, I think that in the rush to get hands-on experience with certain techniques, the principles behind many software design techniques and patterns are being somewhat ignored.  I think it’s important that we remember and understand what I’m deeming “first principles”.

A first principle is a foundational principle of whatever it applies to.  Some of the principles I’m going to talk about may not be strictly foundational; but I view them as almost as important as the foundational ones, so I’m including them among first principles.

From an object-oriented standpoint, there are lots of principles we can apply.  Before I get too deeply into those principles, I think it’s useful to remind ourselves what object-orientation is.  I’m not going to go too deep into OO here; I’ll assume you’ve got some experience writing and designing object-oriented programs.  But I want to associate the principles with the OO concepts that guide them, so it’s important that you as the reader are on the same page as me.

OO really involves various concepts.  These concepts are typically outlined as: encapsulation, abstraction, inheritance, polymorphism (at least subtype, but usually parametric and ad-hoc as well), and “message passing”.  I’m going to ignore message passing in this part, other than to say it is typically implemented as method calls…

You don’t have to use all the OO concepts when you’re using an OO language; but you could argue that encapsulation is the one concept that is fundamental.  Encapsulation is sometimes referred to as information hiding; but I don’t think that term does it justice.  Sure, an object with private fields and methods “hides” information; but the fact that it exposes those privates through a public interface of methods isn’t even alluded to in “information hiding”.  Encapsulation is, thus, a means to keep privates private and to provide a consistent public interface to act upon or access those privates.  The interface is an abstraction of the implementation details (the private data) of the class.

The next biggest part of OO is abstraction.  As we’ve seen, encapsulation is a form of abstraction (data abstraction); but the abstraction we’re focusing on now is one that decouples other implementation details.  Abstraction can be implemented with inheritance in many languages (e.g. code can know how to deal with a Shape and not care that it’s given a Rectangle), and that inheritance can use abstract types. Some OO languages expand abstraction abilities to include things like interfaces—although you could technically do the same thing with an abstract type that had no implementation.

Inheritance is key to many other concepts in OO: abstraction, subtype polymorphism, interfaces, etc.  (If we view an interface as an abstract type with no code, then something that “implements” an interface is really just inheriting from an abstract type; but my focus isn’t these semantics.)  We often let our zeal to model artefacts drive our designs and run into problems with the degree and the depth of our inheritance; a point I hope to revisit in a future post in this series.

Although you could technically use an OO language and not use polymorphism in any way, I think polymorphism is one of OO languages’ greatest features.  Subtype polymorphism, as I’ve noted, is a form of abstraction (Shape, Rectangle…).  But all other types of polymorphism are also abstractions: they replace something concrete (implementation details) with something less concrete (abstract).  With subtype polymorphism, that abstraction is an abstract type or a base type; with parametric polymorphism, we generally create an algorithm abstraction that is decoupled from the data involved (generics in .NET); and ad-hoc polymorphism is overloading, a decoupling of one particular method into one of many.  A minimal sketch of the three follows.
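The names here are illustrative, not from any particular library:

using System;
using System.Collections.Generic;

public abstract class Shape { public abstract double Area(); } // subtype polymorphism:
public class Rectangle : Shape {                                // callers deal with Shape,
	public double Width, Height;                                // not caring it's a Rectangle
	public override double Area() { return Width * Height; }
}

public static class Util {
	// Parametric polymorphism (generics): the algorithm is decoupled from the data.
	public static T FirstOf<T>(IList<T> items) { return items[0]; }

	// Ad-hoc polymorphism (overloading): one method name, many methods.
	public static double Describe(Shape s) { return s.Area(); }
	public static double Describe(double circleRadius) {
		return Math.PI * circleRadius * circleRadius;
	}
}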

I quickly realized the scope of this topic is fairly large and that one post on it would be too much like drinking from a firehose, as well as potentially protracted (and at risk of never getting done at all :).  So, I’ve split up what I wanted to talk about into chunks.  I’m not entirely sure what the scope actually is yet; I’ll figure that out as I go or let feedback guide me.  Now that we’ve got most of the OO concepts in our heads, the next post will begin detailing the principles I wanted to talk about.

The Flawed Eventually-upgrade Software Model

I think Windows XP was the first real release of Windows: it had finally gotten to a point of usability and stability that people could accept.  The Microsoft support model changed shortly after Windows XP was released to, in essence, support any piece of software for as long as ten years (if you paid extra for support roughly two years after a successive version was released).  To paraphrase a famous law: software becomes obsolete every 18 months.  That was true for a long time; but hardware and software aren’t improving at that rate any more.  Software has basically caught up with existing hardware design and now has the capability of sustaining itself, without upgrade, for much longer than it did 10 years ago.

To paraphrase once again: you can make some of the people happier all of the time, but you can’t make all of the people happier all of the time.  Releasing new versions of software nowadays is more about attempting to make more people happier than were happier before.  Approaching your solution or your technology from a 100% buy-in point of view is unrealistic.  I think we’ve seen the fallout of that model for at least the last 10 years.  People have said that successors to software like Windows XP, on their own, aren’t enough to make people happier than they already are.  Trying to force a change only generates push-back.  The friction that once kept people on a particular brand of OS, or even a particular architecture, is gone; people are exercising their options when they’re unable to use what they’re happy with.

I think it’s time for software companies to change their model so customers can buy into an indefinite support model for software.  I think businesses are more than willing to spend more money on longer support for some software packages than on buying the latest version every x number of years.  If you look at the TCO of upgrading away from XP compared to what a business pays Microsoft for the OS, the upgrade costs very much more.  Companies are willing to offset that cost and buy support for XP rather than upgrade away from it.  It just so happens that Microsoft extended support for XP rather than change their core model.

I think the more the current model effectively gives customers the choice between abandoning XP and going to the latest version of an operating system (because you’re effectively forcing them to make that evaluation), the more likely you end up forcing people away from Windows entirely.  People and businesses are re-evaluating why they need their computers and thus the operating system installed on them.  There’s much more of a need to consume data over the Internet than there was 10 years ago.  People and companies are recognizing that, and they’re also recognizing there are many more options for doing just that.

With this model, moving forward, innovation would drive software sales more than it does now.  People would upgrade not because it’s the latest version, and not because they have to upgrade their hardware, but because the innovation in the software is pervasive enough to justify upgrading.  Merely being different wouldn’t be enough to sell upgrades.

What do you think?  Do you think the eventually-upgrade software model is out of date?

The Rat-hole of Object-oriented Mapping

Mark Seemann recently had a great post that, as most of his posts seem to do, challenges the average reader to re-think their reality a bit.  The post is titled “Is Layering Worth the Mapping”.  In the post Mark essentially details some of the grief and friction that developers face when they need to decouple two or more concepts by way of an abstraction.

Mark takes the example of layering.  Layering is a one-way coupling between two “layers”, where the “higher-level” layer takes a dependency on abstractions in a “lower-level” layer.  Part of his example is a UI layer that communicates with a domain layer about musical track information.  The track information that is communicated lives in a hand-crafted Track abstraction.  Typically this abstraction would live with the lower-level layer to maintain the unidirectional coupling.  Of course the UI layer needs a Track concretion for it to do its job, and must map between the higher-level layer and the lower-level layer.  To further complicate things, other decoupling may occur within each layer to manage dependencies: the UI may implement an MVx pattern, in which case there may be a specific “view-model” track abstraction; the data layer may employ object-relational mapping; etc.
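
As a sketch of the kind of hand-crafted mapping involved (these Track types and members are mine, made up for illustration; see Mark’s post for his example):

using System;

// the domain layer's abstraction…
public class Track
{
    public string Name { get; set; }
    public TimeSpan Duration { get; set; }
}

// …and the UI layer's view-model equivalent
public class TrackViewModel
{
    public string Name { get; set; }
    public string Duration { get; set; }

    // the ceremony: mapping from one layer's Track to the other's
    public static TrackViewModel FromDomain(Track track)
    {
        return new TrackViewModel
        {
            Name = track.Name,
            Duration = track.Duration.ToString()
        };
    }
}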

Mark goes on to describe some “solutions” that often fall out of scenarios like this, out of a need to manage the sheer magnitude of the classes involved: shared DTOs as cross-cutting entities, POCO classes, classes with only automatic properties, etc.

It’s not just layering.  Layering lives in a grey area between in-memory modules and out-of-process “tiers”.  Layering, I think, is an attempt to get the benefits of out-of-process decoupling without the infrastructure concerns of connecting and communicating between separate processes.  Of course, over and above the module/assembly separation, the only thing enforcing this decoupling in layers is skill and being methodical.

I’m convinced layering is often, or often becomes, a “speculative generality” meant to “future proof” the application, since layering so closely resembles “tiering” (not to be confused with the eventual response of “tearing”) as to make it easy to make the system tiered should there ever be a need for it.  To be clear, this is the wrong impetus for designing a software solution.  You’re effectively setting yourself up to fail by “making up” requirements that are more than likely going to be wrong.  If the requirements the design is based on are fallacies, the design is wrong too.  But you have to continue to maintain this design until you re-write it (ever noticed that anti-pattern?).

But implementing tiers, or any sort of communication between processes, often ends up in the same state.  You have internal “domain” entities within the processes (and even within logical boundaries within those processes) that end up spawning the need for “DTO” objects that live at the seams on one or both sides of the communication.  Furthermore, many times that communication is facilitated by frameworks like WCF that create their own DTOs (SOAP envelopes, for example).  Except now you’re bound by the physical boundaries of processes, and you’re forced to do things like shared-type assemblies to model the cross-cutting “entities” (if you choose that cross-cutting “optimization”), introducing a whole new level of effort and a massive surface area for attracting human error (you’ve technically introduced the need for versioning, potentially serialization, deployment issues, etc.).

Creating an object-oriented type to simply act as a one-way container for something that lives on the other side of some logical or physical boundary has appeared to me to be a smell for quite a while.  E.g. the UI layer in Mark’s original example has this concept of a “Track” DTO-like type that, when used, is only used in one direction at a time.  When moving from the UI to the domain layer, it’s only written to.  If the UI layer gets a “track” back from the domain layer, it only reads from it.  Abstracting this into an OO class seems pointless and, as Mark says, not particularly beneficial.

Let’s look specifically at the in-memory representation of something like a “Track”.  We’ll limit ourselves and say that we need four Track abstractions: one for the view model, one for the domain layer abstraction, one for the data layer abstraction, and one for the object-relational mapping.  (I’ve assumed that the data layer may not have a track “entity” and is only responsible for pushing data around.)  So, in effect, we have four Track DTO classes in our system (and two or three Track “entities”).  But if we look at the in-memory representation of instances of these objects, they’re effectively identical; each one can’t really have more data than another, otherwise there’s something wrong.  If we look at what’s actually happening with this data, we’re really writing a lot of code to copy memory around in a really inefficient way.  The DTO classes in part become the way to copy memory.  To be fair, this is a side-effect of the fact that we’re manually mapping from one abstraction to another or from one abstraction to an entity (or vice versa).
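
In code, that ends up looking something like this (names made up for illustration).  Each class carries the same data, and each hand-written mapping is effectively a glorified memory copy:

// four near-identical shapes of the same data
public class TrackViewModel { public string Name { get; set; } public int DurationSeconds { get; set; } }
public class DomainTrack    { public string Name { get; set; } public int DurationSeconds { get; set; } }
public class DataTrack      { public string Name { get; set; } public int DurationSeconds { get; set; } }
public class OrmTrack       { public string Name { get; set; } public int DurationSeconds { get; set; } }

public static class TrackMapper
{
    // copy the same fields from one container to another…
    public static DomainTrack ToDomain(DataTrack source)
    {
        return new DomainTrack
        {
            Name = source.Name,
            DurationSeconds = source.DurationSeconds
        };
    }
    // …repeated for every pairing, in both directions
}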

This type of thing isn’t entirely unknown; it sometimes goes by the name of ceremony.

For the most part, I think computer languages are hindering our ability to address this.  Languages in general maintain one specific way of messaging, method calling, which limits us to communicating only information that can be encapsulated by the language’s or platform’s type system.  But to a certain extent we’re also hindered by our myopia of “everything must be a type in language X”.  Maybe this is another manifestation of Maslow’s Hammer.

Imagine if you removed all the mapping code in a complex system (especially a distributed system) and were left with just the “domain” code.  I’ve done this with one system, and I was astounded that over 75% of the code in the system had nothing to do with the system’s “domain” (the “value-add”) and was “ceremony” to facilitate data mapping.

I sometimes hear that this isn’t so much of a problem with specific frameworks; I’m often told that these frameworks do all the heavy lifting for us.  But they really don’t.  The frameworks really just introduce another seam.  The issue of impedance mismatch isn’t just related to object-relational mapping.  It has to do with any mapping where both sides aren’t constrained by the same rules.  I’ve blogged about this before, but I can use some “data framework” to generate “entities” based on a data model or even based on POCOs.  Some view this as solving the problem, but it doesn’t.  Each side operates under different rules.  The generated classes can only match the impedance of what they have to communicate with, and you have to plan for that being different from the impedance of what you’ll end up mapping from/to.  The only real solution is to introduce another DTO to map between your domain and the classes generated by the framework, so you are decoupled from the eventual “gotchas” where your domain has different expectations or rules than the framework you’re communicating with.  When people don’t do this, you see all sorts of complaints like “the date/time in X isn’t the same as what I need”, etc.
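
A sketch of that extra seam (the generated entity and the mapping here are hypothetical):

using System;

// a class as a framework's code generator might emit it; its rules
// (e.g. nullability) are the framework's, not the domain's
public class GeneratedTrackEntity
{
    public string Name { get; set; }
    public DateTime? ReleasedOn { get; set; }    // the framework allows null
}

// the domain's rules live in the domain's own type…
public class DomainTrack
{
    public string Name { get; set; }
    public DateTime ReleasedOn { get; set; }     // the domain requires a value
}

// …and yet another mapping decouples the two sets of rules
public static class TrackEntityMapper
{
    public static DomainTrack ToDomain(GeneratedTrackEntity entity)
    {
        if (entity.ReleasedOn == null)
            throw new InvalidOperationException("ReleasedOn is required.");

        return new DomainTrack
        {
            Name = entity.Name,
            ReleasedOn = entity.ReleasedOn.Value
        };
    }
}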

Don’t fall into this rut.  Think about what you’re doing; if you’ve got four DTOs to hold the same data, maybe there’s a better way of doing it.  Try to come up with something better and blog about it, or at least talk about the problem out in the open like Mark has.