Misapplied "Premature Optimization"

First of all, let’s get some definitions out of the way.


Optimisation
“…the process of modifying a system to make some aspect of it work more efficiently or use fewer resources.” 1


Premature Optimisation
“Optimization on the basis of insufficient information.” 2


In other words, premature optimisation is making a modification to some aspect of a system without sufficient information to gauge whether it will make the system more efficient or use fewer resources.


The key point I’m emphasising is “modification”. Others have blogged about this in the past, but I still see “premature optimisation” used far too often as a crutch that lets programmers and designers avoid thinking during the design and development phases.


What premature optimisation is not:


  • Choosing one design over another because of speed.
  • Changing a design to use fewer resources.
  • Researching the fastest algorithm to implement a new feature.
  • Analysing alternative algorithms to collect resource usage metrics during the analysis phase.
  • Changing code that does not work.

Premature optimisation is making changes to existing, working code either without knowing that the change will make the code faster or use fewer resources, or without any requirement to make the code faster or use fewer resources.


In other words, if the code works as required, don’t change it.


I often see questions like “Is method one more efficient than method two?” with answers like “If one is more efficient than the other, it’s only by a matter of several hundred milliseconds; picking one over the other is a premature optimisation.” No, it is not a premature optimisation. You should always know what performance and resource footprint you need to meet when designing and writing code. If you have no specific requirement, simply pick the fastest. But to do that, you have to know which of a set of alternatives is the fastest. Asking or testing which one is faster is not premature optimisation. It’s just proper design and implementation.
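Finding out which of two alternatives is faster doesn’t have to be elaborate. Below is a minimal sketch of a Stopwatch-based timing harness; `ConcatWithPlus` and `ConcatWithBuilder` are hypothetical stand-ins for “method one” and “method two”, not anything from the original question. Timings like these are crude (JIT warm-up, garbage collection and machine noise all affect them), so treat the numbers as a rough guide rather than a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class Timing
{
    // Runs the action once to warm it up (so JIT compilation isn't timed),
    // then returns the elapsed time for 'iterations' further calls.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action();
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Hypothetical "method one": builds a string with repeated concatenation.
    public static string ConcatWithPlus(int count)
    {
        string result = string.Empty;
        for (int i = 0; i < count; i++)
        {
            result += i.ToString();
        }
        return result;
    }

    // Hypothetical "method two": builds the same string with a StringBuilder.
    public static string ConcatWithBuilder(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(i);
        }
        return builder.ToString();
    }

    public static void Main()
    {
        const int iterations = 1000;
        TimeSpan plus = Measure(() => ConcatWithPlus(100), iterations);
        TimeSpan builder = Measure(() => ConcatWithBuilder(100), iterations);
        Console.WriteLine("String +:      {0} ms", plus.TotalMilliseconds);
        Console.WriteLine("StringBuilder: {0} ms", builder.TotalMilliseconds);
    }
}
```

Both methods produce the same string, so the comparison is purely about cost; measure with inputs and iteration counts that resemble your real workload.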


Sometimes you must trade performance and resource usage for maintainability or compatibility. But making concessions like this is not premature optimisation either. Design choices should always take every aspect of a system into account: maintainability (readability, coupling, cohesion, etc.), regulatory compliance, performance, resource usage, requirements, etc. Choosing an algorithm, or a faster or less resource-intensive way of performing some action, should always weigh all of these aspects.


1 http://en.wikipedia.org/wiki/Optimization_%28computer_science%29
2 http://en.wikipedia.org/wiki/Anti-pattern


 

Using Exceptions For Normal Logic Flow

The generally accepted wisdom is that you shouldn’t use exceptions for normal logic flow. What counts as normal logic flow is a bit subjective, but anything that must happen at least once in every known scenario is normal logic flow.
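As a familiar illustration of the principle (not taken from XmlSerializer), compare handling bad input by catching `FormatException` from `int.Parse` with using `int.TryParse`, which reports failure through its return value. When malformed input is an expected, everyday case, the second form keeps exceptions out of the normal path. The method names are hypothetical.

```csharp
using System;

public static class Parsing
{
    // Exceptions as normal logic flow: malformed input is expected,
    // yet it is signalled by throwing and catching FormatException.
    public static int ParseOrDefaultWithCatch(string text, int defaultValue)
    {
        try
        {
            return int.Parse(text);
        }
        catch (FormatException)
        {
            return defaultValue;
        }
    }

    // No exception for malformed input: TryParse simply returns false.
    // (Unlike int.Parse, it also returns false for null or out-of-range input.)
    public static int ParseOrDefault(string text, int defaultValue)
    {
        int value;
        return int.TryParse(text, out value) ? value : defaultValue;
    }

    public static void Main()
    {
        Console.WriteLine(ParseOrDefault("42", -1));
        Console.WriteLine(ParseOrDefault("forty-two", -1));
    }
}
```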

Enter XML serialization in the .NET Framework. XmlSerializer dynamically generates types that perform the actual serialization of a given type and caches the assembly containing them. The next time that type needs to be serialized, the generated type is reused, so reflection is minimized and serialization is quite fast.

Here’s the rub. The framework decides that it must generate the type when Assembly.Load throws an exception. Something like:

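        // Simplified logic: try to load a previously generated serialization
        // assembly; if loading fails for any reason, generate a new one.
        Assembly assembly = null;
        try
        {
            assembly = Assembly.Load(assemblyName);
        }
        catch (Exception)
        {
            // The first time a type is serialized there is no assembly to load,
            // so this catch block is the normal path, not an error path.
            assembly = GenerateTempAssembly();
        }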

This means exceptions are being used for normal logic flow: the first time the code runs, the exception is thrown. Now, there are some folks out there (including folks at Microsoft) who will say “What’s wrong with that? It gets the job done, doesn’t it?” Yes, it does get the job done. The problem is that Assembly.Load will throw an exception at least once (and again whenever you modify the type being serialized). Again, some may say, “So what?” Well, Visual Studio has a BindingFailure Managed Debugging Assistant (MDA) that watches for Assembly.Load failures. When you’re debugging, it is triggered and forces a break the first time you serialize your type. If you use XmlSerializer to serialize many different types and you turn this MDA on (after all, it’s there and does do some good), you’ll spend a lot of time wading through BindingFailure MDA breaks.

This has been happening since XmlSerializer was created. I say “including folks at Microsoft” because, until a couple of days ago, I thought someone at Microsoft was working on this problem (although I can no longer find my sources). But the problem still occurs in Orcas (Visual Studio 2008) Beta 2, which means that in the past 3-4 years no one has done anything about it. So I logged a bug about it on Connect (https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=304095), and it was quickly closed as “By Design”.

Keep in mind, the “fix” we’re talking about is simply checking whether the file exists before trying to load it, something like:

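        Assembly assembly;
        try
        {
            // Assumes CodeBase is a local file URI such as file:///C:/dir/Foo.dll.
            Debug.Assert(assemblyName.CodeBase.StartsWith("file:///", true, System.Globalization.CultureInfo.InvariantCulture));

            // Strip the "file:///" prefix and convert to a Windows path.
            String path = assemblyName.CodeBase.Substring(8).Replace('/', '\\');
            if (!System.IO.File.Exists(path))
            {
                // Nothing to load yet: generate the assembly without provoking an exception.
                assembly = GenerateTempAssembly();
            }
            else
            {
                assembly = Assembly.Load(assemblyName);
            }
        }
        catch (Exception)
        {
            // Genuine failures (e.g. a corrupt or incompatible file) are still handled.
            assembly = GenerateTempAssembly();
        }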


…not too tricky…
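The Substring(8) conversion above assumes a CodeBase of the form file:///C:/...; it would mangle, for example, a UNC code base such as file://server/share/Foo.dll, and it leaves escapes like %20 in place. A somewhat more robust variant, sketched below on the assumption that CodeBase is an absolute URI, lets System.Uri do the work: Uri.IsFile rejects non-file URIs, and Uri.LocalPath unescapes the path (on Windows, file:///C:/dir/My%20App.dll becomes C:\dir\My App.dll). `CodeBaseHelper` and its methods are hypothetical; they are not part of XmlSerializer.

```csharp
using System;
using System.IO;

public static class CodeBaseHelper
{
    // Hypothetical helper: returns the local path for a file: URI,
    // or null if the code base is missing, malformed, or not a file URI.
    public static string TryGetCodeBasePath(string codeBase)
    {
        if (string.IsNullOrEmpty(codeBase))
        {
            return null;
        }
        Uri uri;
        if (!Uri.TryCreate(codeBase, UriKind.Absolute, out uri) || !uri.IsFile)
        {
            return null;
        }
        // LocalPath unescapes sequences such as %20 and yields a file-system path.
        return uri.LocalPath;
    }

    // True only when the code base points at a file that exists on disk,
    // i.e. when calling Assembly.Load is worth attempting at all.
    public static bool CodeBaseFileExists(string codeBase)
    {
        string path = TryGetCodeBasePath(codeBase);
        return path != null && File.Exists(path);
    }

    public static void Main()
    {
        Console.WriteLine(TryGetCodeBasePath("http://example.com/Foo.dll") == null);
        Console.WriteLine(CodeBaseFileExists("file:///C:/no/such/Foo.XmlSerializers.dll"));
    }
}
```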

Who’s Referencing Whom?

When developing any sort of application, debugging is inevitable. Sometimes part of that debugging means figuring out why an object hasn’t been garbage collected, which in turn means finding out what is still referencing it.


There are many reasons you might want to find out what is referencing an object; suspected memory “leaks” are a common one.


With Visual Studio (and MDbg) you can use a debugger extension called SOS (Son of Strike), which is included in the .NET Framework installation. To use SOS, you first need to enable unmanaged debugging in your project (Project\Properties, Debug tab, check “Enable unmanaged code debugging” in the “Enable Debuggers” section). With unmanaged debugging enabled, start debugging your application. The extension has to be loaded again for every new debugging session: once a breakpoint has been hit, open the Immediate Window and type .load sos, which should result in the following:


.load sos
extension C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll loaded


With SOS loaded, you can find out whether any objects of a particular type are currently in memory with the “!dumpheap -type” command, for example:


!dumpheap -type NamespaceName.TypeName
 Address       MT     Size
012bc16c 009159c4      328
total 1 objects
Statistics:
      MT    Count    TotalSize Class Name
009159c4        1          328 NamespaceName.TypeName
Total 1 objects


This lists every object of the requested type along with its address, MethodTable (MT) and size, followed by statistics giving the count and total size of the objects for each MethodTable.


Once you have the object’s address, you can find out which objects are referencing that particular instance, all the way back to the root that is keeping it alive. This is done with the !gcroot command:


!gcroot 012bc16c
Note: Roots found on stacks may be false positives. Run “!help gcroot” for
more info.
Error during command: warning! Extension is using a feature which Visual Studio does not implement.
 
Scan Thread 4756 OSTHread 1294
ESP:12f0dc:Root:012b2db8(WindowsApplication1.Form1)->
012bb104(WindowsApplication1.Form2)->
012bb254(System.Collections.Generic.List`1[[NamespaceName.TypeName, WindowsApplication1]])->
012bc178(System.Object[])->
012bc16c(NamespaceName.TypeName)
Scan Thread 3496 OSTHread da8


In this particular example, the output tells us that our object (012bc16c(NamespaceName.TypeName)) is referenced by an Object array (012bc178(System.Object[]), the list’s internal storage), which is referenced by a List<T> object (012bb254(System.Collections.Generic.List`1[[NamespaceName.TypeName, WindowsApplication1]])), which is referenced by a Form2 object (012bb104(WindowsApplication1.Form2)), which in turn is referenced by a Form1 object rooted on the stack (ESP:12f0dc:Root:012b2db8(WindowsApplication1.Form1)).
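To practise with !dumpheap and !gcroot, it can help to build a similar chain on purpose. The console program below is a hypothetical repro, not the original WindowsApplication1: a static field stands in for the form that roots the chain, an `Owner` object stands in for Form2, and its `List<TypeName>` holds the instance. A `WeakReference` lets the program check liveness without itself keeping the object alive. Because the root here is a static field rather than a stack variable, expect !gcroot to describe the root differently from the ESP root shown above.

```csharp
using System;
using System.Collections.Generic;

namespace NamespaceName
{
    // Hypothetical stand-in for the type located with !dumpheap.
    public class TypeName
    {
    }

    // Hypothetical stand-in for Form2: it owns the List<TypeName>.
    public class Owner
    {
        public readonly List<TypeName> Items = new List<TypeName>();
    }

    public static class Program
    {
        // Static fields are GC roots, so everything reachable from here
        // stays alive until the reference is removed.
        public static Owner Root = new Owner();

        // Adds a new TypeName to the rooted list and returns a weak reference
        // to it, so liveness can be checked without keeping it alive.
        public static WeakReference AddTrackedInstance()
        {
            TypeName instance = new TypeName();
            Root.Items.Add(instance);
            return new WeakReference(instance);
        }

        public static void Main()
        {
            WeakReference tracked = AddTrackedInstance();
            GC.Collect();
            // Still alive: Root -> Owner -> List<TypeName> -> array -> TypeName.
            Console.WriteLine("Alive while referenced: {0}", tracked.IsAlive);

            // Set a breakpoint here, then in the Immediate Window:
            //   .load sos
            //   !dumpheap -type NamespaceName.TypeName
            //   !gcroot <address reported by !dumpheap>
            Root.Items.Clear();
        }
    }
}
```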