
Micro-optimization: the surprising inefficiency of readonly fields

Introduction

Recently I’ve been optimizing the heck out of Noda Time. Most of the time this has been a case of the normal measurement, find bottlenecks, carefully analyse them, lather, rinse, repeat. Yesterday I had a hunch about a particular cost, and decided to experiment… leading to a surprising optimization.

Noda Time’s core types are mostly value types – date/time values are naturally value types, just as DateTime and DateTimeOffset are in the BCL. Noda Time’s types are a bit bigger than most value types, however – the largest being ZonedDateTime, weighing in at 40 bytes in an x64 CLR at the moment. (I can shrink it down to 32 bytes with a bit of messing around, although it’s not terribly pleasant to do so.) The main reason for the bulk is that we have two reference types involved (the time zone and the calendar system), and in Noda Time 2.0 we’re going to have nanosecond resolution instead of tick resolution (so we need 12 bytes just to store a point in time). While this goes against the Class Library Design Guidelines, it would be odd for the smaller types (LocalDate, LocalTime) to be value types and the larger ones to be reference types. Overall, these still feel like value types.

A lot of these value types are logically composed of each other:

  • A LocalDate is a YearMonthDay and a CalendarSystem reference
  • A LocalDateTime is a LocalDate and a LocalTime
  • An OffsetDateTime is a LocalDateTime and an Offset
  • A ZonedDateTime is an OffsetDateTime and a DateTimeZone reference

This leads to a lot of delegation, potentially – asking a ZonedDateTime for its Year could mean asking the OffsetDateTime, which would ask the LocalDateTime, which would ask the LocalDate, which would ask the YearMonthDay. Very nice from a code reuse point of view, but potentially inefficient due to copying data.
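In heavily simplified form (this isn’t Noda Time’s real code – the actual types carry calendars, offsets and so on), the chain looks something like this:

public struct YearMonthDay
{
    private readonly int year, month, day;
    // Constructors omitted for brevity
    public int Year { get { return year; } }
}

public struct LocalDate
{
    private readonly YearMonthDay yearMonthDay;
    public int Year { get { return yearMonthDay.Year; } }
}

public struct LocalDateTime
{
    private readonly LocalDate date;
    public int Year { get { return date.Year; } }
}

// ... and so on up through OffsetDateTime and ZonedDateTime, each asking the level below.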

Why would there be data copying involved? Well, that’s where this blog post comes in.

Behaviour of value type member invocations

When an instance member (method or property) belonging to a value type is invoked, the exact behaviour depends on the kind of expression it is called on. From the C# 5 spec, section 7.5.5 (where E is the expression the member M is invoked on, and the type declaring M is a value type):

If E is not classified as a variable, then a temporary local variable of E’s type is created and the value of E is assigned to that variable. E is then reclassified as a reference to that temporary local variable. The temporary variable is accessible as this within M, but not in any other way. Thus, only when E is a true variable is it possible for the caller to observe the changes that M makes to this.

So when is a variable not a variable? When it’s readonly… from section 7.6.4 (emphasis mine):

If T is a struct-type and I identifies an instance field of that class-type:

  • If E is a value, or if the field is readonly and the reference occurs outside an instance constructor of the struct in which the field is declared, then the result is a value, namely the value of the field I in the struct instance given by E.

(There’s a very similar bullet for T being a class-type; the important part is that the field type is a value type.)

The upshot is that if you have a method call of:

int result = someField.Foo();

then it’s effectively converted into this:

var tmp = someField;
int result = tmp.Foo();

Now if the type of the field is quite a large value type, but Foo() doesn’t modify the value (which it never does within my value types), that’s performing a copy completely unnecessarily.

To see this in action outside Noda Time, I’ve built a little sample app.

Show me the code!

Our example is a simple 256-bit type, composed of 4 Int64 values. The type itself doesn’t do anything useful – it just holds the four values, and exposes them via properties. We then measure how long it takes to sum the four properties lots of times.

using System;
using System.Diagnostics;

public struct Int256
{
    private readonly long bits0;
    private readonly long bits1;
    private readonly long bits2;
    private readonly long bits3;
    
    public Int256(long bits0, long bits1, long bits2, long bits3)
    {
        this.bits0 = bits0;
        this.bits1 = bits1;
        this.bits2 = bits2;
        this.bits3 = bits3;
    }
    
    public long Bits0 { get { return bits0; } }
    public long Bits1 { get { return bits1; } }
    public long Bits2 { get { return bits2; } }
    public long Bits3 { get { return bits3; } }
}

class Test
{
    private readonly Int256 value;

    public Test()
    {
        value = new Int256(1L, 5L, 10L, 100L);
    }
    
    public long TotalValue 
    { 
        get 
        {
            return value.Bits0 + value.Bits1 + value.Bits2 + value.Bits3; 
        }
    }
    
    public void RunTest()
    {
        // Just make sure it’s JITted…
        var sample = TotalValue;
        Stopwatch sw = Stopwatch.StartNew();
        long total = 0;
        for (int i = 0; i < 1000000000; i++)
        {
            total += TotalValue;
        }
        sw.Stop();
        Console.WriteLine("Total time: {0}ms", sw.ElapsedMilliseconds);
    }
    
    static void Main()
    {
        new Test().RunTest();
    }
}

Building this from the command line with /o+ /debug- and running (in a 64-bit CLR, but no RyuJIT) this takes about 20 seconds to run on my laptop. We can make it much faster with just one small change:

class Test
{
    private Int256 value;

    // Code as before
}

The same test now takes about 4 seconds – a 5-fold speed improvement, just by making a field non-readonly. If we look at the IL for the TotalValue property, the copying becomes obvious. Here it is when the field is readonly:

.method public hidebysig specialname instance int64 
        get_TotalValue() cil managed
{
  // Code size       60 (0x3c)
  .maxstack  2
  .locals init (valuetype Int256 V_0,
           valuetype Int256 V_1,
           valuetype Int256 V_2,
           valuetype Int256 V_3)
  IL_0000:  ldarg.0
  IL_0001:  ldfld      valuetype Int256 Test::'value'
  IL_0006:  stloc.0
  IL_0007:  ldloca.s   V_0
  IL_0009:  call       instance int64 Int256::get_Bits0()
  IL_000e:  ldarg.0
  IL_000f:  ldfld      valuetype Int256 Test::'value'
  IL_0014:  stloc.1
  IL_0015:  ldloca.s   V_1
  IL_0017:  call       instance int64 Int256::get_Bits1()
  IL_001c:  add
  IL_001d:  ldarg.0
  IL_001e:  ldfld      valuetype Int256 Test::'value'
  IL_0023:  stloc.2
  IL_0024:  ldloca.s   V_2
  IL_0026:  call       instance int64 Int256::get_Bits2()
  IL_002b:  add
  IL_002c:  ldarg.0
  IL_002d:  ldfld      valuetype Int256 Test::'value'
  IL_0032:  stloc.3
  IL_0033:  ldloca.s   V_3
  IL_0035:  call       instance int64 Int256::get_Bits3()
  IL_003a:  add
  IL_003b:  ret
} // end of method Test::get_TotalValue

And here it is when the field’s not readonly:

.method public hidebysig specialname instance int64 
        get_TotalValue() cil managed
{
  // Code size       48 (0x30)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldflda     valuetype Int256 Test::'value'
  IL_0006:  call       instance int64 Int256::get_Bits0()
  IL_000b:  ldarg.0
  IL_000c:  ldflda     valuetype Int256 Test::'value'
  IL_0011:  call       instance int64 Int256::get_Bits1()
  IL_0016:  add
  IL_0017:  ldarg.0
  IL_0018:  ldflda     valuetype Int256 Test::'value'
  IL_001d:  call       instance int64 Int256::get_Bits2()
  IL_0022:  add
  IL_0023:  ldarg.0
  IL_0024:  ldflda     valuetype Int256 Test::'value'
  IL_0029:  call       instance int64 Int256::get_Bits3()
  IL_002e:  add
  IL_002f:  ret
} // end of method Test::get_TotalValue

Note that it’s still loading the field address (ldflda) four times. You might expect that copying the field onto the stack once via a temporary variable would be faster, but that ends up at about 6.5 seconds on my machine.
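For reference, the "copy once" version of the property looks something like this (a sketch of the idea rather than the exact code I measured):

public long TotalValue
{
    get
    {
        // Copy the field into a local once, then read all four properties from the copy.
        Int256 copy = value;
        return copy.Bits0 + copy.Bits1 + copy.Bits2 + copy.Bits3;
    }
}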

There is an optimization which is even faster – moving the totalling property into Int256. That way (with the non-readonly field, still) the total time is less than a second – twenty times faster than the original code!
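That optimization looks something like this – a sketch assuming a TotalValue property added to Int256, with Test delegating to it:

// Inside Int256: no copying at all, just direct field access.
public long TotalValue
{
    get { return bits0 + bits1 + bits2 + bits3; }
}

// Inside Test: one field access (no copy, as the field isn’t readonly) and one call.
public long TotalValue
{
    get { return value.TotalValue; }
}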

Conclusion

This isn’t an optimization I’d recommend in general. Most code really doesn’t need to be micro-optimized this hard, and most code doesn’t deal with large value types like the ones in Noda Time. However, I regard Noda Time as a sort of "system level" library, and I don’t ever want someone to decide not to use it on performance grounds. My benchmarks show that for potentially-frequently-called operations (such as the properties on ZonedDateTime) it really does make a difference, so I’m going to go for it.

I intend to apply a custom attribute to each of these "would normally be readonly" fields to document the intended behaviour of the field – and then when Roslyn is fully released, I’ll probably write a test to validate that all of these fields would still compile if the field were made readonly (e.g. that they’re never assigned to outside the constructor).
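I haven’t settled on the details yet, but the attribute itself could be as simple as something like this – the name here is purely illustrative:

using System;

// Hypothetical marker attribute: "this field would be readonly if it weren’t for the copying cost".
[AttributeUsage(AttributeTargets.Field)]
internal sealed class ReadWriteForEfficiencyAttribute : Attribute
{
}

class Example
{
    [ReadWriteForEfficiency]
    private Int256 value;
}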

Aside from anything else, I find the subtle difference in behaviour between a readonly field and a read/write field fascinating… it’s something I’d been vaguely aware of in the past, but this is the first time that it’s had a practical impact on me. Maybe it’ll never make any difference to your code… but it’s probably worth being aware of anyway.

Diagnosing Portable Class Libraries

I’ve always found Portable Class Library (PCL) configuration to be a bit of a mystery. In simple cases, it’s simple: start a new PCL project in Visual Studio, select the environments you want to support, and away you go. But what’s going on under the hood, and what do all the options mean? How do I do this without Visual Studio?

Background: supporting Universal Windows Applications in Noda Time

Recently, I had a feature request for Noda Time to support Windows Phone 8.1 and Universal Windows Applications. This is something I haven’t done before, but it doesn’t sound too hard – it’s just a matter of changing an option in Visual Studio, surely… Unfortunately it’s not that simple for Noda Time. In order to support .NET 3.5 (and some other features which aren’t available in PCLs) the core NodaTime.dll project has multiple configurations which effectively split into "Desktop" and "PCL". This has been achieved by hacking around with the .csproj file manually – nothing particularly clever, but the sort of thing which can’t be done directly in Visual Studio. This means that when I open the project properties in Visual Studio – even when I’m building a PCL version – it treats the project as a regular class library. (It confuses the heck out of ReSharper too in some cases, unfortunately.)

That’s not too hard to work around – I just need to know the name of the profile I want to support. A profile name is basically a single identifier which refers to the configuration of environments you want to support. Unfortunately, it’s pretty opaque – Noda Time 1.2 ships with "Profile2", which means ".NET 4.0, Windows 8 (store apps), Silverlight 4, Windows Phone 7". The simplest way to work out what profile I wanted was to create a new "regular" PCL in Visual Studio, unload the project, manually set the profile to Profile2 in the project file, reload the project, and then enable "Windows Phone 8.1".

That then led to the next problem: Visual Studio 2013 (update 2) wanted to upgrade that sample project… because Profile2 is no longer really an option. You can’t manually set that combination in Visual Studio any more – it doesn’t display anything earlier than Silverlight 5 and Windows Phone 8. I now have two problems:

  • Given that Visual Studio changes the project file for me with a warning when I open it, how can I check what I’ve actually built against? Does msbuild silently do the same thing?
  • How can I find out what profiles are available, in order to select the most appropriate one?

Time to do some research…

Detecting a profile in a built assembly

First things first: if someone gives you a DLL, how can you detect what profile it was built against, and therefore which environments it can run in?

After hunting around in vain for a while, Petr K helped me out on Twitter: the crucial information is in the TargetFrameworkAttribute and its FrameworkName property, which is something like ".NETPortable,Version=v4.0,Profile=Profile2". That can then be used with the FrameworkName class to parse out the profile name without writing any string manipulation code.
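As a rough sketch (using plain reflection, with error handling left out), that detection can look like this:

using System;
using System.Reflection;
using System.Runtime.Versioning;

class ProfileDetector
{
    static void Main(string[] args)
    {
        // Load the assembly and look for [assembly: TargetFramework(...)]
        Assembly assembly = Assembly.LoadFrom(args[0]);
        var attribute = assembly.GetCustomAttribute<TargetFrameworkAttribute>();
        if (attribute == null)
        {
            Console.WriteLine("No TargetFrameworkAttribute present");
            return;
        }
        // FrameworkName parses values like ".NETPortable,Version=v4.0,Profile=Profile2"
        var frameworkName = new FrameworkName(attribute.FrameworkName);
        Console.WriteLine("Identifier: {0}", frameworkName.Identifier);
        Console.WriteLine("Version: {0}", frameworkName.Version);
        Console.WriteLine("Profile: {0}", frameworkName.Profile);
    }
}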

Then you just need to find the details of the profile…

Where are all the profiles?

Windows stores the profiles along with the reference assemblies, in the .NETPortable framework directory – on my machine for example, that’s "C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable". There are then subdirectories v4.0, v4.5 and v4.6 – although it’s not 100% clear what those refer to. Within each of those, there’s a Profile directory, which contains a bunch of Profile[n] directories. Then under each Profile[n] directory, a SupportedFrameworks directory contains XML documents describing the supported frameworks for that profile. Phew! So as an example:

C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2

contains the profile information for Profile2, including files:

  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\.NET Framework 4.xml
  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\Silverlight.xml
  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\Windows 8.xml
  • C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable\v4.0\Profile\Profile2\SupportedFrameworks\Windows Phone Silverlight 7.xml

Each of those XML files consists of a single Framework element, with a variety of attributes. Not all elements have the same set of attributes, although everything has an Identifier, Family, Profile, MinimumVersion and DisplayName as far as I’ve seen. Here’s an example – the "Windows Phone Silverlight 7.xml" file listed above:

<?xml version="1.0" encoding="utf-8"?>
<Framework
    Identifier="Silverlight"
    Profile="WindowsPhone*"
    MinimumVersion="4.0"
    Family="WindowsPhone"
    MinimumVisualStudioVersion="10.0"
    MaximumVisualStudioVersion="11.0"
    DisplayName="Windows Phone Silverlight"
    MinimumVersionDisplayName="7"
    PlatformArchitectures="AnyCPU" />

As noted by David Kean in comments, the "Identifier", "Profile" and "MinimumVersion" are mapped to the corresponding <TargetFramework*> properties in MSBuild; the combination of the three is unique for a target, but you need to be careful to look at all the relevant information… don’t just look at the Identifier attribute. Additionally, if there’s a MinimumVersionDisplayName, that’s probably the version you’ll be more familiar with – so in the example above, the displayed minimum version number is 7, which is what people think about for Windows Phone versions, even though the real minimum version number is 4.0 as far as the build is concerned.

Read David’s comment for more details – I don’t want to paraphrase it too much, in case I accidentally write something misleading.
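Putting the directory layout and the XML format together, a tool can list each profile and its supported frameworks with something like this – a quick-and-dirty sketch, with a hard-coded path and minimal attribute handling:

using System;
using System.IO;
using System.Xml.Linq;

class ProfileLister
{
    static void Main()
    {
        string root = @"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETPortable";
        foreach (string versionDir in Directory.GetDirectories(root))
        {
            string profileRoot = Path.Combine(versionDir, "Profile");
            if (!Directory.Exists(profileRoot))
            {
                continue;
            }
            foreach (string profileDir in Directory.GetDirectories(profileRoot))
            {
                Console.WriteLine("{0} {1}", Path.GetFileName(versionDir), Path.GetFileName(profileDir));
                string supportedDir = Path.Combine(profileDir, "SupportedFrameworks");
                if (!Directory.Exists(supportedDir))
                {
                    continue;
                }
                foreach (string file in Directory.GetFiles(supportedDir, "*.xml"))
                {
                    // Each file contains a single Framework element describing one supported framework.
                    XElement framework = XElement.Load(file);
                    Console.WriteLine("  {0} {1}",
                        (string) framework.Attribute("DisplayName"),
                        (string) framework.Attribute("MinimumVersionDisplayName")
                            ?? (string) framework.Attribute("MinimumVersion"));
                }
            }
        }
    }
}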

What about the Xamarin frameworks?

As you can see above, Profile2 doesn’t include anything that looks like it’s a Xamarin framework. (Compare that with Profile136 for example, which includes Xamarin.iOS.xml and Xamarin.Android.xml.) But haven’t I previously said that Noda Time 1.2 works seamlessly in Xamarin.iOS and Xamarin.Android? Indeed I have. This is all very confusing, but as far as I’m aware, the Xamarin.* frameworks are only required if you want to build within Xamarin Studio and you’re not on a Windows machine. I could be wrong about that, but certainly there are some profiles which will work with some versions of Xamarin without a Xamarin-specific framework being on the list.

I would welcome more clarity on this, and if I learn any more about it (either in comments here or via other channels) I’ll update this post.

Where should the files go in NuGet packages?

The next issue to consider is that of NuGet packages. A NuGet package can contain multiple binaries for multiple target profiles – for example, the NodaTime NuGet package includes both the desktop and PCL versions. NuGet adds a reference to the right one when you install a package. How can it tell which binary is which? Through a directory structure. Here’s a piece of Noda Time’s nuspec package:

<files>
  <file src="bin\Signed Release\NodaTime.dll" target="lib\net35-Client" />
  <file src="bin\Signed Release\NodaTime.pdb" target="lib\net35-Client" />
  <file src="bin\Signed Release\NodaTime.xml" target="lib\net35-Client" />
  <file src="bin\Signed Release Portable\NodaTime.dll" target="lib\portable-win+net40+sl4+wp7" />
  <file src="bin\Signed Release Portable\NodaTime.pdb" target="lib\portable-win+net40+sl4+wp7" />
  <file src="bin\Signed Release Portable\NodaTime.xml" target="lib\portable-win+net40+sl4+wp7" />
  <file src="**\*.cs" target="src" />
</files>

As you can see, the PCL files end up in lib\portable-win+net40+sl4+wp7. That works for Profile2, but it’s not the right directory for the profile I’ll end up using. So how do I work out what directory to use in the next version? I can look at the NuGet source code, or this handy table of profiles and corresponding NuGet targets. (Interestingly, that lists the directory for Profile2 as portable-net4+sl4+netcore45+wp7, so I’m not entirely sure what’s going on. I suspect the table is more reliable than wherever I got the name from when preparing a Noda Time release.)

This all sounds a bit fiddly. Shouldn’t there be a tool to make it simple?

Yes, yes there should. As it happens, Stephen Cleary has already written one, although I didn’t realise this until I’d also written one myself. Indeed, the table linked above is the output from Stephen’s tool.

My own tool is called PclPal, and the source is available on GitHub. Don’t expect polished code (or tests) – this was very much a quick and dirty hack to make life slightly easier. It could be the basis of a more polished tool – possibly even a "PCL profile explorer" or something similar. Oh, and it requires the Roslyn End User Preview.

Currently it’s just a command line tool with three options:

  • "list" – lists the profiles on your system
  • "dll <filename>" – shows the details of the profile a particular assembly was built against
  • "show <profile name>" – shows the details of a specify profile

Note that currently I’m not doing anything in terms of NuGet package directories, although that’s probably the next step, if I take this any further. (I may just use Stephen’s tool, of course…)

Conclusion

The world of PCLs is a bit of a mess, to say the least. It’s possible that there’s lots of really clear, up-to-date documentation explaining everything somewhere – but I have yet to find it. I suspect the main problem is that this is a moving target in all kinds of ways. Versioning is never easy, and between all the different variations possible here, I’m not entirely surprised it’s confusing.

Hopefully this post will help to shed a bit more light on things – please feel very free to leave corrections, as I definitely don’t want to propagate any falsehoods. I’ve definitely learned a thing or two in the last few days, and with tools such as PclPal and Stephen’s PortableLibraryProfiles, life should be that little bit easier…

Extension methods, explicitly implemented interfaces and collection initializers

This post is the answer to yesterday’s brainteaser. As a reminder, I was asking what purpose this code might have:

public static class Extensions
{
    public static void Add<T>(this ICollection<T> source, T item) 
    { 
        source.Add(item); 
    } 
}

There are plenty of answers, varying from completely incorrect (sorry!) to pretty much spot on.

As many people noticed, ICollection<T> already has an Add method taking an item of type T. So what difference could this make? Well, consider LinkedList<T>, which implements ICollection<T>, used as below:

// Invalid
LinkedList<int> list = new LinkedList<int>();
list.Add(10);

That’s not valid code (normally)…. whereas this is:

// Valid
ICollection<int> list = new LinkedList<int>();
list.Add(10);

The only difference is the compile-time type of the list variable – and that changes things because LinkedList<T> implements ICollection<T>.Add using explicit interface implementation. (Basically you’re encouraged to use AddFirst and AddLast instead, to make it clear which end you’re adding to. Add is equivalent to AddLast.)
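If it’s been a while since you’ve seen explicit interface implementation, it looks something like this – a simplified illustration rather than the real LinkedList<T> code:

using System.Collections;
using System.Collections.Generic;

public sealed class MyCollection<T> : ICollection<T>
{
    private readonly List<T> items = new List<T>();

    // The "encouraged" method, visible on the class itself.
    public void AddLast(T item) { items.Add(item); }

    // Explicit implementation: only callable via an expression of type ICollection<T>.
    void ICollection<T>.Add(T item) { AddLast(item); }

    public int Count { get { return items.Count; } }
    public bool IsReadOnly { get { return false; } }
    public void Clear() { items.Clear(); }
    public bool Contains(T item) { return items.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { items.CopyTo(array, arrayIndex); }
    public bool Remove(T item) { return items.Remove(item); }
    public IEnumerator<T> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}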

Now consider the invalid code above, but with the brainteaser extension method in place. Now it’s a perfectly valid call to the extension method, which happens to delegate straight to the ICollection<T> implementation. Great! But why bother? Surely we can just cast list if we really want to:

LinkedList<int> list = new LinkedList<int>();
((ICollection<int>)list).Add(10);

That’s ugly (really ugly) – but it does work. But what about situations where you can’t cast? They’re pretty rare, but they do exist. Case in point: collection initializers. This is where the C# 6 connection comes in. As of C# 6 (at least so far…) collection initializers have changed so that an appropriate Add extension method is also permitted. So for example:

// Sometimes valid :)
LinkedList<int> list = new LinkedList<int> { 10, 20 };

That’s invalid in C# 5 whatever you do, and it’s only valid in C# 6 when you’ve got a suitable extension method in place, such as the one in yesterday’s post. There’s nothing to say the extension method has to be on ICollection<T>. While it might feel nice to be general, most implementations of ICollection<T> which use explicit interface implementation for ICollection<T>.Add do so for a very good reason. With the extension method in place, this is valid too…

// Valid from a language point of view…
ReadOnlyCollection<int> collection = new ReadOnlyCollection<int>(new[] { 10, 20 }) { 30, 40 };

That will compile, but it’s obviously not going to succeed at execution time. (It throws NotSupportedException.)

Conclusion

I don’t think I’d ever actually use the extension method I showed yesterday… but that’s not quite the same thing as it being useless, particularly when coupled with C# 6’s new-and-improved collection initializers. (The indexer functionality means you can use collection initializers with ConcurrentDictionary<,> even without extension methods, by the way.)

Explicit interface implementation is an interesting little corner to C# which is easy to forget about when you look at code – and which doesn’t play nicely with dynamic typing, as I’ve mentioned before.

And finally…

Around the same time as I posted the brainteaser yesterday, I also remarked on how unfortunate it was that StringBuilder didn’t implement IEnumerable<char>. It’s not that I really want to iterate over a StringBuilder… but if it implemented IEnumerable, I could use it with a collection initializer, having added some extension methods. This would have been wonderfully evil…

using System;
using System.Text; 

public static class Extensions  
{  
    public static void Add(this StringBuilder builder, string text)
    {  
        builder.AppendLine(text);
    }  

    public static void Add(this StringBuilder builder, string format, params object[] arguments)
    {  
        builder.AppendFormat(format, arguments);
        builder.AppendLine();
    }
}

class Test
{
    static void Main()
    {
        // Sadly invalid :(
        var builder = new StringBuilder
        {
            "Just a plain message",
            { "A message with formatting, recorded at {0}", DateTime.Now }
        };
    }
}

Unfortunately it’s not to be. But watch this space – I’m sure I’ll find some nasty ways of abusing C# 6…

Quick brainteaser

Just a really quick one today…

What’s the point of this code? Does it have any point at all?

public static class Extensions
{
    public static void Add<T>(this ICollection<T> source, T item)
    {
        source.Add(item);
    }
}

Bonus marks if you can work out what made me think about it.

I suggest you ROT-13 answers to avoid spoilers for other readers.

A tale of two puzzles

As I begin to write this, I’m in a small cubicle in Philadelphia airport, on my way back from CodeMash – a wonderful conference (yet again) which I feel privileged to have attended. Personal top highlights definitely include Dustin Campbell’s talk on C# 6 (I’m practically dribbling with anticipation – bits please!) and playing Settlers of Catan on an enormous board. Kevin Pilch-Bisson’s talk on scriptcs was fabulous too, not least because he demonstrated its NuGet support using Noda Time. I’m very likely to use scriptcs as a tool to ease folks into the language gently if I ever get round to writing my introductory C# book. (To follow my own advice from the talk I gave, I should really just start writing it – whether the initial attempt is good or not, it’s better than having nothing.)

First puzzle

During Dustin’s talk, he set some puzzles around refactoring – where an "inline" operation had to be smarter than it originally appeared, for various reasons. The final puzzle particularly piqued my interest, as Dustin explained that various members of the C# team hadn’t actually worked out why the code behaved the way it did… the refactoring in Roslyn worked in that it didn’t change the behaviour, but more through a good general strategy than any attempt to handle this specific case.

So, the code in question is:

using System;

class X
{
    static int M(Func<int?, byte> x, object y) { return 1; }
    static int M(Func<X, byte> x, string y) { return 2; }

    const int Value = 1000;

    static void Main()
    {
        var a = M(X => (byte) X.Value, null);

        unchecked
        {
            Console.WriteLine(a);
            Console.WriteLine(M(X => (byte) X.Value, null));
        }
    }
}

This produces output of:

1
2

Is this correct? Is it a compiler bug from the old native compiler, faithfully reproduced in Roslyn? Why would moving the expression into an "unchecked" block cause different behaviour?

In an attempt to give you enough space and time to avoid accidentally reading the answer, which is explained below, I’d like to take the time to mention the source of this puzzle. A few years ago, Microsoft hired Vladimir Reshetnikov into the testing part of the C#/VB team. Vladimir had previously worked at JetBrains on ReSharper, so it’s not like he was new to C#. Every year when I see Kevin and Dustin at CodeMash, they give me more reports of the crazy things Vladimir has come up with – perversions of the language way beyond my usual fare. The stories are always highly entertaining, and I hope to meet Vladimir in person one day. He has a Twitter account which I’ve only just started following, but which I suspect I’ll find immensely distracting.

Explanation

I massively overthought this for a while. I then significantly underthought it for a while. I played around with the code by:

  • Removing the second parameter from M
  • Changing the name of the parameter
  • Changing the parameter types in various ways
  • Making X.Value non-const (just static readonly)
  • Changing the value of X.Value to 100
  • Changing the lambda expression to make a call to a method using (byte) X.Value instead of returning it directly
  • Using a statement lambda

These gave me clues which I then failed to follow up on properly – partly because I was looking for something complicated. I was considering the categorization of a "cast of a constant value to a different numeric type" and similar details – instead of following through the compiler rules in a systematic way.

The expression we need to focus on is this:

M(X => (byte) X.Value, null)

This is just a method call using a lambda expression, using overload resolution to determine which overload to call and type inference to determine the type arguments to the method. In a very crude description of overload resolution, we perform the following steps:

  • Determine which overloads are applicable (i.e. which ones make sense in terms of the supplied arguments and the corresponding parameters)
  • Compare the applicable overloads to each other in terms of "betterness"
  • If one applicable overload is "better" than all the others, use that (modulo a bit more final checking)
  • If there are no applicable overloads, or no single applicable overload is better than all the others, then the call is invalid and leads to a compile-time error

My mistake was jumping straight to the second bullet point, assuming that both overloads are valid in each case.

Overload 1 – a simple parameter

First let’s look at the first overload: the one where the first parameter is of type Func<int?, byte>. What does the lambda expression of X => (byte) X.Value mean when converted to that delegate type? Is it always valid?

The tricky part is working out what the simple-name X refers to as part of X.Value within the lambda expression. The important part of the spec here is the start of section 7.6.2 (simple names):

A simple-name is either of the form I or of the form I<A1, …, AK>, where I is a single identifier and <A1, …, AK> is an optional type-argument-list. When no type-argument-list is specified, consider K to be zero. The simple-name is evaluated and classified as follows:

  • If K is zero and the simple-name appears within a block and if the block’s (or an enclosing block’s) local variable declaration space (§3.3) contains a local variable, parameter or constant with name I, then the simple-name refers to that local variable, parameter or constant and is classified as a variable or value.

(The spec continues with other cases.) So X refers to the lambda expression parameter, which is of type int? – so X.Value refers to the underlying value of the parameter, in the normal way.

Overload 2 – a bit more complexity

What about the second overload? Here the first parameter to M is of type Func<X, byte>, so we’re trying to convert the same lambda expression to that type. Here, the same part of section 7.6.2 is used, but also section 7.6.4.1 comes into play when determining the meaning of the member access expression X.Value:

In a member access of the form E.I, if E is a single identifier, and if the meaning of E as a simple-name (§7.6.2) is a constant, field, property, local variable, or parameter with the same type as the meaning of E as a type-name (§3.8), then both possible meanings of E are permitted. The two possible meanings of E.I are never ambiguous, since I must necessarily be a member of the type E in both cases. In other words, the rule simply permits access to the static members and nested types of E where a compile-time error would otherwise have occurred.

It’s not spelled out explicitly here, but the reason that it can’t be ambiguous is that you can’t have two members of the same name within the same type (other than for method overloads). You can have method overloads that are both static and instance methods, but then normal method overloading rules are enforced.

So in this case, X.Value doesn’t involve the use of the parameter called X at all. Instead, it’s the constant called Value within the class X. Note that this is only the case because the type of the parameter is the same as the type that the parameter’s name refers to. (It isn’t quite the same as the names being the same. If you have a using directive introducing an alias, such as using Y = X; then the condition can be satisfied with different names – as in the sketch below.)
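Here’s a contrived sketch of that alias case – the parameter is named Y, but Y is also an alias for the type X, so the same rule kicks in:

using System;
using Y = X;

class X
{
    public const int Value = 1000;

    static void Main()
    {
        // The parameter is named Y, its type is X, and the alias Y also names the type X –
        // so Y.Value binds to the constant X.Value, not to a member of the parameter.
        Func<X, int> f = Y => Y.Value;
        Console.WriteLine(f(null)); // 1000 – the parameter is never actually used
    }
}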

So what difference does unchecked make?

Now that we know what the conversions from the lambda expression to the two different delegate types mean, we can consider whether they’re always valid. Taking the overload resolution out of the picture, when is each of these lines valid at compile-time?

Func<int?, byte> foo = X => (byte) X.Value;
Func<X, byte> bar = X => (byte) X.Value;

Note that we’re not trying to consider whether they might throw an exception – clearly if you call foo(null) then that will fail. But will it at least compile?

The first line is always valid… but the second depends on your context, in terms of checked or unchecked arithmetic. For the most part, if you’re not explicitly in a checked or unchecked context, the language assumes a default context of unchecked, but allows "external factors" (such as a compiler flag). The Microsoft compiler is unchecked by default, and you can make an assembly use checked arithmetic "globally" using the /checked compiler argument (or by going to the Advanced settings in the build properties of a Visual Studio project).

However, one exception to this "default" rule is constant expressions. From section 7.6.12 of the spec:

For constant expressions (expressions that can be fully evaluated at compile-time), the default overflow checking context is always checked. Unless a constant expression is explicitly placed in an unchecked context, overflows that occur during the compile-time evaluation of the expression always cause compile-time errors.

The cast of X.Value to byte is still a constant expression – and it’s one that overflows at compile-time, because 1000 is outside the range of byte. So unless we’re explicitly in an unchecked context, the code snippet above will fail for the second line.
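In isolation, the rule looks like this (with Value being the constant declared as 1000):

const int Value = 1000;

// Outside an unchecked context, the constant conversion is rejected at compile time:
// byte a = (byte) Value;           // compile-time error: the constant overflows byte

// In an explicitly unchecked context, it compiles and the value simply wraps:
byte b = unchecked((byte) Value);   // b is 232 (1000 truncated to 8 bits)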

Back to overloading

Given all of that, how does it fit in with our problem? Well, at this point it’s reasonably simple – so long as we’re careful. Look at the first assignment, which occurs outside the unchecked statement:

var a = M(X => (byte) X.Value, null);

Which overloads are applicable here? The second argument isn’t a problem – the null literal is convertible to both object and string. So can we convert the first argument (the lambda expression) to the first parameter of each overload?

As noted earlier, it’s fine to convert it to a Func<int?, byte>. So the first method is definitely applicable. However, the second method requires us to convert the lambda expression to Func<X, byte>. We’re not in an explicitly unchecked context, so the conversion doesn’t work. Where the rule quoted above talks about "overflows that occur during the compile-time evaluation of the expression always cause compile-time errors" it doesn’t mean we actually see an error message when the compiler speculatively tries to convert the lambda expression to the Func<X, byte> – it just means that the conversion isn’t valid.

So, there’s only one applicable method, and that gets picked and executed. As a result, the value of a is 1.

Now in the second call (inside the unchecked statement) both methods are applicable: the conversion from the lambda expression to the Func<X, byte> is valid because the constant conversion occurs within an explicitly unchecked context. We end up with two applicable methods, and need to see if one of them is definitively better than the other. This comes down to section 7.5.3.2 of the C# 5 spec, which gives details of a "better function member" test. This in turn ends up talking about better conversions from arguments to parameter types. The lambda expression conversions are effectively equal, but the conversion of null to string is "better than" the conversion to object, because there’s an implicit conversion from string to object but not vice versa. (In that sense the string conversion is "more specific" to borrow Java terminology.) So the second overload is picked, and we print 2 to the console.

Phew!

Summary

This all comes down to the cast from a constant to byte being valid in an explicitly unchecked context, but not otherwise.

One odd part is that I spotted that reasonably quickly, and assumed the problem was actually fairly straightforward. It’s only when I started writing it up for this blog post that I dug into the spec to check the exact rules for the meaning of simple names in different situations. I’d solved the crux of the puzzle, but the exact details were somewhat involved. Indeed, while writing this blog post I’ve started two separate emails to the C# team reporting potential (equal and opposite) bugs – before discarding those emails as I got further into the specification.

This is reasonably common, in my experience: the C# language has been well designed so that we can intuitively understand the meaning of code without knowing the details of the exact rules. Those rules can be pretty labyrinthine sometimes, because the language is very precisely specified – but most of the time you don’t need to read through the rules in detail.

Didn’t you mention two puzzles?

Yes, yes I did.

The second puzzle is much simpler to state – and much harder to solve. To quote Vladimir’s tweet verbatim:

C# quiz: write a valid C# program containing a sequence of three tokens `? null :` that remains valid after we remove the `null`.

Constructing a valid program containing "? null :" is trivial.

Constructing a valid program containing "? :" is slightly trickier, but not really hard.

Constructing a program which is valid both with and without the null token is too hard for me, at the moment. I’m stuck – and I know I’m in good company, as Kevin, Dustin and Bill Wagner were all stuck too, last time I heard. I know of two people who’ve solved this so far… but I’m determined to get it…

Casting vs "as" – embracing exceptions

(I’ve ended up commenting on this issue on Stack Overflow quite a few times, so I figured it would be worth writing a blog post to refer to in the future.)

There are lots of ways of converting values from one type to another – either changing the compile-time type but actually keeping the value the same, or actually changing the value (for example converting int to double). This post will not go into all of those – it would be enormous – just two of them, in one specific situation.

The situation we’re interested in here is where you have an expression (typically a variable) of one reference type, and you want an expression with a different compile-time type, without changing the actual value. You just want a different "view" on a reference. The two options we’ll look at are casting and using the "as" operator. So for example:

object x = "foo"
string cast = (string) x; 
string asOperator = x as string;

The major differences between these are pretty well-understood:

  • Casting is also used for other conversions (e.g. between value types); "as" is only valid for reference type expressions (although the target type can be a nullable value type)
  • Casting can invoke user-defined conversions (if they’re applicable at compile-time); "as" only ever performs a reference conversion
  • If the actual value of the expression is a non-null reference to an incompatible type, casting will throw an InvalidCastException whereas the "as" operator will result in a null value instead

The last point is the one I’m interested in for this post, because it’s a highly visible symptom of many developers’ allergic reaction to exceptions. Let’s look at a few examples.

Use case 1: Unconditional dereferencing

First, let’s suppose we have a number of buttons all with the same event handler attached to them. The event handler should just change the text of the button to "Clicked". There are two simple approaches to this:

void HandleUsingCast(object sender, EventArgs e)
{
    Button button = (Button) sender;
    button.Text = "Clicked";
}

void HandleUsingAs(object sender, EventArgs e)
{
    Button button = sender as Button;
    button.Text = "Clicked";
}

(Obviously these aren’t the method names I’d use in real code – but they’re handy for differentiating between the two approaches within this post.)

In both cases, when the value of "sender" genuinely refers to a Button instance, the code will function correctly. Likewise when the value of "sender" is null, both will fail with a NullReferenceException on the second line. However, when the value of "sender" is a reference to an instance which isn’t compatible with Button, the two behave differently:

  • HandleUsingCast will fail on the first line, throwing an InvalidCastException which includes information about the actual type of the object
  • HandleUsingAs will fail on the second line with a NullReferenceException

Which of these is the more useful behaviour? It seems pretty unambiguous to me that the HandleUsingCast option provides significantly more information, but still I see the code from HandleUsingAs in examples on Stack Overflow… sometimes with the rationale of "I prefer to use as instead of a cast to avoid exceptions." There’s going to be an exception anyway, so there’s really no good reason to use "as" here.

Use case 2: Silently ignoring invalid input

Sometimes a slight change is proposed to the above code, checking for the result of the "as" operator not being null:

void HandleUsingAs(object sender, EventArgs e)
{
    Button button = sender as Button;
    if (button != null)
    {
        button.Text = "Clicked";
    }
}

Now we really don’t have an exception. We can do this with the cast as well, using the "is" operator:

void HandleUsingCast(object sender, EventArgs e)
{
    if (sender is Button)
    {
        Button button = (Button) sender;
        button.Text = "Clicked";
    }
}

These two methods now behave the same way, but here I genuinely prefer the "as" approach. Aside from anything else, it’s only performing a single execution-time type check, rather than checking once with "is" and then once again with the cast. There are potential performance implications here, but in most cases they’d be insignificant – it’s the principle of the thing that really bothers me. Indeed, this is basically the situation that the "as" operator was designed for. But is it an appropriate design to start with?

In this particular case, it’s very unlikely that we’ll have a non-Button sender unless there’s been a coding error somewhere. For example, it’s unlikely that bad user input or network resource issues would lead to entering this method with a sender of (say) a TextBox. So do you really want to silently ignore the problem? My personal view is that the response to a detected coding error should almost always be an exception which either goes completely uncaught or which is caught at some "top level", abandoning the current operation as cleanly as possible. (For example, in a client application it may well be appropriate to terminate the app; in a web application we wouldn’t want to terminate the server process, but we’d want to abort the processing of the problematic request.) Fundamentally, the world is not in a state which we’ve really considered: continuing regardless could easily make things worse, potentially losing or corrupting data.

If you are going to ignore a requested operation involving an unexpected type, at least clearly log it – and then ensure that any such logs are monitored appropriately:

void HandleUsingAs(object sender, EventArgs e)
{
    Button button = sender as Button;
    if (button != null)
    {
        button.Text = "Clicked";
    }
    else
    {
        // Log an error, potentially differentiating between
        // a null sender and input of a non-Button sender.
    }
}

Use case 3: consciously handling input of an unhelpful type

Despite the general thrust of the previous use case, there certainly are cases where it makes perfect sense to use "as" to handle input of a type other than the one you’re really hoping for. The simplest example of this is probably equality testing:

public sealed class Foo : IEquatable<Foo>
{
    // Fields, of course

    public override bool Equals(object obj) 
    { 
        return Equals(obj as Foo); 
    } 

    public bool Equals(Foo other) 
    { 
        // Note: use ReferenceEquals if you overload the == operator
        if (other == null)
        { 
            return false;
        } 
        // Now compare the fields of this and other for equality appropriately.
    } 

    // GetHashCode etc
}

(I’ve deliberately sealed Foo to avoid having to worry about equality between subclasses.)

Here we know that we really do want to deal with both null and "non-null, non-Foo" references in the same way from Equals(object) – we want to return false. The simplest way of handling that is to delegate to the Equals(Foo) method, which needs to handle nullity but doesn’t need to worry about non-Foo references.

We’re knowingly anticipating the possibility of Equals(object) being called with a non-Foo reference. The documentation for the method explicitly states what we’re meant to do; this does not necessarily indicate a programming error. We could implement Equals with a cast, of course:

public override bool Equals(object obj)
{
    return obj is Foo && Equals((Foo) obj); 
}

… but I dislike that for the same reasons as I disliked the cast in use case 2.

Use case 4: deferring or delegating the decision

This is the case where we pass the converted value on to another method or constructor, which is likely to store the value for later use. For example:

public Person CreatePersonWithCast(object firstName, object lastName)
{
    return new Person((string) firstName, (string) lastName); 
}

public Person CreatePersonWithAs(object firstName, object lastName)
{
    return new Person(firstName as string, lastName as string); 
}

In some ways use case 3 was a special case of this, where we knew what the Equals(Foo) method would do with a null reference. In general, however, there can be a significant delay between the conversion and some definite impact. It may be valid to use null for one or both arguments to the Person constructor – but is that really what you want to achieve? Is some later piece of code going to assume they’re non-null?

If the constructor is going to validate its parameters and check they’re non-null, we’re essentially back to use case 1, just with ArgumentNullException replacing NullReferenceException: again, it’s cleaner to use the cast and end up with InvalidCastException before we have the chance for anything else to go wrong.

In the worst scenario, it’s really not expected that the caller will pass null arguments to the Person constructor, but due to sloppy validation the Person is constructed with no errors. The code may then proceed to do any number of things (some of them irreversible) before the problem is spotted. In this case, there may be lasting data loss or corruption and if an exception is eventually thrown, it may be very hard to trace the problem to the original CreatePersonWithAs parameter value not being a string reference.

Use case 5: taking advantage of "optional" functionality

This was suggested by Stephen Cleary in comments, and is an interesting reversal of use case 3. The idea is basically that if you know an input implements a particular interface, you can take a different – usually optimized – route to implement your desired behaviour. LINQ to Objects does this a lot, taking advantage of the fact that while IEnumerable<T> itself doesn’t provide much functionality, many collections implement other interfaces such as ICollection<T>. So the implementation of Count() might include something like this:

ICollection<T> collection = source as ICollection<T>;
if (collection != null)
{
    return collection.Count;
}
// Otherwise do it the long way (GetEnumerator / MoveNext)

Again, I’m fine with using "as" here.

Conclusion

I have nothing against the "as" operator, when used carefully. What I dislike is the assumption that it’s "safer" than a cast, simply because in error cases it doesn’t throw an exception. That’s more dangerous behaviour, as it allows problems to propagate. In short: whenever you have a reference conversion, consider the possibility that it might fail. What sort of situation might cause that to occur, and how do you want to proceed?

  • If everything about the system design reassures you that it really can’t fail, then use a cast: if it turns out that your understanding of the system is wrong, then throwing an exception is usually preferable to proceeding in a context you didn’t anticipate. Bear in mind that a null reference can be successfully cast to any nullable type, so a cast can never replace a null check.
  • If it’s expected that you really might receive a reference of the "wrong" type, then think how you want to handle it. Use the "as" operator and then test whether the result was null. Bear in mind that a null result doesn’t always mean the original value was a reference of a different type – it could have been a null reference to start with. Consider whether or not you need to differentiate those two situations.
  • If you really can’t be bothered to really think things through (and I hope none of my readers are this lazy), default to using a cast: at least you’ll notice if something’s wrong, and have a useful stack trace.

As a side note, writing this post has made me consider (yet again) the various types of "exception" situations we run into. At some point I may put enough thought into how we could express our intentions with regards to these situations more clearly – until then I’d recommend reading Eric Lippert’s taxonomy of exceptions, which has certainly influenced my thinking.

Array covariance: not just ugly, but slow too

It seems to be quite a long time since I’ve written a genuine "code" blog post. Time to fix that.

This material may well be covered elsewhere – it’s certainly not terrifically original, and I’ve been meaning to post about it for a long time. In particular, I remember mentioning it at CodeMash in 2012. Anyway, the time has now come.

Refresher on array covariance

Just as a bit of background before we delve into the performance aspect, let me remind you what array covariance is, and when it applies. The basic idea is that C# allows a reference conversion from type TDerived[] to type TBase[], so long as:

  • TDerived and TBase are both reference types (potentially interfaces)
  • There’s a reference conversion from TDerived to TBase (so either TDerived is the same as TBase, or a subclass, or an implementing class etc)

Just to remind you about reference conversions, those are conversions from one reference type to another, where the result (on success) is never a reference to a different object. To quote section 6.1.6 of the C# 5 spec:

Reference conversions, implicit or explicit, never change the referential identity of the object being converted. In other words, while a reference conversion may change the type of the reference, it never changes the type or value of the object being referred to.

So as a simple example, there’s a reference conversion from string to object, therefore there’s a reference conversion from string[] to object[]:

string[] strings = new string[10];
object[] objects = strings;

// strings and objects now refer to the same object

There is no such reference conversion between value type arrays, so you can’t use the same code to convert from int[] to object[].

The nasty part is that every store operation into a reference type array now has to be checked at execution time for type safety. So to extend our sample code very slightly:

string[] strings = new string[10]; 
object[] objects = strings;

objects[0] = "string"; // This is fine
objects[0] = new Button(); // This will fail

The last line here will fail with an ArrayTypeMismatchException, to avoid storing a Button reference in a String[] object. When I said that every store operation has to be checked, that’s a slight exaggeration: in theory, if the compile-time type is an array with an element type which is a sealed class, the check can be avoided as it can’t fail.

Avoiding array covariance

I would rather arrays weren’t covariant in the first place, but there’s not a lot that can be done about that now. However, we can work around this, if we really need to. We know that value type arrays are not covariant… so how about we use a value type array instead, even if we want to store reference types?

All we need is a value type which can store the reference type we need – which is dead easy with a wrapper type:

public struct Wrapper<T> where T : class
{
    private readonly T value;
    public T Value { get { return value; } }
    
    public Wrapper(T value)
    {
        this.value = value;
    }
    
    public static implicit operator Wrapper<T>(T value)
    {
        return new Wrapper<T>(value);
    }
}

Now if we have a Wrapper<string>[], we can’t assign that to a Wrapper<object>[] variable – the types are incompatible. If that feels a bit clunky, we can put the array into its own type:

public sealed class InvariantArray<T> where T : class
{
    private readonly Wrapper<T>[] array;
    
    public InvariantArray(int size)
    {
        array = new Wrapper<T>[size];
    }
    
    public T this[int index]
    {
        get { return array[index].Value; }
        set { array[index] = value; }
    }
}

Just to clarify, we now only have value type arrays, but ones where each value is a plain wrapper for a reference. We now can’t accidentally violate type-safety at compile-time, and the CLR doesn’t need to validate write operations.

There’s no memory overhead here – aside from the type information at the start, I’d actually expect the contents of a Wrapper<T>[] to be indistinguishable from a T[] in memory.

Benchmarking

So, how does it perform? I’ve written a small console app to test it. You can download the full code, but the gist of it is that we use a stopwatch to measure how long it takes to either repeatedly write to an array, or repeatedly read from an array (validating that the value read is non-null, just to prove that we’ve really read something). I’m hoping I haven’t fallen foul of any of the various mistakes in benchmarking which are so easy to make.
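The shape of each write test is roughly like this – a sketch rather than the exact downloadable code; the read test is analogous, summing or null-checking instead of storing:

using System;
using System.Diagnostics;

class CovarianceBenchmark
{
    const int ArraySize = 100;
    const int Iterations = 100000000;

    static long TimeWrites(object[] array, string value)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            for (int j = 0; j < array.Length; j++)
            {
                // Each store into an object[] which is really a string[]
                // has to be type-checked at execution time.
                array[j] = value;
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // The covariant scenario: compile-time type object[], execution-time type string[].
        object[] array = new string[ArraySize];
        Console.WriteLine("object[] write: {0}ms", TimeWrites(array, "value"));
    }
}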

The test tries four scenarios:

  • object[] (but still storing strings)
  • string[]
  • Wrapper<string>[]
  • InvariantArray<string>

Running against an array size of 100, with 100 million iterations per test, I get the following results on my Thinkpad Twist:

Array type               Read time (ms)   Write time (ms)
object[]                 11842            44827
string[]                 12000            40865
Wrapper<string>[]        11843            29338
InvariantArray<string>   11825            32973

That’s just one run, but the results are fairly consistent across runs. The one interesting deviation is the write time for object[] – I’ve observed it sometimes being the same as for string[], but not consistently. I don’t understand this, but it does seem that the JIT isn’t performing the optimization for string[] that it could if it spotted that string is sealed.

Both of the workarounds to avoid array covariance make a noticeable difference to the performance of writing to the array, without affecting read performance. Hooray!

Conclusion

I think it would be a very rare application which noticed a significant performance boost here, but I do like the fact that this is one of those situations where a cleaner design also leads to better performance, without many obvious technical downsides.

That said, I doubt that I’ll actually be using this in real code any time soon – the fact that it’s just "different" to normal C# code is a big downside in itself. Hope you found it interesting though :)

New tool to play with: SemanticMerge

A little while ago I was contacted about a new merge tool from the company behind PlasticSCM. (I haven’t used Plastic myself, but I’d heard of it.) My initial reaction was that I wasn’t interested in anything which required me to learn yet another source control system, but SemanticMerge is independent of PlasticSCM.

My interest was piqued when I learned that SemanticMerge is actually built on Roslyn. I don’t generally care much about implementation details, but I’m looking out for uses of Roslyn outside Microsoft, partly to see if I can gain any inspiration for using it myself. Between the name and the implementation detail, it should be fairly obvious that this is a tool which understands changes in C# source code rather better than a plain text diff.

I’ve had SemanticMerge installed on one of my laptops for a month or so now. Unfortunately it’s getting pretty light use – my main coding outside work is on Noda Time, and as I perform the vast majority of the commits, the only time I really need to perform merges is when I’ve forgotten to push a commit from one machine before switching to another. I’ve used it for diffs though, and it seems to be doing the right thing there – showing which members have been added, moved, changed etc.

I don’t believe it can currently support multi-file changes – for example, spotting that a lot of changes are all due to a rename operation – but even if it doesn’t right now, I suspect that may be worked on in the future. And of course, the merge functionality is the main point.

SemanticMerge is now in beta, so pop over to the web site and give it a try.

C# in Depth 3rd edition available for early access, plus a discount code…

Readers who follow me on Twitter or Google+ know this already, but…

The third edition of C# in Depth is now available for early access from its page on the Manning website. I’ve been given a special discount code which expires at midnight EST on February 17th, so be quick if you want to use it – it gives 50% off either version. The code is “csharpsk”.

It’s likely that we’ll have a separate (permanent) discount for readers who already own the second edition, but the details of that haven’t been decided yet.

Just to be clear, the third edition is largely the second edition plus the changes to cover C# 5 – I haven’t done as much rewriting as I did for the second edition, mostly because I was already pretty happy with the second edition :) Obviously the largest (by far) feature in C# 5 is async/await, which is covered in detail in the new chapter 15.

Fun with Object and Collection Initializers

Gosh it feels like a long time since I’ve blogged – particularly since I’ve blogged anything really C#-language-related.

At some point I want to blog about my two CodeMash 2013 sessions (making the C# compiler/team cry, and learning lessons about API design from the Spice Girls) but those will take significant time – so here’s a quick post about object and collection initializers instead. Two interesting little oddities…

Is it an object initializer? Is it a collection initializer? No, it’s a syntax error!

The first part came out of a real life situation – FakeDateTimeZoneSource, if you want to look at the complete context.

Basically, I have a class designed to help test time zone-sensitive code. As ever, I like to create immutable objects, so I have a builder class. That builder class has various properties which we’d like to be able to set, and we’d also like to be able to provide it with the time zones it supports, as simply as possible. For the zones-only use case (where the other properties can just be defaulted) I want to support code like this:

var source = new FakeDateTimeZoneSource.Builder
{
    CreateZone("x"),
    CreateZone("y"),
    CreateZone("a"),
    CreateZone("b")
}.Build();

(CreateZone is just a method to create an arbitrary time zone with the given name.)

To achieve this, I made the Builder implement IEnumerable<DateTimeZone>, and created an Add method. (In this case the IEnumerable<> implementation actually works; in another case I’ve used explicit interface implementation and made the GetEnumerator() method throw NotSupportedException, as it’s really not meant to be called in either case.)
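In sketch form – simplified from the real FakeDateTimeZoneSource.Builder, and omitting Build() and the other properties – the relevant parts look something like this:

using System.Collections;
using System.Collections.Generic;
using NodaTime;

public sealed class Builder : IEnumerable<DateTimeZone>
{
    private readonly List<DateTimeZone> zones = new List<DateTimeZone>();

    public string VersionId { get; set; }

    // Called by collection initializers.
    public void Add(DateTimeZone zone)
    {
        zones.Add(zone);
    }

    public IEnumerator<DateTimeZone> GetEnumerator()
    {
        return zones.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}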

So far, so good. The collection initializer worked perfectly as normal. But what about when we want to set some other properties? Without any time zones, that’s fine:

var source = new FakeDateTimeZoneSource.Builder
{
    VersionId = "foo"
}.Build();

But how could we set VersionId and add some zones? This doesn’t work:

var invalid = new FakeDateTimeZoneSource.Builder
{
    VersionId = "foo",
    CreateZone("x"),
    CreateZone("y")
}.Build();

That’s neither a valid object initializer (the second part doesn’t specify a field or property) nor a valid collection initializer (the first part does set a property).

In the end, I had to expose an IList<DateTimeZone> property:

var valid = new FakeDateTimeZoneSource.Builder
{
    VersionId = "foo",
    Zones = { CreateZone("x"), CreateZone("y") }
}.Build();

An alternative would have been to expose a property of type Builder which just returned itself – the same code would have been valid, but it would have been distinctly odd, and allowed some really spurious code.

I’m happy with the result in terms of the flexibility for clients – but the class design feels a bit messy, and I wouldn’t have wanted to expose this for the "production" assembly of Noda Time.

Describing all of this to a colleague gave rise to the following rather sillier observation…

Is it an object initializer? Is it a collection initializer? (Parenthetically speaking…)

In a lot of C# code, an assignment expression is just a normal expression. That means there’s potentially room for ambiguity, in exactly the same kind of situation as above – when sometimes we want a collection initializer, and sometimes we want an object initializer. Consider this sample class:

using System;
using System.Collections;

class Weird : IEnumerable
{
    public string Foo { get; set; }
    
    private int count;
    public int Count { get { return count; } }
        
    public void Add(string x)
    {
        count++;
    }
            
    IEnumerator IEnumerable.GetEnumerator()
    {
        throw new NotSupportedException();
    }    
}

As you can see, it doesn’t actually remember anything passed to the Add method, but it does remember how many times we’ve called it.

Now let’s try using Weird in two ways which only differ in terms of parentheses. First up, no parentheses:

string Foo = "x";
Weird weird = new Weird { Foo = "y" };
    
Console.WriteLine(Foo);         // x
Console.WriteLine(weird.Foo);   // y
Console.WriteLine(weird.Count); // 0

Okay, so it’s odd having a local variable called Foo, but we’re basically fine. This is an object initializer, and it’s setting the Foo property within the new Weird instance. Now let’s add a pair of parentheses:

string Foo = "x";
Weird weird = new Weird { (Foo = "y") };
    
Console.WriteLine(Foo);         // y
Console.WriteLine(weird.Foo);   // Nothing (null)
Console.WriteLine(weird.Count); // 1

Just adding those parentheses turns the object initializer into a collection initializer, whose sole item is the result of the assignment operator – which is the value which has now been assigned to Foo.

Needless to say, I don’t recommend using this approach in real code…