An object lesson in blogging and accuracy; was: Efficient "vote counting" with LINQ to Objects – and the value of nothing

Well, this is embarrassing.

Yesterday evening, I excitedly wrote a blog post about an interesting little idea for making a particular type of LINQ query (basically vote counting) efficient. It was an idea that had occurred to me a few months back, but I hadn’t got round to blogging about it.

The basic idea was to take a completely empty struct, and use that as the element type in the results of a grouping query – as the struct was empty, it would take no space, therefore "huge" arrays could be created for no cost beyond the fixed array overhead, etc. I carefully checked that the type used for grouping did in fact implement ICollection<T> so that the Count method would be efficient; I wrote sample code which made sure my queries were valid… but I failed to check that the empty struct really took up no memory.

Fortunately, I have smart readers, a number of whom pointed out my mistake in very kind terms.

Ben Voigt gave the reason for the size being 1 in a comment:

The object identity rules require a unique address for each instance… identity can be shared with super- or sub- class objects (Empty Base Optimization) but the total size of the instance has to be at least 1.

This makes perfect sense – it’s just a shame I didn’t realise it before.

Live and learn, I guess – but apologies for the poorly researched post. I’ll attempt to be more careful next time.

11 thoughts on “An object lesson in blogging and accuracy; was: Efficient "vote counting" with LINQ to Objects – and the value of nothing”

  1. I’d like to see a post about how the idea of Push LINQ relates to the Reactive Framework (IObservable) as soon as more information gets out about that. It seems like they were born out of the same basic idea.

  2. @Jesper: I’m hoping to write about the Reactive Framework in the 2nd edition of C# in Depth, so your wish may well come true.

    From what I’ve read so far, they are very similar in general thrust, although they may well have different niches.

  3. I see you’ve got a comment at the top now, so this may be redundant, but I recall trying to use such a technique to implement a Set using a Dictionary. I can’t remember how I checked it at the time, but I seem to recall that Empty still contributed some non-zero number of bytes. I think you might be able to pull off the trick in C++, but I don’t know how to do so in .NET.

  4. Yep, I have to argee with Weeble.

    Try to execute this:
    struct Empty { }

    class Program
    {
    static void Main()
    {
    var x = new Empty[200000000];
    }
    }

    When new operator is executed – commit size raises by about 200M.

    Btw, if ‘private byte x;’ is added to struct – memory consuming remains the same. Can you explain such a weird behavior?

  5. @Ivan: I suspect it’s a sort of “minimum value size = 1″ in the CLR. There’s a similar effect for reference types: the minimum size is 12 bytes (on x86) despite only 8 bytes being object overhead. A bare `System.Object` takes 12 bytes, as does an instance of a class with an `int` field.

  6. I love Weeble’s appeal to C++, but it won’t work there either. The object identity rules require a unique address for each instance… identity can be shared with super- or sub- class objects (Empty Base Optimization) but the total size of the instance has to be at least 1.

  7. I don’t understand how Ben Voigt’s answer actually explains this; these rules makes sense for reference types – that are compared by address, but not for value types. If value types are only defined by their contents, a value type with no content shouldn’t use any space; why would this one need this byte when the address is irrelevant?

  8. @configurator: I think the point is that two different variables have to have two different storage locations. Imagine if you had *two* empty structs, and two local variables… if they ended up having the same storage location, it would be a little odd. It may be avoidable with cunningness, but it’s probably enough of an edge case not to be a problem.

  9. There’s also another thing I just thought of. In C++ as far as I know you have these nice pointer arithmetics that when you have an pointer of a size-x data type and you do p++ it will increase the pointer by the data type’s size. It wouldn’t work with size-0, though, would it?
    That still doesn’t explain why an empty class is 12 bytes and not 8 though – the object exists anyway, and the size of a System.Object is 8.

Comments are closed.