LA.NET [EN]

C#Archive

Mar 14

Getting started with generics

Posted in Basics, C#, CLR       Comments Off on Getting started with generics

With generics, the CLR offers us another way to ensure code reuse. If you’re a C++ developer, you might be tempted to see generics as some sort of C++ template. Even though there are certain similarities, the truth is that there are several important differences. For instance, in C++ the source code must be available to the developer instantiating the template; that is not the case with generics in .NET (there are other differences, many captured here). Before going on, I believe it’s a good time to show some code. When generics were introduced, they solved a real problem: how to encapsulate an algorithm and make it generic and type safe at the same time. The best way to understand it is to look at a quick example:

class MyList<T> {
    public void Add(T item) {
    }
    public void Remove(T item) {
    }
    public void Sort(IComparer<T> comparer) { }
}

In the previous snippet, MyList is a class which works with any data type (notice the <T> right after the class name). T is called a type parameter and you can see it as a name which can be used anywhere a data type is expected. Since the type parameter was introduced by the class itself (notice that <T> is declared right after the class’ name), it can be used for fields, method parameters and return values. You can even use it for local variables within the class’ methods. After creating a generic type, you can redistribute it and let other developers reuse it with a concrete type. For instance, here’s how you can reuse the previous class for storing lists of integers:

var intList = new MyList<Int32>();
intList.Add(10);
intList.Remove(20);

If you tried to pass a string to intList’s Add method, you’d get a compile error. So, in .NET, generics are type safe: the compiler will always ensure that only objects compatible with the type parameter used can be passed where objects of that type are expected. If you’ve only started using .NET in recent years, then you probably haven’t noticed the performance improvement gained through their introduction. Before generics, generalized algorithms resorted to the Object type. Unfortunately, that meant that using those classes with value types would always result in boxing operations. This was really bad, and you’d also need lots of casts to access the values or objects saved by those classes. Thank god we’ve got generics!
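Here’s a minimal sketch of that difference (assuming the pre-generics ArrayList from System.Collections versus the generic List<T>):

//before generics: ArrayList stores Object references
var oldStyle = new ArrayList();
oldStyle.Add(10);              //boxing: the Int32 is boxed into an Object
var a = (Int32)oldStyle[0];    //cast + unboxing needed to get the value back

//with generics: no boxing, no casts
var newStyle = new List<Int32>();
newStyle.Add(10);
var b = newStyle[0];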

Currently, the framework includes several utility classes built with generics. For instance, the System.Collections.Generic, System.Collections.ObjectModel and System.Collections.Concurrent namespaces introduce several generic collection classes which should always be used whenever you need to work with collections and lists of elements. And yes, nothing prevents you from creating your own generic types. Currently, you can create generic reference and value types, but you cannot create generic enumerated types. Notice that you can even create generic interfaces, delegates and methods (useful for those cases where only the method encapsulates a reusable algorithm). Creating a generic is an interesting process, but we’ll leave that analysis for the next post. Stay tuned.

Feb 28

Generic properties: you can’t do that!

Posted in Basics, C#, CLR       Comments Off on Generic properties: you can’t do that!

One of the things some people expect to be able to do is create generic properties. After all, properties will always generate getter and setter methods and nothing prevents you from creating generic methods. So why can’t I do something like this:

//doesn't compile: a property can't introduce its own type parameter
class Test {
    public T MyProp<T> { get; set; }
}

Well, it’s really a conceptual problem, you see… In theory, a property represents a characteristic of an object. Making it generic would mean that the type of that characteristic could change from use to use, and that really doesn’t play well with the theory, right? Anyway, what this means is that if you need to add some generic behavior to a class, then you should do that by adding generic methods (there’s a small sketch of that a bit further down). Btw, don’t confuse the previous code with reusing the generic parameter defined by a class:

//compiles as expected :)
class Test<T>{
    public T MyProp{get;set;}   
}

In this last snippet, we’ve created a new generic type (we’ll come back to generics in the next posts). Notice that the type of the property can’t change after we create a new instance from a concrete type:

var test = new Test<Int32>();
test.MyProp = 10; //always int
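And here’s the small sketch promised above: if you need “generic behavior” in a class, expose it through a generic method instead of a generic property (the ConvertTo name and its use of Convert.ChangeType are just illustrative):

class Test {
    //public T MyProp<T> { get; set; }  //not allowed
    public T ConvertTo<T>(Object value) {
        //a generic method can introduce its own type parameter
        return (T)Convert.ChangeType(value, typeof(T));
    }
}

var test = new Test();
var number = test.ConvertTo<Int32>("10"); //number is an Int32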

I guess it’s time to say something about properties… To be honest, I really don’t like them. They look like fields, but they are methods. This means that some of the assumptions you end up making when accessing fields aren’t true for properties. For instance, a field is always read/write (that might not be the case with a property). Besides that, a property cannot be passed as a ref or out parameter, and accessing one might also cause side effects (something which never happens with fields).

Over the years, I’ve seen them being abused by developers… hell, I’ve even abused them myself! Currently, I tend to stay away from them. In my opinion, people relying on them will end up with what is known as an anemic model (a model in which there’s almost no behavior). Currently, I only use properties in messages and for objects which are supposed to feed UI forms (since the data binding process only works with them).

And that’s it for now. Stay tuned for more.

Feb 20

More on properties: parameterful properties

Posted in Basics, C#       Comments Off on More on properties: parameterful properties

It’s been a long time since I’ve posted on this blog… and there are a couple of good reasons for that. For starters, my PC is gone… the disk died and I won’t be replacing it soon. That means I’m now using a 2 year old netbook for doing work at home and sometimes it gets pretty frustrating (especially when I need to use VS). Besides that, I’ve also been pretty busy working on my next books. Yes, that’s right: books. The HTML book I’ve mentioned before is almost ready and I’m also actively working on a JavaScript book which covers the ECMAScript 5 specification (again, in Portuguese, for FCA). As you can see, I’ve been really busy in my free time. Anyway, I believe that things are more stable now and I think I’ll be able to start writing more frequently in the next months. The idea is to keep writing about some C#/CLR basic concepts, a little bit of JavaScript and (probably) pick up some new framework and start digging into it…

Before going on, I’d like to express my condolences to everyone who lost someone on the 20th February floods which hit us a year ago. Yes, it’s been a year, but I believe that this date will never be forgotten by any of us that were here at the time…

In the last technical post, I was talking about properties. Today, we’ll keep looking at them and talk about parameterful properties. Parameterful properties (a name which I’ve borrowed from Jeff Richter’s excellent CLR via C#) are properties whose get methods accept one or more parameters and whose set methods accept two or more parameters. In C#, these properties are known as indexers and are exposed as array-like properties. The next snippet shows how one can define a parameterful property and use it from C#:

//for demo purposes only…
public class IntArray {
    private int[] _arr;
    public IntArray(Int32 numberOfItems) {
        if( numberOfItems <= 0 ) {
            throw new ArgumentOutOfRangeException();
        }
        _arr = new Int32[numberOfItems];
    }

    public Int32 this[Int32 position] {
        get {
            if( position < 0 || position >= _arr.Length) {
                throw new ArgumentOutOfRangeException();
            }
            return _arr[position];
        }
        set {
            if (position < 0 || position >= _arr.Length) {
                throw new ArgumentOutOfRangeException();
            }
            _arr[position] = value;
        }
    }
}

var ints = new IntArray(10);
ints[0] = 10;
Console.WriteLine(ints[0]);

In the previous example (built only for demo purposes), the parameterful property expects one parameter in the getter and two in the setter: the first parameter passed to the getter and setter identifies the position, and the second parameter, passed only to the setter through the “hidden” value parameter, indicates the value that is supposed to be put into that position. As you can see, this is used as the special name for a parameterful property. In practice, this means that you cannot define static parameterful properties in C# (even though the CLR does support that). The previous snippet also shows some interesting recommendations: for instance, notice that we’re throwing an exception when someone passes an index outside the expected interval.

Another interesting feature of parameterful properties is overloading: unlike parameterless properties, you can overload them by using different parameters. From the CLR’s point of view, parameterful properties also give place to getter and setter methods. In the previous example, that means that the code we’ve written will be transformed into something which looks like this:

public class IntArray {
    public Int32 get_Item(Int32 position){/*previous code here*/}
    public void set_Item(Int32 position, Int32 value){ /*previous code here*/}
}

As you can see, the compiler will automatically pick the name Item for a parameterful property and will prefix the getter and setter methods with get_ and set_.

In C#, we never use the name Item when consuming a parameterful property (the [] operator is used instead). However, if you’re writing a library that will be consumed from other CLR languages, then you can customize the name of this property by using the IndexerNameAttribute:

[IndexerName("Integer")]
public Int32 this[Int32 position] {/*previous code here*/}

By doing this, the compiler will automatically generate a pair of methods named get_Integer and set_Integer, allowing other languages (ex.: VB.NET) to access this property through the Integer name. Notice that the String type uses this attribute to let you access a char in a string from languages which don’t use the [] operator to interact with parameterful properties. Since you don’t use a name to refer to an indexer in C#, you’ll only be able to introduce one property of this “type” in your C# code (though, as I’ve mentioned previously, you can overload it). This behavior might introduce some problems when you’re trying to consume, from C#, a type written in another language which defines more than one parameterful property. For that type to be consumed from C#, it must indicate the name of the default parameterful property through the DefaultMemberAttribute (notice that the C# compiler does this automatically for your C# types and it does take into account the use of the IndexerNameAttribute). And yes, that will be the only parameterful property that C# code will be able to access…
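If you’re curious, here’s a quick (illustrative) way to confirm what the compiler generated: looking up the renamed accessor methods through reflection, assuming the IntArray type above with the IndexerName attribute applied:

var getter = typeof(IntArray).GetMethod("get_Integer");
var setter = typeof(IntArray).GetMethod("set_Integer");

var ints = new IntArray(10);
setter.Invoke(ints, new Object[] { 0, 42 });                 //same as ints[0] = 42
Console.WriteLine(getter.Invoke(ints, new Object[] { 0 }));  //prints 42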

Btw, and before you ask, languages which don’t support parameterful properties can access them through direct getter and setter method calls. And that’s it for now. Stay tuned for more.

Jan 03

Properties are one of the member types you can define in a class. It might seem strange, but a property only allows us to call a method in a simplified way. In other words, you can see them as sugar for invoking methods (which typically interact with a private field). Nonetheless, they’re important and most programming languages offer first class support for them (including C# and VB.NET). The CLR allows us to define two types of properties: parameterless and parameterful properties. In this post, we’ll concentrate on parameterless properties (and we’ll leave parameterful properties for a future post). So, let’s get started…

As you know, most objects have state and that state is saved through instance fields. For instance, if we think about a Student class which has info about the name and address of someone, then we’d probably end up building a class which looks like this:

public class Student {
    public String Name;
    public String Address;
}

And then, you could consume it like this:

var std = new Student();
std.Name = "Luis";
std.Address = "Funchal";
Console.WriteLine( std.Name + "-" +std.Address );

There’s nothing wrong with the previous code. However, there are a couple of observations that might prevent you from using the code as-is:

  • Many argue that exposing fields is not a good idea because it violates one of the tenets of OO programming (data encapsulation).
  • There are times where you might need to validate the values that are being set to a field. Publicly exposing a field means that anyone can set that field to any value and there’s nothing you can do about it.

The solution to this problem is simple: make your fields private and add a couple of methods which allow you to get or set the values of those fields:

public class Student {
    private String _name;
    private String _address;
    public void SetName(String name) {
        //you could perform validation here
        _name = name;
    }
    public String GetName() {
        return _name;
    }
    public void SetAddress(String address) {
        //you could perform validation here
        _address = address;
    }
    public String GetAddress() {
        return _address;
    }
}

And then, you’d need to change the consuming code so that it uses those methods to interact indirectly with the fields:

var std = new Student();
std.SetName( "Luis" );
std.SetAddress( "Funchal" );
Console.WriteLine( std.GetName() + "-" +std.GetAddress() );

This new approach solves the previous problems, but it also makes you write more code and you’ll need to use the new “access” methods for interacting with the fields. Since Microsoft saw these disadvantages as problems, they ended up introducing the concept of the (parameterless) property. Here’s a new version of our class that relies on properties:

public class Student {
    private String _name;
    private String _address;
    public String Name {
        get { return _name; }
        set { _name = value; }
    }
    public String Address {
        get { return _address; }
        set { _address = value; }
    }
}

And here are the changes made to the consuming code:

var std = new Student();
std.Name = "Luis";
std.Address = "Funchal";
Console.WriteLine( std.Name + "-" +std.Address );

(Notice that all these snippets end up printing the same results.)

Defining a property is simple: you can specify a get and a set method, which encapsulate the code for reading and setting the value of that property. As you might expect, get and set are both optional (though you do need to define at least a get or a set when defining a new non-abstract property): it all depends on whether you intend to allow read (get) or write (set) access to a specific property. You’ve surely noticed the use of the value parameter within the set method. This parameter holds the value assigned to the property and, in the previous example, was simply copied into the private backing field of the property.

Even though it’s not mandatory, most properties end up manipulating one or more private fields of the class where they’re defined. When that happens, the property is said to have a backing field. So, adding a property in C# results in adding a pair of accessor methods (get and/or set, depending on whether you want to allow read and write access in your property definition) and in adding a property definition to that class’ metadata. The getter and setter introduced in the property’s text definition are transformed into methods (prefixed with get_ or set_) which are invoked whenever you read or write the property’s value. Notice that the CLR relies only on these methods for accessing the property and performing the requested operation. Nonetheless, other tools can use the property entry in the metadata to get more information about the members of a specific class.
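To make that transformation concrete, here’s roughly what the Name property from the previous snippet turns into (a sketch of the idea, not actual decompiled output):

public class Student {
    private String _name;
    //the get and set blocks become ordinary methods...
    public String get_Name() { return _name; }
    public void set_Name(String value) { _name = value; }
    //...plus a Name entry in the class' metadata which points at these two methods
}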

When we’re creating simple read/write properties like the ones presented in the previous example, we can reduce the amount of typing by relying on automatic properties. Here’s the revised version of our class that uses this kind of property for introducing the Name and Address properties:

public class Student {
    public String Name { get; set;}
    public String Address { get; set; }
}

Whenever you create a new property and don’t define the body of the get and set methods, the compiler will automatically introduce a backing field and will implement those accessor methods for you. These methods will simply return the value of the backing field (get) or update its value to a new one (set).

Notice that creating an automatic property is not the same thing as adding a field. With properties, the calling code will always be redirected to the get or set method (instead of accessing the field directly). The advantage of using automatic properties is that you can change the implementation of the property in the future (for instance, you might need to add validation to the values passed to the set method) and you won’t have to recompile the consuming code (if it’s in a different assembly).

There are some disadvantages regarding the use of automatically implemented properties. For starters, you cannot initialize an automatic property at its declaration (this means you need to put that initialization code into a constructor). Their use is discouraged if you’re performing any serialization/deserialization of that class, because you have no control over the name of the backing field and that is what gets serialized. Finally, you should keep in mind that when creating this kind of property in C#, you need to define both a get and a set method (after all, what would be the use of an automatic property with only a setter if you have no way to retrieve the value?).
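For instance, here’s a minimal sketch of working around the first limitation by initializing the automatic properties from a constructor (the default values are made up):

public class Student {
    public String Name { get; set; }
    public String Address { get; set; }

    public Student() {
        //automatic properties can't be initialized at their declaration,
        //so any default values go here
        Name = "unknown";
        Address = "unknown";
    }
}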

And that’s it for now. Stay tuned for more on properties…

Nov 23

In the previous two posts, I’ve presented the basics (and some gotchas) associated with the way you declare events. In this post, I’ll present an alternative way for exposing events which is useful when you’re creating a class which has lots of events. In order to understand how this strategy works, we need to make a small detour and see what the compiler does when it finds an event field. Here’s the code we’ve been using for exposing the StudentNameChanged event:

public class Student {
      public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
}

Whenever the compiler finds this definition, it ends up generating the following code (ripped with .NET Reflector from the compiled assembly):

public class Student {
    private EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged{
        add {
            EventHandler<StudentNameChangedEventArgs> handler;
            EventHandler<StudentNameChangedEventArgs> handler2;
            EventHandler<StudentNameChangedEventArgs> handler3;
            bool flag;
            handler = this.StudentNameChanged;
        Label_0007:
            handler2 = handler;
            handler3 = (EventHandler<StudentNameChangedEventArgs>) Delegate.Combine(handler2, value);
            handler = Interlocked.CompareExchange<EventHandler<StudentNameChangedEventArgs>>(ref this.StudentNameChanged, handler3, handler2);
            if (handler != handler2) { //retry until the exchange succeeds
                goto Label_0007;
            }
            return;
        }
        remove {
            EventHandler<StudentNameChangedEventArgs> handler;
            EventHandler<StudentNameChangedEventArgs> handler2;
            EventHandler<StudentNameChangedEventArgs> handler3;
            bool flag;
            handler = this.StudentNameChanged;
        Label_0007:
            handler2 = handler;
            handler3 = (EventHandler<StudentNameChangedEventArgs>) Delegate.Remove(handler2, value);
            handler = Interlocked.CompareExchange<EventHandler<StudentNameChangedEventArgs>>(ref this.StudentNameChanged, handler3, handler2);
            if (handler != handler2) { //retry until the exchange succeeds
                goto Label_0007;
            }
            return;
        }
    }
}

At first sight, the code might look more complex than it really is. As you can see, an event is transformed into a delegate field and the event itself ends up generating two methods (add and remove; btw, in the compiled assembly you’ll actually get two methods named add_StudentNameChanged and remove_StudentNameChanged). The add method is used for subscribing to an event, while the remove method is called for cancelling a previous subscription. In order to ensure correct behavior, the code generated for the add and remove methods relies on the CompareExchange method to solve the problems that might arise when our class is used in a multithreaded application (note: the goto label shown can be seen as a do-while loop which keeps retrying until it manages to add or remove the passed-in delegate in a multithreaded environment). Besides that, you’ll surely notice the use of the Combine and Remove static methods used for adding and removing event handlers (I’ll also have a couple of posts about delegates, so I won’t get into this right now).

Now that we know what happens when we define an event, we can see how to improve our previous event definition by replacing it with an explicit implementation where the add and remove methods are written by hand. Before showing this, it’s important to understand why we need a more efficient approach for classes that expose lots of events. The best way to understand why we need this strategy is to think about classes that wrap GUI controls. If you look at the Control class, you’ll notice that it exposes lots and lots of events. If that class exposed its events by using the first approach, we’d end up with lots and lots of fields, and that means a lot of memory (for events which might not even be handled by the dev that is using a control).

To solve this memory usage problem, we need to expose events in a more efficient way. To achieve this, we need to add a custom dictionary to our class and then implement our events explicitly through the add and remove methods. Here’s some code that shows how to do this:

public class Student {
    private String _name;
    public String Name {
        get { return _name; }
        set {
            if (value == _name) return;
            var oldName = _name;
            _name = value;
            OnNameChanged( new StudentNameChangedEventArgs(oldName, _name) );
        }
    }
    private static Object _key = new Object(  );
    private Object _locker = new Object(  );
    private EventHandlerList _events = new EventHandlerList(  );
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged {
        add {
            lock(_locker) {
                _events.AddHandler( _key, value );
            }
        }
        remove {
            lock(_locker) {
                _events.RemoveHandler( _key, value );
            }
        }
    }
    protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
        lock(_locker) {
            var handler = (EventHandler<StudentNameChangedEventArgs>)_events[_key];
            if(handler != null ) {
                handler( this, e );
            }
        }
    }
}

As you can see, we’ve added a couple of fields to our class. Besides the EventHandlerList instance, I’ve also added an object used as a key for identifying the StudentNameChanged event in the custom event dictionary (the _events field) and another object used for locking purposes (to ensure proper usage of our class in a multithreaded environment). Btw, I’ve ended up using the EventHandlerList class since it’s the one used by all the major UI classes introduced by the .NET framework. If you want, you can build your own custom dictionary which takes care of all the goo related to multithreading and invoking the delegates that handle the event (I’ll leave that as an exercise for you).
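From the consumer’s point of view nothing changes; here’s a short usage sketch of the class above:

var std = new Student();
std.StudentNameChanged +=
    (sender, e) => Console.WriteLine("{0} — {1}", e.OldName, e.NewName);
std.Name = "Luis"; //fires the event through the handler stored in the EventHandlerList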

And I guess this sums it up quite nicely. There might still be a couple of things to say about events, but I think these last three posts cover most of the important details, so I’ll end this series here. However, there’s still a lot to talk about regarding .NET and the CLR, so stay tuned for more.

Nov 23

In one of the previous posts, we’ve looked at the basics associated with .NET events. As promised, we’ll start improving our initial code and today we’ll talk about two topics:

  1. lambdas aren’t always your friends.
  2. we live in a multithreaded world.

Let’s start with number 1… In the previous post, I used something like this to handle the event:

var std = new Student( );
std.StudentNameChanged +=
    ( sender, e ) => Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );

Before going on, you should know that I do use this approach in 90% of the scenarios. The problem with it is that you cannot cancel a previous subscription. Here’s some code which I’ve seen people use in the past:

var std = new Student( );
std.StudentNameChanged +=
    ( sender, e ) => Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );
std.Name = "Luis";
std.StudentNameChanged -=
    ( sender, e ) => Console.WriteLine( "{0} — {1}",e.OldName,e.NewName );
std.Name = "Luis2";

The idea of the previous code is simple (though utterly wrong): we subscribe to an event through a lambda and then we try to cancel the subscription by passing the “same” lambda expression. Well, the problem is that the second lambda expression compiles into a different delegate from the first. In this case, the easiest approach is to create a method compatible with the event’s type and then use it to subscribe/cancel the event handler:

static  void PrintName(Object sender, StudentNameChangedEventArgs e) {
    Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );
}

And then, we can simply subscribe/cancel the event like we did in the old days:

var std = new Student( );
std.StudentNameChanged += PrintName;
std.Name = "Luis";
std.StudentNameChanged -= PrintName;

With 1 tackled, let’s proceed to 2. It’s safe to assume that multithreading is here to stay, and that means writing safer code. Let’s recover the code we used to fire the event:

protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
    if( StudentNameChanged != null ) {
        StudentNameChanged( this, e );
    }
}

Everyone who has had to write multithreaded programs will automatically cringe while reading the previous code. The problem is that between the test and the event execution, the thread can be stopped while another thread removes the existing delegate chain from the event field. And that’s why you’ll typically see the previous code re-written like this:

protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
    var aux = StudentNameChanged;
    if( aux != null ) {
        aux( this, e );
    }
}

The idea is simple: since StudentNameChanged is copied into aux, aux will always reference the delegate chain that existed at the time of the copy. From that point on, we’ll be using aux for testing and firing the event, and we can be sure that aux’s value won’t change between the test and the execution. Since delegates are immutable, we’re safe, right?

Unfortunately, we’re not… I’ve used code similar to this for a long time, until I learnt that the compiler may optimize the previous code (though it currently doesn’t) and simply drop the aux reference. When that happens, we end up with the initial code, which exhibits the race condition I’ve mentioned before. Bottom line: we need safer code. In these scenarios, the best option I’ve seen is presented in the excellent CLR via C#, by Jeffrey Richter, and consists of using the CompareExchange method:

protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
    var aux = Interlocked.CompareExchange( ref StudentNameChanged, null, null );
    if( aux != null ) {
        aux( this, e );
    }
}

In the previous snippet, CompareExchange will only change the StudentNameChanged event to null *when* it’s already null (in other words, it will never change its value if that value is not null). The advantage of using this method is that it always returns a reference to the StudentNameChanged delegate chain in a thread safe way. With this small performance hit (yep, there’s a small cost associated with using this method), we’re really safe. As I’ve said before, the compiler doesn’t currently perform the optimization which might break our second version of the code, so you can keep using that approach. Anyway, if you’re writing long-lived code, then you should probably play it safe and go with the more robust version.

In the next post, we’ll still keep looking at events and see how we can improve event declaration for classes that expose lots of events.

Nov 16

Getting started with events

Posted in Basics, C#       Comments Off on Getting started with events

I guess we all know about events, right? Even so, I’ve decided to write a couple of posts about it and today I’ll be talking about some basic stuff associated with event definition and usage. So, what is an event? An event allows a type to notify other objects about something special which happened. If you’re a .NET developer, you know that events are everywhere. For instance, if you look at the Windows Forms’ controls, you’ll see events everywhere (ex.: who hasn’t handled the Click event generated by a Button?).

When a type exposes an event, it means that:

  • it allows another type to register a method which receives future notifications.
  • it allows another type to cancel a previous subscription.
  • it is responsible for notifying all the previous registered methods.

The CLR event model is based on delegates (which allow you to call methods in a type safe way – I guess I’ll return to delegates in future posts). Let’s start with a simple example which assumes we have a simple Student type which generates an event in response to a change of its Name property (I’ll call it the StudentNameChanged event). Do notice that in the real world I’d simply implement the INotifyPropertyChanged interface to signal this change. Since I want to present all the steps associated with creating an event, I’ll go with my own custom event…

When we expose an event, we must start by deciding if we need to pass custom data to the methods that handle the event. In this case, I’ve decided to pass the old and new name values. In practice, this means that I’ll need to create a new type, derived from EventArgs (this is a convention), which exposes two properties: OldName and NewName.

public class StudentNameChangedEventArgs:EventArgs {
    public String OldName { get; private set; }
    public String NewName { get; private set; }

    public StudentNameChangedEventArgs( string oldName, string newName ) {
        OldName = oldName;
        NewName = newName;
    }
}

As I’ve said, using EventArgs as the base is only a convention, but one which you should follow. Nothing prevents you from passing a non-EventArgs type to a method that handles an event (though you’d be going against what’s expected, which is not a good thing). Now, we’re ready to define the event member. The easiest way to do this is to add a public field to our class:

public class Student {
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
}

As you can see, an event is always declared through the event keyword, followed by the expected delegate type. In this case, and since our event args class inherits from EventArgs, we can reuse the EventHandler<T> type. After adding the field, it’s also customary to add a protected virtual method which is responsible for firing the event. Here’s the class’ complete code:

public class Student {
    private String _name;
    public String Name {
        get { return _name; }
        set {
            if (value == _name) return;
            var oldName = _name;
            _name = value;
            OnNameChanged( new StudentNameChangedEventArgs(oldName, _name) );
        }
    }
    protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
        if( StudentNameChanged != null ) {
            StudentNameChanged( this, e );
        }
    }
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
}

The OnNameChanged method starts by checking the StudentNameChanged event field. When it’s not null, it will call all interested parties by passing a reference to the Student instance responsible for the event and the custom EventArgs parameter it received. The previous snippet also shows how the event gets generated. As you can see, it will always be generated from the setter of the Name property.

Now, let’s see how we can consume this event from our C# code:

var std = new Student( );
std.StudentNameChanged +=
    ( sender, e ) => Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );
std.Name = "Luis";

Experienced developers will probably spot several things which can be improved in the previous snippets. For instance, using lambda expressions is great, but only if you don’t need to cancel the subscription. Anyway, I’ll leave these improvements for the next post of the series. Stay tuned for more.

Oct 28

Zebra code available online

Posted in C#       Comments Off on Zebra code available online

A few years ago, I wrote a couple of helper classes for printing labels on those funny Zebra printers. At the time, I targeted the TLP2844 model, but the code should work with any printer which understands EPL2. There was a problem with my ISP which resulted in the rar file containing the code being deleted. I’ve improved the code over time, and I do intend to publish a revised version (though you’ll have to wait at least a few weeks for that). Anyway, for now I’m just putting the old version online again. You can get it from here. Enjoy :)

Oct 27

The dynamic type

Posted in Basics, C#       Comments Off on The dynamic type

C# 4.0 introduced a new type whose main job is to simplify our work when writing code that needs to use reflection. I’m talking about the new dynamic type. As we all know, C# is a type-safe programming language. In practice, this means that the compiler must resolve all expressions into types and their respective operations. Whenever the compiler detects an invalid operation (ex.: calling a method not exposed by a class), it must stop the compilation process and generate an error. The good news is that this type safety ensures that most (if not all) programmer errors are detected at compile time.

Compare this with what happens in dynamic languages, like JavaScript. Before going on, a disclaimer: I love JavaScript, so any errors you might end up having while writing code in it can only be attributed to the developer writing it :) Anyway, how many times have we written JS code only to find a misspelling error at runtime?

Now, there are also advantages associated with dynamic languages. For instance, compare the code you need to write for using COM components from C# with the code you have to write to consume them from, say, JavaScript… yep, C# starts to suck when you need to do that. With the new dynamic type, things get better :) Here’s an example of what I mean:

dynamic word = new Application {Visible = true};
dynamic doc = word.Documents.Add(  );
word.Selection.TypeText( "Hello, dynamic!" );

Now, if you’re an experienced C# dev, you can’t help noticing the simplicity of the new code. Just for fun, let’s see the C# 3.0 equivalent code:

Application word = new Application{Visible = true};
//now, the fun begins
Object missingValue = Missing.Value;
Document doc = word.Documents.Add(
    ref missingValue, ref missingValue, ref missingValue, ref missingValue);
word.Selection.TypeText( "Hello, dynamic!" );

And I was lucky because I picked an easy method. If I needed to replace text, things would quickly become even more boring… It’s safe to say that we all prefer version 1 of the previous example, right? And the good news is that you can use the same strategy when writing reflection code (for an example of it, check this old post).

So, what happens when you mark a variable or expression with the dynamic keyword? Whenever the compiler sees a dynamic expression, it emits special code which describes that operation; at runtime, that description is used to determine the real operation that needs to be performed. The component responsible for this runtime resolution is the runtime binder. In C#, the runtime binder is defined in the Microsoft.CSharp assembly and you must reference it whenever you use the dynamic keyword in your code.

At runtime, things get rather complicated because the binder ends up consuming more memory than would have been necessary if you were using, say, plain reflection (if you’re using dynamic types only in a small portion of your code, then you should probably consider not using them, since the advantages of dynamic might not pay off).

A dynamic operation is resolved at runtime according to the real type of the object. If that object implements the IDynamicMetaObjectProvider interface, its GetMetaObject method ends up being called. It returns a DynamicMetaObject derived type which is responsible for performing the bindings for members of that type (ie, mapping the members, methods and operators specified in the code you’ve written). Dynamic languages in .NET have their own DynamicMetaObject derived classes (which allows them to be easily consumed from C#). Something similar happens with COM components (the C# runtime binder uses a DynamicMetaObject derived object which knows how to communicate with COM components). When the object doesn’t implement the interface, C# ends up using reflection to execute the required operations.
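If you’re curious about what participating in dynamic dispatch looks like, here’s a hedged, minimal sketch using the DynamicObject helper class (which implements IDynamicMetaObjectProvider for you); the Bag type is a toy property bag, not production code:

public class Bag : DynamicObject {
    private readonly Dictionary<String, Object> _values = new Dictionary<String, Object>();

    public override bool TrySetMember(SetMemberBinder binder, Object value) {
        _values[binder.Name] = value; //any member name is accepted
        return true;
    }
    public override bool TryGetMember(GetMemberBinder binder, out Object result) {
        return _values.TryGetValue(binder.Name, out result);
    }
}

dynamic bag = new Bag();
bag.Title = "Hello, dynamic!"; //resolved at runtime through TrySetMember
Console.WriteLine(bag.Title);  //resolved through TryGetMember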

Now, there are a couple of interesting operations you can do with a dynamic type. For starters, any expression can be implicitly converted into a dynamic type:

Int32 a = 10;
dynamic b = a;

Yep, you’ll end up with boxing in the previous snippet. Even more interesting is the fact that you can implicitly convert from a dynamic into some other type because the CLR will validate that cast at runtime:

Int32 c = b;

Notice that you cannot do this with an Object instance that resulted from boxing an integer. If the dynamic value isn’t compatible with the desired type, you’ll end up with an InvalidCastException. Another interesting thing is that evaluating a dynamic expression gives you a new dynamic expression:

dynamic a = 10;
Int32 b = 2;
var t = a + b;
t.DoesntHaveThisMethodButCompiles( );

You’ll succeed if you try to compile the previous snippet! Of course, you’ll get an exception at runtime since ints don’t have a DoesntHaveThisMethodButCompiles method. Notice that var ends up being dynamic in the previous snippet (t is inferred as dynamic)! (And, btw, don’t confuse var with dynamic: var is just a shortcut that lets the compiler infer the type of a variable.)

Whenever you use a dynamic variable in a foreach or using block, the compiler will automatically generate the correct code for that scenario (in the foreach, it will convert the variable into an IEnumerable; in the using case, it will cast it to IDisposable). Pretty slick, right?
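A quick sketch of the foreach case (the using case is analogous):

dynamic numbers = new List<Int32> { 1, 2, 3 };
//the compiler emits a conversion of the dynamic expression to IEnumerable
foreach (Int32 n in numbers) {
    Console.WriteLine(n);
}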

And that’s it. Stay tuned for more.

Oct 25

In the previous post, we started looking at the Equals method and saw that its default implementation (inherited from Object) has some flaws. We’ve seen a better implementation for it and we’ve also talked about some strategies for overriding the method in new custom types. In this post, we’re going to talk about a somewhat related concept: hash codes.

You see, all objects inherit a method called GetHashCode from the base Object type. This method is virtual, returns an integer and is defined by the Object type because the designers of the framework thought that it would be a good idea to allow any object to be used as a key in a hashtable. The current rules governing hash code generation are quite interesting:

  • if two objects *are* equal, then they should return the *same* hash code.
  • if two objects *aren’t* equal, they don’t have to generate *different* hash codes. Many are surprised by this at first…
  • you should use at least one instance field for calculating the hash code of an object. You should rely on immutable fields for this, because these fields are initialized during construction and then remain constant during the object’s lifetime. This is important and the docs should have presented this as a *must* (not a *should*).
  • the returned result must be consistent and it should be the same as long as there is no modification to the object state that determines the return value of the Equals method.
  • your method should strive to have a good random distribution.

As you can see, the rules for hash code generation imply that you’ll have to override GetHashCode whenever you override the Equals method. The Object’s GetHashCode implementation inherited by all types simply returns a number which doesn’t change during the object’s lifetime and is guaranteed to uniquely identify the object in the current application domain. As you might expect, ValueType does follow the previous rules when overriding the GetHashCode method. Unfortunately, you’ll have the same performance problem mentioned before, because it has to use reflection to ensure that all the fields of a type are used in the algorithm.

Building your own hash code method isn’t as easy as it might look at first. If you look at the previous rules, you’ll notice that there are several constraints which make it hard to implement. One thing that isn’t mentioned in the previous list (and it should be!) is that hash codes shouldn’t change. In fact, they *can’t* change, because that might break your application. Unfortunately, this isn’t really mentioned in the docs nor followed by the framework code. A quick example is the best way of illustrating this point. Take a look at the following code:

struct Point {
    public Int32 X { get; set; }
    public Int32 Y { get; set; }
}

This code seems simple enough and harmless, right? Well, guess what? It’s not… one of the things you should keep in mind while creating new value types is that they should be immutable! For instance, take a look at the DateTime struct… you’ll quickly notice that it doesn’t have any writable properties and none of the existing methods change the value of its internal fields (at best, you’ll get a new instance returned). In other words, DateTime is an immutable type: after creating one instance, you can’t really change its state!

Now, if you look at our Point type, you’ll notice that it reuses the base Equals and GetHashCode implementations. Yes, I’ve said we should always override those methods, but they’ll work fine for the purpose of this demo (though probably a bit slower than if we introduced our own overrides). So, let’s start simple:

var hashtable = new Dictionary<Point, String>( );
var a = new Point {X = 10, Y = 20};
hashtable.Add(a, "Hi"  );
Console.WriteLine(a.GetHashCode( ));
Console.WriteLine(hashtable[a]);

Nothing too fancy here…we’re creating a new instance of a Point and using it as the key of a Dictionary instance. Till now, everything works out perfectly! Now, suppose we do this:

a.X = 20;
Console.WriteLine(a.GetHashCode( ));

I guess that by now you’re seeing it, right? If you’re not hearing alarm bells all over the place, then you should probably pause and read the docs on the Dictionary class. Specifically, the part where it says this:

Retrieving a value by using its key is very fast, close to O(1), because the Dictionary<TKey, TValue> class is implemented as a hash table.

Oops… if you run the previous code, you’ll notice that a.GetHashCode no longer returns the same value you got in the previous snippet. In fact, go ahead and try to get the previous entry from the hashtable variable:

Console.WriteLine(hashtable.Count);   //still prints 1
Console.WriteLine(hashtable[a]);      //throws KeyNotFoundException

On my machine, the count still shows 1 (the total number of entries in the hashtable variable), but retrieving the value through a now throws a KeyNotFoundException.

It seems like you just can’t get the existing entry from the dictionary through the Point instance variable that was used as the key. Not good, right? Well, let’s see how we can improve our code to solve this kind of problem. We’ve got several options, but my favorite is turning Point into an immutable type:

struct Point {
    private readonly Int32 _x;
    private readonly Int32 _y;

    public Int32 X { get { return _x; } }
    public Int32 Y { get { return _y; } }
    public Point(Int32 x, Int32 y) {
        _x = x;
        _y = y;
        _hashcode = null;
    }
    public override bool Equals(object obj) {
        if (obj == null) return false;
        if (obj.GetType() != GetType()) return false;
        var other = ( Point )obj;//unbox
        return other.X == X && other.Y == Y;
    }
    private Int32? _hashcode;
    public override int GetHashCode() {
        if(!_hashcode.HasValue) {
            _hashcode = X.GetHashCode( ) ^ Y.GetHashCode( );
        }
        return _hashcode.Value;
    }
}

I didn’t really follow all the recommendations I mentioned in the previous post (I’ll leave that to you 🙂 ), but we’ve now solved the previous problems. Since Point is immutable, you cannot change an instance after creating it, and now the hash code stays constant throughout the instance’s lifetime.

Notice that the hash code is only calculated once, and only if someone asks for it. If you’re creating a new value type, you can follow several of the principles presented in this sample. For instance, you should always strive to define which fields are immutable (if you don’t have one, then you can always add one!) and rely on them for calculating the hash code. Since this has become a rather large post, I won’t bore you with an example that shows how this can be done. Instead, I’ll simply redirect you to the S#arp project, which has some interesting base classes you can reuse to solve these problems.

And that’s it. Stay tuned for more.

Oct 24

Yep, it’s true: I’m still alive! After a long pause from blogging (due to a future project which I’ll talk about in a future post), I’m back… and with a very interesting topic. Comparing objects is something developers do a lot. By default, all objects inherit Object’s virtual Equals method, which returns true if both references refer to exactly the same object. According to the excellent CLR via C#, here’s how that code might look (I say might because equality is implemented as an extern method):

public virtual Boolean Equals(Object obj) {
    if (this == obj) return true;
    return false;
}

In other words, you’re running an identity check. This approach might not be enough for you. In fact, many people say that the base Equals method implementation should look like this:

public virtual Boolean Equals(Object obj) {
    //if obj is null, then return false
    if (obj == null) return false;
    //check for types
    if (this.GetType() != obj.GetType()) return false;
    //check for field values
    return true;
}

There’s a lot going on here. We start by checking against null (obviously, if obj is null, we can simply return false). We then proceed and compare the objects’ types. If they don’t match, then false it is. Finally, we need to check the field values of both objects. Since Object doesn’t have any fields, we can simply return true. Now, there are two conclusions you can take from the previous snippet:

  • the first is that you shouldn’t really use Equals to perform identity checks, because Equals is virtual and that means a derived class might change the way the method works. If you want to perform an identity check, you should use the static ReferenceEquals method (defined by the Object type).
  • the second is that having a poor Equals implementation means that the rules for overriding it are not as simple as they should be. So, if a type overrides the Equals method, it should only call the base class’ method if the base class isn’t Object.

To make things more complex, we should also take the ValueType class into consideration. Interestingly, it overrides the Equals method and uses an algorithm similar to the last one we showed. Since it has to ensure the equality checks for all fields, it needs to resort to reflection. In practice, this means that you should provide your own implementation of the Equals method when you create new value types.

When you’re creating a new type, you should always ensure that it follows four rules:

  • Equals must be reflexive: x.Equals(x) must always return true.
  • Equals must be symmetric: x.Equals(y) should return the same result as y.Equals(x).
  • Equals must be transitive: x.Equals(y) == true && y.Equals(z) == true => x.Equals(z) == true.
  • Equals must be consistent: calling the method several times with the same values should return the same results.

Now, these are the basic guarantees you need to give. There are a couple of extra things you can do too. For instance, you could make your type implement the IEquatable<T> interface to perform equality comparisons in a type safe manner. You *should* also overload the == and != operators. Their internal implementations should always delegate to the Equals override you’ve written.
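Here’s a minimal sketch of what that usually looks like (the Name type is made up for illustration); the operators simply delegate to the Equals implementation:

public struct Name : IEquatable<Name> {
    private readonly String _value;
    public Name(String value) { _value = value; }

    //type safe comparison required by IEquatable<T>
    public bool Equals(Name other) {
        return String.Equals(_value, other._value);
    }
    public override bool Equals(Object obj) {
        return obj is Name && Equals((Name)obj);
    }
    public override int GetHashCode() {
        return _value == null ? 0 : _value.GetHashCode();
    }
    //the operators reuse the Equals implementation
    public static bool operator ==(Name left, Name right) { return left.Equals(right); }
    public static bool operator !=(Name left, Name right) { return !left.Equals(right); }
}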

Finally, if your objects will be used in comparison operations, then you should go ahead and implement the IComparable<T> interface and overload the <, <=, > and >= operators (once again, the operators’ implementations should reuse the IComparable<T>’s CompareTo method). Hash codes are also influenced by a custom Equals implementation, but we’ll leave that discussion for a future post. Stay tuned for more.

Sep 22

Back to methods: overloading

Posted in Basics, C#       Comments Off on Back to methods: overloading

After a small JavaScript detour, I’m back in .NET land. We’ve already seen several interesting details regarding methods, but we still haven’t really discussed the concept of overloading. So, what’s method overloading? The concept is simple, but first we should probably find a good enough definition for a method. For now, let’s define a method as a name that represents a piece of code which performs some operation on a type or on an instance of a type.

If you’ve been doing C# for some time, then you know that you can have several methods with the same name, provided they have a different set of arguments. Ok, here’s a small sample which shows the concept:

public class Student {
    //static overloads
    public static void SayHi(Student std) {
    }
    public static void SayHi(Student std, String msg){
    }
    //instance overloads
    public void SayHi() {
    }
    public void SayHi(String msg) {
    }
}

The previous snippet shows how you can overload static and instance methods. In C#, you can only overload based on a different set of parameters. That means you cannot add the following method to the Student type without getting a compilation error:

//can't overload based on return type
public String SayHi(String msg) {
}

Now, do keep in mind that the CLR does allow this kind of stuff, i.e., it does allow you to overload with a different return type. Anyway, we’re writing C# code, so to achieve that we’d really need to write IL (which is not going to happen right now, so consider this paragraph a small side note :)).

Ok, back to C#… so why doesn’t C# allow overloads based on the return type? That happens because overloading is based on having different method signatures. In C#, a method signature consists of its name, the number of type parameters and the type and kind of each of its parameters (considered from left to right). Besides the return type, the params keyword and type parameter constraints (when you’re using generics) aren’t part of the method signature in C#. Another interesting gotcha is that you cannot have two overloads which differ solely in the out/ref modifiers (as we’ve seen in the past). Interestingly, the out/ref keywords are considered part of the signature for hiding/overriding purposes.
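A quick sketch of that last gotcha (illustrative names only):

public class Parser {
    public void Parse(Int32 value) { }
    public void Parse(ref Int32 value) { }  //ok: differs from the by-value overload
    //public void Parse(out Int32 value) { value = 0; }
    //the line above would not compile: it differs from the ref overload only by out vs ref
}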

Overload resolution is an important concept which is used to decide which method will be called when you write code like the following:

var std = new Student();
std.SayHi("Hello");

The algorithm for finding a method involves several steps. Instead of describing the complete process here (which would involve some serious copying of the C# spec), and since this is becoming a large post, I guess I’ll just redirect you to the C# spec for the complete story on method overload resolution. So, open your C# spec and read section 7.5.5.1 carefully to see what I mean… And that’s it for now. Tomorrow, we’ll continue our basic tour around .NET and the CLR. Stay tuned for more.

Sep 20

Operator overloading

Posted in Basics, C#       Comments Off on Operator overloading

In some languages (ex.: C#), you can customize the way an operator works. For instance, if you take a look at the String type, you’ll notice that it has these peculiar looking methods:

public static bool operator !=(string a, string b);
public static bool operator ==(string a, string b);

What you’re seeing here is known as operator overloading. Since the String class introduces these methods, you can write the following code to compare strings (btw, when comparing strings, you should probably be more explicit, so that someone reading your code in the future knows exactly which type of comparison you’re performing):

String str1 = "hi";
String str2 = "hi";
Console.WriteLine( str2 == str1);

Operator overloading isn’t a CLR feature, but a compiler feature. In other words, the compiler is responsible for mapping the operators used in the source code into method calls which can be understood by the CLR. Now, in theory, you’re probably thinking that you can call the static method shown before directly:

Console.WriteLine(String.operator==(str1,str2));

But no, the truth is that you can’t do that (if you try, you’ll get a compiler error which says something like “invalid term: operator”). Even though the CLR doesn’t understand operators, it does specify how languages should expose operator overloads. If a language decides to support operator overloading, then it must follow the CLR-defined syntax and it must generate methods which match the expected signature. In the case of the == operator, the compiler is supposed to generate an op_Equality method. In case you’re thinking of trying to call that method directly from C#, don’t: you’ll end up getting a compilation error saying that you cannot access the op_Equality method directly.
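To make this concrete, here’s a hedged sketch of overloading an operator on a made-up Money type (the friendly-named Add method anticipates the recommendation discussed next):

public struct Money {
    public readonly Decimal Amount;
    public Money(Decimal amount) { Amount = amount; }

    //the compiler turns this into a static op_Addition method
    //(flagged in the metadata as a special "operator" method)
    public static Money operator +(Money left, Money right) {
        return new Money(left.Amount + right.Amount);
    }
    //friendly named alternative for languages without operator overloading
    public static Money Add(Money left, Money right) {
        return left + right;
    }
}

var total = new Money(10m) + new Money(5m);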

Before we proceed, you should probably take a look at the complete method name list from here. If you’ve checked it, then you’ve probably noticed that the table has an extra column, called “Name or alternative method”. When I said that the C# compiler generates a special method for the operator overload, I didn’t mention one important detail: besides respecting the name defined by the CLR, it will also set a flag in the method’s metadata saying that this is a special method which can be used “as an operator”.

When you’re writing code in a language which doesn’t support operator overloading, you can still introduce the op_XXX methods. The problem is that you also need that special flag applied to the method; if you don’t have it, you won’t be able to use the operator when consuming the type from, say, C#. And that’s one of the reasons why you have the friendly name column in that table: MS recommends that you also add those friendly-named methods when you overload operators, so that the intended operation can always be performed (as you might expect, these methods should simply redirect to the adequate op_XXX methods). I believe MS could have done better here, but we have to live with what we have, right? And I guess that’s it for now. Stay tuned for more.

Sep 17

In the previous posts, I’ve presented the basics of boxing and unboxing. Today, we’ll take a deeper dive into several scenarios which illustrate how boxing/unboxing can occur without you even noticing it. Let’s start with a quick recap. Suppose we’ve got the same structure used in the previous examples:

public struct Student {
    public String Name { get; set; }
    public Int32 Age { get; set; }
}

As we’ve seen, you’ll get a boxing operation whenever you pass an instance of Student to an element which expects a reference type. In other words, the next instructions result in a boxing operation:

Student std = new Student {Name = "Luis", Age = 34};
Object boxedStd = std; //boxing here!

The previous snippet shows an explicit boxing operation. You might need to do this to improve performance. For instance, suppose you need to perform several calls and that all of them expect reference types. Here’s a naive way of doing that:

//all callx methods expect a reference to an Object instance
Call1( std );
Call2( std );
Call3( std );

If you do this, you'll end up performing three boxing operations (one for each method invocation) because Call1, Call2 and Call3 are expecting to receive a reference type. In these cases, you should probably perform a manual boxing operation before calling those methods:

//all callx methods expect a reference to an Object instance
Object boxedStd = std; //1 boxing operation
Call1( boxedStd );
Call2( boxedStd );
Call3( boxedStd );

With the previous change, we’ve managed to perform a single boxing operation (instead of three). It’s a small improvement, but one that might make a difference when you have lots of calls. Besides method calls, there are other scenarios where you’ll end up with boxing.

If you look at the ValueType type definition, you’ll notice that it inherits from the Object type. So, what happens when you call one of the methods defined by the base class? The answer is: it depends. If you override one of the inherited virtual methods, then the CLR won’t box the instance and you’ll get a non-virtual method call (the CLR can do this since value types are implicitly sealed). *However*, if your override calls the base method, then the value type instance will get boxed because the base method’s this parameter expects a reference type.

When you’re calling a non-virtual method (or a non-overridden method), you’ll get boxing for the same reasons presented in the previous paragraph. There’s still one last scenario where you need to consider boxing. As you know, a struct might implement one or more interfaces. Casting an unboxed instance of a value type to one of the interfaces it implements requires boxing. Take a look at the next example:

public interface IStudent {
    String Name { get; set; }
}
public struct Student: IStudent {
    public String Name { get; set; }
    public Int32 Age { get; set; }
}

And here’s some code which explains where boxing will occur:

Student std = new Student {Name = "Luis", Age = 34};
String nameFromStruct = std.Name;//no boxing here
IStudent stdRef = std;//boxing
String nameFromInterface = stdRef.Name; //gets value from boxed ref

As an extra bonus, do you think you’ll get boxing in the next call? If so, why is that? And if there is boxing, can you do anything to prevent it from happening?

var toString = std.ToString();

And I think we’ve covered most boxing related scenarios. Stay tuned for more…

Sep 16

Unboxing: is it really the opposite of boxing?

Posted in Basics, C#       Comments Off on Unboxing: is it really the opposite of boxing?

[Update: Thanks to JPC for finding (yet!) another error in my post]

Yes…well, it is, but it’s cheaper (as we’ll see). As we’ve seen in the previous post, we can convert a value type instance into a reference type instance through an operation known as boxing. Unboxing lets you convert a reference type into a value type (if the reference type was originally obtained through a boxing operation, that is). We’ve already seen an unboxing operation in the previous post (though I really didn’t mention it at the time):

var std2 = (Student)cll[0];

In the previous snippet, we're converting the reference type instance obtained through boxing into a value type instance (notice the cast operator). The algorithm used for unboxing involves the following steps:

  1. the first thing required is obtaining the reference. The reference is checked against null and, if it is null, you'll end up with a NullReferenceException. If it's not null, then an additional check is performed: the type of the boxed instance is compared with the type indicated in the unboxing operation. If they don't match, you'll get a different type of exception: in this case, an InvalidCastException is thrown (see the quick example after this list).
  2. If we reach this step, then the field values are copied from the managed heap to the newly allocated space on the stack.
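
Here's a quick sketch of the checks described in step 1 (the variable names are mine; the exceptions are the ones you'll get):

Object boxedInt = 10; //boxing an Int32
Int32 backAgain = (Int32)boxedInt; //ok: the types match, so the value is copied to the stack

Object nothing = null;
//Int32 fromNull = (Int32)nothing; //throws a NullReferenceException: there's nothing to unbox

//Int16 wrongType = (Int16)boxedInt; //throws an InvalidCastException: the boxed type is Int32, not Int16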

Notice that the unboxed value is *always* copied into *newly* allocated space on the stack. In other words, in the example of the previous post, we've ended up with two different Student instances on the stack. And that's why you're wrong if you assumed that the next snippet (also copied from the example in the previous post) would print the value true:

Console.WriteLine(std.Name == std2.Name);

As you can see, unboxing is cheaper than boxing. It goes without saying that boxing and unboxing won't help to improve the performance of your application, so you'd better pay attention to your code. Now that you understand boxing and unboxing, we're ready for the next phase: understanding all the scenarios where boxing happens. Stay tuned for more.

Sep 16

Value types and boxing

Posted in Basics, C#       Comments Off on Value types and boxing

As we've seen, value types have better performance than reference types because they're allocated on the stack, they're not garbage collected, nor do they get the extra weight generally associated with reference types. There are, however, some times where we need a "reference to a value object" (yes, I wanted to write "reference to a value object"). In the old days, that would happen whenever you needed a collection of value objects (as you recall, in .NET 1.0/.NET 1.1 there were no generics). Here's a small example:

public struct Student {
    public String Name { get; set; }
    public Int32 Age { get; set; }
}

Now, and this is important, pay attention to the following snippet:

var cll = new ArrayList();
var std = new Student {Name = "Luis", Age = 10};
cll.Add(std);
var std2 = (Student)cll[0];
std2.Name = "Abreu";

If you look at the docs, you'll notice that the Add method expects an Object instance. In other words, it requires a reference type and not a value type. Yet, if you go ahead and compile the previous snippet, you won't get any compilation errors. What's going on here? What you're seeing is perfectly normal and it's called boxing. Boxing allows us to convert a value type into a reference type. Boxing involves a rather simple algorithm:

  1. allocate memory on the managed heap for the value object plus the “traditional” overhead space required for all the reference types (btw, I’m talking about the type object pointer and the sync block index).
  2. copy the values of the value type’s fields into the heap’s allocated memory.
  3. use the returned memory address as the “converted” reference type instance.

When the compiler detected that the Add method requires a reference type, it went ahead and applied the previous algorithm in order to transform the value type and pass a reference into the method. In other words, what got added to the cll collection was the reference obtained from step 3 (and not the std variable).
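
Here's a small extension of the previous snippet which shows that it really is a copy that gets boxed (and not the std variable itself):

var cll = new ArrayList();
var std = new Student {Name = "Luis", Age = 10};
cll.Add(std); //boxes a *copy* of std on the managed heap
std.Age = 20; //changes only the instance on the stack
var fromList = (Student)cll[0];
Console.WriteLine(fromList.Age); //prints 10: the boxed copy wasn't touched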

This has lots of implications which might not be obvious at first. For instance, if you think that the following snippet should print true, you’re wrong:

Console.WriteLine(std.Name == std2.Name);

Before getting into why that happens, we need to understand unboxing. Since it’s 22:37 and I still haven’t had my dinner, I’ll leave the unboxing post for later 🙂

Stay tuned for more!

Sep 15

[Update: thanks to Wesner, I’ve fixed the list you should consider when using value types]

In the previous post, I’ve talked about some basic features related with reference types. If all the types were reference types, then our applications would really hurt in the performance department. In fact, hurt is really a kind way of putting it…Imagine having to go through all the hoops associated with creating a new reference type for allocating space for a simple integer…not funny, right?

And that's why the CLR introduced value types. They're ideal for simple and frequently used types. By default, value types are allocated on the stack (though they can end up on the heap when they're used as a field of a reference type). Whenever you declare a variable of a value type, that variable will hold the required space for saving a value of that type (instead of holding a reference to a memory position on the heap, like it happens with reference types). In practice, this means that value types aren't garbage collected like reference types.

In C#, you create a new value type by creating a new structure (struct keyword) or a new enumeration (enum). Here are two examples:

public struct Student {
    public String Name { get; set; }
    public Int32 Age { get; set; }
}
public enum Sex {
    Male,
    Female
} 

Whenever you do that, you end up creating a new type which is derived from the abstract ValueType type (notice that ValueType inherits from Object). Trying to specify a base type for a struct results in a compilation error. There's really nothing you can do about that, so get used to it. Notice, though, that you're free to implement one or more interfaces if you wish to. Another interesting thing to notice is that all value types are sealed, making it impossible to reuse them as a base for any other type.

Whenever you create a new enum, you’ll end up with a new type which inherits from System.Enum (which is itself derived from ValueType). There’s more to say about enums, but we’ll leave that for another post.

Creating an instance of a struct is as easy as declaring a variable of that type:

Student std = new Student();
std.Name = "Luis";

In the previous snippet, we're forced to use new to make the C# compiler happy. With value types, the new operator doesn't end up allocating space on the heap because the C# compiler knows that Student is a value type and that it should be allocated directly on the stack (btw, it zeroes all the fields of the instance). Notice that you must use new (or initialize the instance in some other way) before accessing its fields. If you don't, you'll end up with the "use of unassigned field" compilation error.

Now that you’ve met value and reference types, you might be interested in knowing when to use one or the other. In my limited experience, I can tell you that I end up using reference types in most situations, though there are some scenarios where value types should be used:

  • simple, immutable types which behave like primitive types are good candidates for value types. DateTime is probably the best-known example of such a type.
  • "small" types (ie, types which require less than 16 bytes of allocated space) might be good candidates. Don't forget method parameters! If the type isn't passed into methods a lot, then you can probably relax the size rule.

Before you start using value types all over the place, you should also consider that:

  • value types can be boxed (more about this in a future post).
  • you *should* customize the way these types handle equality and identity.
  • you can’t use any other base than ValueType or Enum nor can you reuse the type as a base for another type.
  • when you assign an instance of a value type to another, you're really doing a field-by-field copy. This means that assignments will always duplicate the space taken by the instance (the next snippet shows this in practice).
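
Here's a quick sketch of that last point, reusing the Student struct shown above:

var std1 = new Student {Name = "Luis", Age = 10};
var std2 = std1; //field by field copy: std2 is a brand new instance
std2.Name = "Abreu"; //changes the copy only
Console.WriteLine(std1.Name); //prints Luis
Console.WriteLine(std2.Name); //prints Abreu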

And that sums it up pretty nicely. Stay tuned for more.

Sep 14

Still on types: reference types

Posted in Basics, C#       Comments Off on Still on types: reference types

Most types introduced by the framework are reference types. So, what is a reference type? Take a look at the following code:

Student std = new Student();

When you instantiate a reference type you end up with a…yep, that's right: a reference to the instantiated object. Notice that I said reference, not pointer. People tend to use both as synonyms, and that's ok for most scenarios. However, if you want to be precise, there are some differences between references and pointers. For instance, references support identity checks but, unlike pointers, they don't support all comparison operations (for instance, with pointers you can use the < and > operators, but you cannot do that with references).

Reference types are always allocated from the heap and the new operator will always return a reference to the memory that was allocated for that object. If you look at the previous snippet, that means that std will hold a reference to the memory position where the Student object was instantiated. Even though most of the types introduced by the framework are reference types, the truth is that they come with some gotchas. As you might expect, allocating memory from the heap incurs a slight performance hit. And it might even force a garbage collection if the heap is getting "full". Besides that, all reference types have some "additional" members associated with them that must be initialized (more on this in future posts).

In C#, all types declared using the class keyword are reference types. Here’s a possible definition for our Student type used in the previous snippet:

public class Student {
    public String Name { get; set; }
    public Int32 Age { get; set; }
}
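
Since a variable like std only holds a reference, assigning it to another variable means that both end up referring to the same Student instance. Here's a quick sketch:

var std = new Student {Name = "Luis", Age = 34};
var sameStd = std; //copies the *reference*, not the object
sameStd.Name = "Abreu"; //changes the one and only Student instance on the heap
Console.WriteLine(std.Name); //prints Abreu: both variables point at the same object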

There's still more to say about reference types, but before that, we need to understand value types. Stay tuned for more.

Sep 10

Checked vs unchecked operations

Posted in Basics, C#       Comments Off on Checked vs unchecked operations

In the previous post, I’ve introduced the concept of primitive type and we’ve seen how they’re treated in a special way by the compiler. One interesting topic that pops up when we talk about primitive types is operation overload. For instance, what happens when you try to run the following snippet:

var max1 = Int32.MaxValue;
var max2 = Int32.MaxValue;
Int32 max3 = max2 + max1;
Console.WriteLine(max1 + max2);

MaxValue returns the maximum value that can be stored in an Int32 variable. It's obvious that adding two of those values won't produce a result that fits in an Int32. However, you won't get an exception if you try to run the previous code. The justification is simple: you see, the CLR offers several IL instructions for doing arithmetic operations. Some allow overflow to occur silently (ex.: add) because they don't check for it; others do check for it and, when an overflow happens, you'll get an exception (ex.: add.ovf). As you might expect, the instructions which don't run an overflow check are faster and those are the ones used by default by the C# compiler.

If you want to turn on overflow verification for all the code in your assembly, then you can simply use the /checked+ compiler switch. If you're only interested in checking a specific expression or block of instructions, then you can use the checked operator (there's also an unchecked operator that does the opposite). Here's a small snippet that shows how you can use these operators:

Int32 max3 = checked( max2 + max1 );

And yes, that should throw an exception at runtime. If you’re interested in running several instructions as checked operations, then the checked statement is for you:

checked {
    var max1 = Int32.MaxValue;
    var max2 = Int32.MaxValue;
    Int32 max3 = max2 + max1;
    Console.WriteLine(max3);
}

Notice that only the instructions placed directly inside the block will be checked; the checked context has no direct impact on the code inside methods called from that block.
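
Here's a quick sketch of what I mean (Add is a made-up helper): the addition performed inside Add still runs with the default, unchecked semantics, even though the call itself sits inside the checked block:

static Int32 Add(Int32 a, Int32 b) {
    return a + b; //compiled as a plain add: no overflow check here
}

checked {
    var max = Int32.MaxValue;
    //var boom = max + 1; //this line *is* checked: OverflowException at runtime
    var silent = Add(max, 1); //the addition inside Add overflows silently
    Console.WriteLine(silent); //prints -2147483648
}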

Many people recommend turning the /checked+ compiler switch on for debug builds in order to detect problems in code where you're not using the checked and unchecked blocks/statements. You should probably also use the /checked+ option for release builds, unless you can't take the small performance hit associated with the arithmetic checks.

Before ending the post, a small note on the Decimal type. You can't run checked/unchecked operations against the Decimal type because the CLR doesn't offer any IL instructions for arithmetic operations against it. In practice, that means that the special treatment given to the Decimal type is supported only by the compiler (ie, adding two decimals will always result in invoking the corresponding Decimal operator method). So, if you hit an overflow during an arithmetic operation, you'll end up getting an exception (even when you're using the unchecked operator or the /checked- compiler switch).
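
Here's a quick sketch of that behavior (notice that the unchecked operator makes no difference here):

Decimal maxDec = Decimal.MaxValue;
try {
    var res = unchecked(maxDec + 1); //still throws: Decimal arithmetic ignores the unchecked context
    Console.WriteLine(res);
}
catch (OverflowException) {
    Console.WriteLine("decimal overflow always throws");
}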

And that’s it. Stay tuned for more.

Sep 09

An intro on types: primitive types exist!

Posted in Basics, C#       Comments Off on An intro on types: primitive types exist!

If you’re not new to .NET, you’ve probably heard several times that one type can be a value type or a reference type. Well, that is true. What people miss (sometimes) is that they’ve also got some data types which are used so often that compilers allow code to treat them in a simplified way: say hello to the primitive types.

Let's look at a small example. Nothing prevents us from writing this code:

System.Int32 myInt = new System.Int32();
myInt = 10;

But does anyone want to write that when they can simply write this:

System.Int32 myInt = 10;

Oh, yes, you can reduce it even further by using the int alias:

int myInt = 10;

Any data type that is *directly* supported by the compiler can be considered a primitive type. Notice the *directly*…it’s used here to indicate that the compiler knows what to do in order to create a new instance from “simplified” code.

Btw, you've surely noticed the int alias, haven't you? In fact, it's not really an alias but a C# keyword that is mapped into the System.Int32 FCL type. Even though you can't introduce new keywords, you can still introduce new aliases to simplify the use of certain types. Here's an example:

using integer = System.Int32;

And now it's possible to use integer as an alias to the System.Int32 type (btw, I don't recommend doing this). Currently, there are several types which are treated as primitive by the C# compiler. In that list, you'll find plenty of numeric types (Int32, Double, etc), strings (String or string – if you prefer the C# keyword), the object (System.Object) type and even the new dynamic type (which, btw, is mapped into Object). As you can see, primitive types aren't really limited to a subset of value types.

Besides knowing how to create these types, the compiler is also able to perform other interesting operations over them. For instance, it is able to convert automatically between two types even when there's no relationship between them. Here's an example:

Int32 myInt = 10;
Int64 myLong = myInt;

Since Int64 and Int32 aren't "related", you can only get away with the previous code because you're using primitive types (and the compiler has privileged knowledge about them!). The C# compiler will allow this type of implicit conversion only when it knows it's safe (ie, when it knows that there is no data loss during the conversion – which doesn't happen in the previous example because you can always save a 32 bit signed integer into a 64 bit signed integer).

For unsafe conversions (ie, conversions where you can lose precision), you need to perform an explicit cast. For instance, if you try to convert a Single into an Int32, you'll need a cast:

Single val = 10.9f;
Int32 myInt = (Int32) val;

In C#, myInt will end up with the value 10 since the compiler will simply truncate the Single's value. There's still one additional feature we get with primitive types: they can be written as literals. A literal is *always* an instance of a type and that's why you can access its type's members directly. Here's an example:

String val = "Hello There".Substring(0, 5);

Finally, the compiler will also let you use several operators against primitive types and it will be responsible for interpreting them. For instance, you can write the following code:

Int32 someVal = 30;
Int32 res = 10*someVal;

There’s still an interesting discussion left (how to handle overflows), but I’ll leave it for a future post. Stay tuned for more.

Sep 07

Friend assemblies

Posted in Basics, C#       Comments Off on Friend assemblies

In the previous post, we’ve seen the difference between type visibility and member accessibility. A type can be visible to all the other types defined in the same assembly (internal) or it can be visible to any type, independently from the assembly where it’s defined. On the other hand, member accessibility controls the exposure of a member defined by a type.

By default, internal types cannot be instantiated from types defined in other assemblies. That’s why you’ll typically define your helper types as internal so that they can’t be consumed by types defined in other assemblies. There are, however, some scenarios where you’d like to grant access to all the types defined in another assembly and, at the same time, block access to all the other types defined in other assemblies. That’s where friend assemblies can help.

When you create an assembly, you can indicate its “friends” by using the InternalsVisibleToAttribute. This attribute expects a string which identifies the assembly’s name and public key (interestingly, you must pass the full key – not a hash – and you’re not supposed to pass the values of the culture, version and processor architecture which you’d normally use in a string that identifies an assembly). Here’s a quick example which considers the LA.Helpers assembly a friend:

[assembly:InternalsVisibleTo("LA.Helpers,PublicKey=12312…ef")]

As I've said before, you *do* need to pass the full public key (if the assembly isn't strongly named, then you just need to pass its name). In practice, all the types defined in the LA.Helpers assembly can now access all the internal types defined in the assembly which contains the previous instruction. Besides getting access to all the internal types, friend assemblies can also access all the internal members of any type defined in that assembly.
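
Here's a quick sketch of what that buys you (the type and member names are made up, just for illustration):

//defined in the assembly which declares LA.Helpers as a friend
internal class StringUtils {
    internal static Boolean IsNullOrWhiteSpace(String value) {
        return value == null || value.Trim().Length == 0;
    }
}

//inside the LA.Helpers assembly: this compiles because LA.Helpers is a friend
var empty = StringUtils.IsNullOrWhiteSpace("   ");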

You should probably think carefully about the accessibility of your types' members when you start granting friend access to other assemblies. Notice also that creating a friend relationship between two assemblies ends up creating a strong dependency between them, and that's why many argue that you should only use this feature with assemblies that ship on the same schedule. In my experience, I've ended up using this feature *only* for testing helpers that I don't want to expose publicly from an assembly.

I think that most of us don't use the command line for building our projects, but this post wouldn't really be complete without mentioning some interesting details about the C# compilation process. When you're building the friend assembly, you should use the /out parameter to pass the name of the assembly. This improves the compilation experience because you're giving the compiler what it needs to check whether the types defined in the assembly being built can access the internal types of the other assemblies. If you're compiling a module (is anyone doing this on a regular basis?), then don't forget to use the /moduleassemblyname parameter to specify the name of the assembly that will contain the module (the reason is the same as the one presented for the /out parameter).

And that sums it up quite nicely (I think). Stay tuned for more.

Sep 06

Type visibility vs member accessibility

Posted in Basics, C#       Comments Off on Type visibility vs member accessibility

One of the things I’ve noticed while trying to help others get started with the .NET framework is that they tend to confuse type visibility with member accessibility. In this quick post I’ll try to point out the differences between these two concepts. Let’s start with type visibility.

When you create a type (ex.: class, struct, etc.), it may be visible to all the code (public keyword) or only to the types defined in the same assembly as that type (internal keyword). In C#, you can use the public or the internal qualifier to define the visibility of a type. By default, any type which hasn't been explicitly qualified with one of these keywords is considered to be internal:

//internal by default
struct T {
    //…
}

Member accessibility is all about specifying the visibility of the members of a type. In other words, the accessibility indicates which members might be accessed from some piece of code. Currently, C# allows you to use 5 of the 6 member accessibility options supported by the CLR:

  • private: members qualified with the private keyword (C#) are only accessible by other members defined in the same type or in a nested type.
  • family: members qualified with the protected keyword (C#) can only be accessed by methods in the defining type, nested type or one of its derived types.
  • family and assembly: you *can’t* use this accessibility in C#. This accessibility says that a member can only be used by methods in the same type, in any nested type or in any derived type defined in the *same* assembly as the current type.
  • assembly: in C#, you use the internal keyword to specify this accessibility level. In this level, the member can only be accessed by all the types defined in the same assembly as the current type.
  • family or assembly: in C#, you need two keywords to specify this level: protected internal. In practice, it means that the member is accessible by any member of the type, any nested type, any derived type (*regardless* of the assembly) or any other method in the same assembly as the current type.
  • public: members qualified with the public keyword (C#) can be used by any other member in any assembly.

Before going on, it's important to notice that member accessibility always depends on the visibility of the type. For instance, public members exposed by an internal type in assembly A *cannot* be used from assembly B (by default) since the type isn't visible in that assembly.
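
Here's a small sketch of that last point (the type is made up, just for illustration):

//assembly A
internal class Logger {
    public void Log(String message) { //public member...
        Console.WriteLine(message);
    }
}

//assembly B
//var logger = new Logger(); //...doesn't compile: the *type* isn't visible from here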

In C#, if you don't specify the accessibility of a member, the compiler will default to private in most cases (one exception: interface members are always public!). In C#, when you override a member in a derived type, you must use the same accessibility as defined in the base class. Interestingly, this is a C# restriction, since the CLR does allow you to change the accessibility of a member in a derived class to a less restrictive level (ex.: you can go from protected in the base class to public in the override, but not the other way around).

There’s still a couple of things I have to say about member accessibility, but I’ll leave it for a future post. Stay tuned for more.

Sep 06

Partial classes

Posted in Basics, C#       Comments Off on Partial classes

In the previous post, I've talked about partial methods and I've promised that the next post would be about partial classes. And here it is! Now that I think about it, I should have written this post before the one on partial methods, but this order will have to do now, won't it?

The first thing you need to know about partial classes is that they depend entirely on the C# compiler. The second thing you should know is that you can apply the partial qualifier to classes, structs and interfaces (so I probably should have used Partial types for the title of this post).

In practice, when the C# compiler finds the partial qualifier applied to a type (class, struct or interface), it knows that its definition may span several source “groups”, which may (or may not) be scattered across several files. Partial classes are cool for splitting the “functionalities” of a type into several “code groups” (for example, I’ve used it for splitting the definition of a class into several files to improve readability).

Partial types were introduced to solve the same problem I've mentioned in the previous post: customization of code built through code generators. Now that you know the basics, let's go through a simple example. Suppose we've got a Student type and that you want to separate its features into different code groups. Here's how you can do that with a partial class:

//File Student.cs
public partial class Student {
    public String Name { get; set; }
    public Int32 Age { get; set; }
}
//File Student.Validation.cs
public partial class Student {
    public void Validate(){
        //...
    }
}

In the previous snippet, we're creating a new type (Student) which is split into 2 different files (Student.cs and Student.Validation.cs). When you compile the previous code (notice that both files must be in the same project!), you'll end up with a single type (Student) which has all the members exposed by the partial classes.

Even though I’ve put the partial class definitions into different files, the truth is that you can place both in the same file (though I haven’t seen this being used like that in lots of places). Since partial types don’t depend on the CLR but on the compiler, you need to write all the files in the same language and they must be compiled into the same unit (ie, all the partial file definitions must be defined in the same project).

And there's not much more to say about partial types. Stay tuned for more.

Sep 04

Partial methods

Posted in Basics, C#       Comments Off on Partial methods

In previous posts, I’ve mentioned extension methods and how you can use them for extending existing types. After talking with a friend, I’ve noticed that I didn’t mention partial methods, which are really useful for code generators. Imagine you’re writing the code generator that needs to generate C# for some specs which are defined through some sort of wizard and that you need to allow customization of some portions of that code. In the “old days”, the solution was creating virtual methods which didn’t do anything and were called by the generated C# code.

Then, the developer was responsible for creating a new class that expanded the generated C# type and for overriding the required virtual methods. Unfortunately, this technique doesn’t really work all the time. For instance, if the generator is creating a struct, you’re out of luck because structs are implicitly sealed!

Btw, and before going on, you should notice that I've discarded the option of changing the generated code file because we all know that, sooner or later, something will happen that will force us to regenerate the code (leading to the loss of all the customizations that have been written).

C# 3.0 (if I’m not mistaken) introduced the concept of partial method for helping with this kind of problem. Partial methods work together with partial classes for allowing the customization of the behavior of a method. Here’s a quick example:

//suppose this is generated by a tool
partial class Address{
    private String _street;
    public String Street {
        get { return _street; }
        set {
            if (value == _street) return;
            _street = value;
            OnStreetChanged();
        }
    }
    //always private!
    partial void OnStreetChanged();
}
//dev code for customizing
partial class Address {
    partial void OnStreetChanged() {
        //do your thing here
    }
}

If you wanted, the generated class could even have been sealed. Now, customizing the method is as simple as creating a new partial class and implementing the desired partial method. When compared with the virtual method approach I mentioned earlier, there is an important improvement: if you don't provide a custom implementation for the partial method, then the method definition and the call will simply be removed during compilation. In other words, if there isn't an implementation of the partial method, the compiler won't generate any IL for performing that call.

Before you go crazy and start creating partial methods, you should consider that:

  • they can only be declared in partial classes or structs (the next post will be about partial classes).
  • the return type of a partial method is *always* void and you cannot define any out parameters (understandable since you’re not obliged to implement the partial method).
  • a delegate can only refer to a partial method if the method is actually implemented (the reason is the same as the one presented for the previous item).
  • partial methods are always private (you can’t really apply any qualifier to the method and the compiler ensures that they’re always private).

And I guess this sums it up nicely. Stay tuned for more.

Sep 03

If you're a long time programmer/developer, then you probably expect to be able to create a method that receives a variable number of arguments. In C#, to declare a method that accepts a variable number of arguments you need to qualify the parameter with the params keyword. Here's a quick example:

static void PrintNames( params String[] names){
    foreach (var name in names) {
        Console.WriteLine(name);
    }
}

As you can see, you use an array to represent a variable number of parameters. Notice that you can only apply the params keyword to the last parameter defined by a method and that parameter’s type must be an array of a single dimension (of any type). When you add the params qualifier, you’re essentially simplifying the invocation code:

PrintNames("Luis", "Miguel", "Nunes", "Abreu");

As you can see from the previous snippet, you don’t need to pass an array to the PrintNames method. If you want, you can create an array and call the method, but I’m thinking that most people will prefer the approach used in the previous snippet.

When the compiler finds the params keyword, it applies the ParamArrayAttribute to that parameter. Whenever the C# compiler detects a call to a method, it will first try to match that call against the methods which don't have any parameter annotated with the ParamArrayAttribute. It will only consider the methods with a variable number of arguments (ie, those that use the params keyword) when it can't find a match in the previous candidate list.

Before ending, a performance related note: calling a method with a variable number of arguments incurs a small performance hit since the compiler needs to transform the list of values into an array (in other words, the method will always receive an array, even though you haven't built one explicitly). Btw, this does not happen if you pass null to the method. And as you've seen, it also takes longer to find that method because the compiler will first try to find a method which doesn't use a variable number of arguments. That's why you'll find several classes which define several overloads with a fixed number of parameters before introducing the method which has a single parameter annotated with the params qualifier (ex.: String's Concat method).
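
Here's a quick illustration of the array transformation (and of the null case), reusing the PrintNames method shown above:

PrintNames("Luis", "Miguel"); //the compiler builds a String[] with two elements for you
PrintNames(new[] {"Luis", "Miguel"}); //equivalent call, but you build the array yourself
PrintNames(); //the compiler passes an empty array
PrintNames(null); //no array at all: names will be null inside the method (the foreach would then blow up)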

And that’s it for now. Stay tuned for more.

Sep 02

Parameters by reference

Posted in Basics, C#       Comments Off on Parameters by reference

By default, parameters are always passed by value. However, the CLR does allow you to pass parameters by reference. In C#, you can use the out or ref keywords for passing parameters by reference. When the compiler sees that you've used these keywords, it will emit code that passes the *address* of the variable rather than its value.

Interestingly, these two keywords are identical from the CLR's point of view. The story is completely different for the C# compiler, since it uses them to determine who is responsible for initializing the value of the variable. Using the *ref* keyword is the same as saying that the caller is responsible for initializing the value of the parameter. On the other hand, using out leaves that responsibility to the method. Here are some really simple (dumb) methods:

static void ChangeNameWithOut(out String name){
    name = DateTime.Now.ToString();
}
static void ChangeNameWithRef(ref String name){
    name = DateTime.Now.ToString();
}

Now, here’s how you can use both methods:

String name = "Luis";
String anotherName;
ChangeNameWithOut(out anotherName);
ChangeNameWithRef(ref name);

As you can see, you're not initializing the anotherName variable before passing it to the ChangeNameWithOut method. If you tried to pass anotherName to the ChangeNameWithRef method, you'd end up with a compilation error: Use of unassigned local variable ‘anotherName’.

You've probably noticed that you're forced to use the ref and out keywords at the call site too. For a long time, this puzzled me and I thought that the C# compiler should be able to infer that from the call. According to Jeffrey Richter, the designers of the language felt that the caller should make their intentions explicit. I'm not sure I agree with that, but it's just the way it is. And, as we'll see next, this decision allows methods to be overloaded based on these keywords.

You can use the out and ref keywords for overloading methods, though you cannot add two overloads that differ only by out vs ref. Here's some code that illustrates these principles:

static void ChangeName(String name){}
static void ChangeName(ref String name){} //ok
static void ChangeName(out String name){} //error

Adding the second method (the ref overload) won't lead to a compilation error because you can overload methods by using the ref or out keywords. Adding the last method does lead to a compilation error because you cannot have two overloads that differ only by out and ref.

Besides overloading, there are some gotchas when you use reference parameters. Here’s a small example that might catch you off guard:

static void DoSomething(ref Object someParameter){}

var str = "Luis";
DoSomething(ref str);

You can’t compile the previous code. If you try, you’ll get an error saying that you cannot convert from ‘ref string’ to ‘ref object’. In other words, the parameter’s type must match the type of the value that is passed. In case you’re wondering, this is needed to ensure that type safety is preserved. The next snippet shows why this is a good thing:

static void DoSomething(ref Object someParameter){
    //if the previous call compiled, str (a String variable) would now be
    //referencing a Student instance – and there goes type safety
    someParameter = new Student();
}

And I guess this sums it up nicely. There’s still some more about parameters, but we’ll leave it for future posts. Stay tuned for more.

Sep 01

Parameters: by value or by reference? Say what?

Posted in Basics, C#       Comments Off on Parameters: by value or by reference? Say what?

I’m guessing that I won’t be giving any news when I say that parameters are used for passing values into methods. By default, parameters are passed by value. Here’s a quick example which will let us discuss this behavior:

public class Student {
    public String Name { get; set; }
    public Int32 Age { get; set; }
}
static void ChangeName(Student std) {
    std.Name = DateTime.Now.ToString();
}

As I was saying, parameters are passed by value. And that's true. However, many are still surprised when they discover that ChangeName will update the Name of the Student instance that was passed into the method. How can that be? Isn't passing by value the same as "copying"? Well, that is absolutely correct. What you must keep in mind is the *type* of the variable you're passing into the method.

In my previous example, Student is a reference type. What that means is that a variable of that type will reference some memory space, which is where the values of its fields will be stored. Now, even though I don't like to talk about implementation details, I'd say that they help (at least, in this case). Let's suppose we've got some variable std:

var std = new Student { Name = "Luis", Age = 34 };

Now, you can think of std as holding the memory address where the Student object was allocated. When you look at it from this perspective, things might start to make sense, right? Suppose you've got the following code:

ChangeName(std);

Since ChangeName's only parameter is passed by value, you should be able to see what's going on. The value of the std variable is copied into the std parameter, which means that the parameter will hold the same memory address. When ChangeName starts executing, there are two "variables" pointing at the same "memory location": the parameter and the variable which was passed into the method through that parameter. That's why if you change the value of the std parameter (ie, make it reference another instance of Student), that change won't be visible outside the method (notice that changing the value of the parameter isn't the same as changing a property of the object to which the parameter "points"). Here's a quick example of what I mean:

static void ChangeName(Student std) {
    std = new Student { Name = "John", Age = 40 };
}

var std = new Student { Name = "Luis", Age = 34 };
ChangeName(std);
Console.WriteLine(std.Name);//prints Luis, not John

See the difference? If the std parameter was passed by reference, then the std variable would point to the new Student instance allocated inside the method. If you run the previous code, you’ll see that that doesn’t happen.

How, then, can we change this behavior? Simply by indicating that the parameters should be passed by reference. We’ll see how in the next post, so stay tuned!

Aug 31

Extension methods: what, how, when

Posted in Basics, C#       Comments Off on Extension methods: what, how, when

I’ve already written a little bit about extension methods in the past. However, since I’ve decided to create a new basics series, I think that I probably should go back and write about it again. I guess that the first thing I need to do is define the what. In other words, *what* is an extension method?

An extension method is a static method that you can invoke using instance method syntax. Here’s a quick example:

using System.Linq; //needed for the All extension method used below

namespace StringHelpers {
    public static class StringExtension {
        //don't need this in the real world!!
        public static Boolean ContainsOnlyDigits(this String str){
            if (str == null) throw new NullReferenceException();
            return str.All(c => char.IsDigit(c));
        }
    }
}
//use the extension method
namespace ConsoleApplication1 {
    //import extension methods
    using StringHelpers;

    class Program {
        static void Main(string[] args){
            var notOnlyDigits = "123Nop"; //const is better
            Console.WriteLine(((String)null).ContainsOnlyDigits());
        }
    }
}

As you can see, an extension method is always static and must be defined in a non generic, non nested, static class. The first parameter of an extension method is annotated with the this qualifier and defines the type over which the method can be called using an instance method syntax. In order to use the extension method (the *how*), we must first introduce it in the current scope. To achieve that, we need to resort to the using directive (as shown in the previous example).

Whenever the compiler finds a call to an extension method, it will convert it into a static method call. Currently, you can use extension methods to extend classes, interfaces, delegates (probably more about this in a future post) and enumerated types. Internally, and since extension methods aren't C# specific, there is a "standard" way to mark a method as an extension method. In C#, when you mark a static method's first parameter with the this qualifier, the compiler will automatically apply the ExtensionAttribute to that method and to the class where that method is defined. By annotating the type with the attribute, the compiler is able to query an assembly and get all the types that have extension methods defined.

Extension methods are really great and do improve the quality and readability of the code. However, I've noticed that they're starting to be abused…for instance, it was only recently that I've downloaded the source code for the Fluent NH project and I couldn't help noticing that the CheckXXX methods (used against PersistenceSpecification instances) are all implemented as extension methods. I'm having a hard time understanding this and here's why. In my limited experience, PersistenceSpecification exists so that you can write tests for your NH mappings. Now, that means that you'll *be* using these methods to test the mappings whenever you instantiate a new instance of that type. Since the guys that wrote the code for those methods *do* have access to the PersistenceSpecification class, I really can't understand their decision of defining these methods as extension methods (and yes, I know that MS used the same approach for extending the IEnumerable<T> interface, but I believe that we're not talking about the same type of scenario: after all, it's perfectly fine to use IEnumerable<T> without using LINQ).

So, when should we use extension methods? Here are my thoughts about it (the *when*):

  • use them *sparingly*. I really can't stress this enough! We're still doing OO and there are several options for extending types. Do remember that there are some versioning problems associated with extension methods. For instance, if the type you're "extending" adds a new method with the same name as the extension method, the code will always use the instance method (after recompilation, of course), because extension methods are only considered as candidates when all the available instance methods have been deemed non-viable.
  • they should look and feel like extension methods. The typical example of not following this recommendation is adding an IsNullOrEmpty extension method to some class and returning true when the “this” parameter is null (btw, here’s an example of what I mean). If the “this” parameter of the extension method is null, it should really throw a null reference exception since this is the best option for mimicking what happens with instance methods.

Overall, I’d say that extension methods are an excellent tool, but don’t get carried on and start using it everywhere. After all, we’re still writing object oriented code, right?

Jun 23

Even though I haven’t been as active as I wished, I have still managed to get some time to read several interesting blog posts lately. Unfortunately, it seems like two guys that I have a lot of respect for (though I haven’t had the pleasure of meeting them in person) have gone into the “dark side”. I’m referring to Phil Haack and Oren Eini (aka Ayende). The problem: using an extension method for checking for an empty enumeration.

It all started with Phil’s post, where he presented an extension method for testing for an empty IEnumerable. Here’s the code he ended up with:

public static bool IsNullOrEmpty<T>(this IEnumerable<T> items) {
    return items == null || !items.Any();
}

Phil really explains why using Any is a good option (especially when compared to Count) and I do agree with that part. Now, the problem here is that Ayende picked it up and improved it so that you don't end up "losing" items when you can't go over an IEnumerable more than once. Unfortunately, I think both of them got the null handling wrong. Don't understand what I mean? In that case, do take some time and look really carefully at Phil's initial code. Do you see anything wrong? No? Let me ask you a question: what's the expected result of the following code:

IEnumerable<Int32> nullCll = null;

var isEmpty = nullCll.IsNullOrEmpty();

In my opinion, it should throw a null reference exception. But that won't happen in this case. Don't you find that weird? I do, and I think extension methods should behave the same way as instance methods. And that really means that null reference exceptions shouldn't be "replaced" with Boolean values.

Btw, and that's just my opinion, the helper method should really be a simple helper method (ie, not an extension method). Just because we have extension methods, it doesn't mean they're the solution to all the extension problems we have…
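
For what it's worth, here's roughly what I mean by a simple helper (this is my sketch, not Phil's or Ayende's code): the call site makes it obvious that null is being handled, instead of looking like a method call on a null reference:

using System.Collections.Generic;
using System.Linq;

public static class EnumerableHelper {
    //a plain static helper: no "this" parameter, so no instance-call illusion
    public static bool IsNullOrEmpty<T>(IEnumerable<T> items) {
        return items == null || !items.Any();
    }
}

//usage: the null check reads like what it is – a helper call, not a call on a null reference
IEnumerable<Int32> nullCll = null;
var isEmpty = EnumerableHelper.IsNullOrEmpty(nullCll); //true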