LA.NET [EN]

Basics Archive

May 19

Cultures, cultures and still more cultures…

Posted in .NET, Basics, C#

In one of my latest posts on text and strings, reader John Meyer asks a couple of interesting questions:

Can you discuss any differences between these 2 ways of getting CultureInfo objects?

var culture1 = new CultureInfo("en-US");
var culture2 = CultureInfo.GetCultureInfo("en-US");

left one out of the previous comment, sorry:

var culture3 = CultureInfo.CreateSpecificCulture("en-US");

Before answering the question, I believe that it’s important to understand that cultures are grouped into three types: invariant, neutral and specific. An invariant culture is, as its name says, invariant. You can think of it as a culture which is neither neutral nor specific :). The invariant culture’s name is the empty string ("") and, by default, it’s associated with the English language (but not with a specific country or region). You should only use this culture when you don’t require culture-dependent results (ex.: persisting data that is not displayed to the users), so you probably shouldn’t use it for operations that provide feedback to the user. The following code shows how to get a reference to an invariant culture object:

//both reference an invariant culture object
var invariant1 = new CultureInfo( "" );
var invariant2 = CultureInfo.InvariantCulture;

In the previous posts of the text series, I believe that all the examples I’ve shown relied only on specific cultures. A specific culture is always associated with a language and a country or region. For instance, the following snippet creates a culture for the Portuguese language in Portugal:

var culture = new CultureInfo( "pt-PT" );

As you can see, a specific culture is typically identified by a pair of codes: one for the language (“pt”) and one for the country or region (“PT”). Finally, we need to define the neutral culture concept. A neutral culture is a culture that is only associated with a language. In practice, that means that we only pass the language part during initialization of the object. In the following snippet, we’re creating a neutral culture object for the Portuguese language:

var culture = new CultureInfo( "pt" );

Since we’re discussing the existing CultureInfo classifications, there are a couple of interesting points before we start analyzing the previous methods:

  • The predefined cultures form a hierarchy, where a specific culture (ex.: “pt-PT”) is parented by a neutral culture (ex.: “pt”). The invariant culture is the parent of all neutral cultures. After getting a reference to a CultureInfo object, you can navigate its hierarchy through the Parent property. Oh, and one more thing: the InvariantCulture parent is…drum roll…the InvariantCulture 🙂 (I’m mentioning this here because initially I thought it should be null…).
  • There are some operations which don’t *quite* work with neutral cultures. For instance, formatting a date will only work as expected when you’re using a specific culture. The next snippet shows what happens when you use the English neutral culture to format a date in the short date format:
var neutral1 = new CultureInfo( "en" );
Console.WriteLine(String.Format(neutral1, "{0:d}", DateTime.Now)); //5/19/2011
Console.WriteLine(String.Format(new CultureInfo( "en-US" ), "{0:d}", DateTime.Now)); //5/19/2011
Console.WriteLine(String.Format(new CultureInfo( "en-GB" ), "{0:d}", DateTime.Now)); //19/05/2011

Yes, it compiles and runs…but have you noticed that the result returned by the neutral culture isn’t really the one you’d expect if you live in the UK?
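
Going back to the first point, here is a minimal sketch of walking the Parent chain. The CultureHierarchy class and its GetChain method are names made up for this illustration. Since the invariant culture is its own parent, the loop stops when it reaches it instead of waiting for a null reference:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CultureHierarchy {
    // Collects the culture names from the given culture up to
    // (and including) the invariant culture.
    public static List<string> GetChain(CultureInfo culture) {
        var names = new List<string>();
        var current = culture;
        while (true) {
            names.Add(current.Name);
            // The invariant culture is its own parent: stop here,
            // otherwise the loop would never end.
            if (current.Equals(CultureInfo.InvariantCulture)) {
                break;
            }
            current = current.Parent;
        }
        return names;
    }

    static void Main() {
        // The invariant culture's name is the empty string,
        // so the last entry of the chain is empty.
        var chain = GetChain(new CultureInfo("pt-PT"));
        Console.WriteLine(String.Join(" -> ", chain.ToArray()));
    }
}
```

For “pt-PT” the chain should have three entries: the specific culture, the neutral “pt” culture and the invariant culture.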

Ok, so I guess we’re ready to go…Let’s start by understanding the difference between using the constructor and using the static GetCultureInfo method:

var culture1 = CultureInfo.GetCultureInfo( "pt-Pt" );
var culture2 = CultureInfo.GetCultureInfo( "pt-PT" );
Console.WriteLine(Object.ReferenceEquals(culture1, culture2));//true
var culture3 = new CultureInfo( "pt" );
var culture4 = new CultureInfo( "pt" );
Console.WriteLine(Object.ReferenceEquals(culture3, culture4));//false

I think the previous snippet shows what’s happening…whenever you use a constructor, you end up creating a new object. When you use the static GetCultureInfo method, you get a cached (and read-only) CultureInfo object, which is created on the first call and reused afterwards. Notice that I’ve used the static Object.ReferenceEquals method to make sure that I was getting the same instance from both method calls. If you need the same culture in several places, then you should probably use the GetCultureInfo method. Now, we’re left with the CreateSpecificCulture method…Take a look at the following snippet:

var neutral1 = CultureInfo.CreateSpecificCulture( "pt" );
var neutral2 = CultureInfo.GetCultureInfo( "pt" );

And now, a quick look at what the watch window shows during a debugging session:

[Screenshot: debugger watch window for the culture objects]

As you can see, we use the same neutral culture *string*, but we end up with two different culture “types”: neutral1 references a *specific* CultureInfo instance, while neutral2 does, indeed, “point” to a *neutral* CultureInfo object (you can easily test this by accessing the IsNeutralCulture property of the CultureInfo objects). In other words, CreateSpecificCulture provides us with a way to get a reference to a specific culture object from a neutral culture string (btw, if you pass it a *specific* culture string, you’ll get the corresponding specific culture object). Notice that you don’t have any control over the specific culture returned. You only know that it will return a specific culture, and there are some cases where that behavior might end up confusing your users. I’d say that you can use this method, but only sparingly…
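
To check this without the debugger, here is a small sketch that compares the three ways of getting a culture from the same neutral string. The CultureFactoryDemo class name is made up for the example. I’m not writing down which region CreateSpecificCulture picks, since that depends on the culture data available on the machine:

```csharp
using System;
using System.Globalization;

static class CultureFactoryDemo {
    static void Main() {
        var viaConstructor = new CultureInfo("pt");
        var viaCache = CultureInfo.GetCultureInfo("pt");
        var viaCreateSpecific = CultureInfo.CreateSpecificCulture("pt");

        // Neutral or specific?
        Console.WriteLine(viaConstructor.IsNeutralCulture);
        Console.WriteLine(viaCache.IsNeutralCulture);
        Console.WriteLine(viaCreateSpecific.IsNeutralCulture);

        // Which specific culture did we get? We have no control over it.
        Console.WriteLine(viaCreateSpecific.Name);

        // Cached instances are read-only; constructed ones can be changed.
        Console.WriteLine(viaConstructor.IsReadOnly);
        Console.WriteLine(viaCache.IsReadOnly);
    }
}
```

The read-only flag is the other practical difference between the constructor and GetCultureInfo: a cached instance is shared, so its formatting properties can’t be modified.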

And I guess that’s it for now. Stay tuned for more.

May 18

Arrays in .NET – part I

Posted in .NET, Basics, C#

Currently, the CLR supports single-dimension arrays, multi-dimension arrays and jagged arrays (aka, arrays of arrays). Arrays are always implicitly derived from the Array reference type. Creating a single-dimension array is simple, as you can see from the next snippet:

var ints = new Int32[10];

ints references an array which can hold 10 Int32 elements (i.e., 10 integers). Since I was initializing the variable in the declaration, I used the var keyword to reduce the typing. If I only wanted to declare a variable, then I could have simply written the following:

Int32[] ints;
ints = new Int32[10];

In this case, ints is a variable capable of referencing an array of integers. Notice that ints doesn’t reference any array until the execution of the second line (a field would be null at that point; a local variable can’t even be read before it is assigned). It’s also important to keep in mind that ints refers to a memory space which is allocated in the managed heap (and that means that arrays are garbage collected) because all arrays implicitly extend the Array base type.

After making sure that ints references an array, you write and read elements from the array by specifying a position (which is known as an index). The next example shows how to do that:

//write value 2 to the first position
ints[0] = 2;
//print 1st value of the array
Console.WriteLine(ints[0]);

CLS-compliant arrays are always zero-based. In other words, the first element of an array is always placed at position 0. Requiring all arrays to be zero-based allows easy sharing of arrays between CLS languages. Since zero-based single-dimension arrays are the most common type of array used in applications, MS has optimized its runtime for them. Notice that I said CLS, not CLR. In fact, the CLR allows you to create non-zero-based arrays, but you should be prepared for a small performance hit (I’ll return to this topic in a future post).
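
Just to show that the CLR really allows it, here is a minimal sketch of a one-based array created through Array.CreateInstance. The NonZeroBasedArrayDemo class and its Create helper are names invented for this example. Notice that the result is handled through the Array base type and its SetValue/GetValue methods, because it can’t be cast to Int32[]:

```csharp
using System;

static class NonZeroBasedArrayDemo {
    // Creates a single-dimension Int32 array whose first index is lowerBound.
    public static Array Create(int length, int lowerBound) {
        return Array.CreateInstance(typeof(Int32),
                                    new[] { length },
                                    new[] { lowerBound });
    }

    static void Main() {
        Array oneBased = Create(5, 1);
        // The first valid index is 1, not 0.
        oneBased.SetValue(42, 1);
        Console.WriteLine(oneBased.GetLowerBound(0)); // 1
        Console.WriteLine(oneBased.GetUpperBound(0)); // 5
        Console.WriteLine(oneBased.GetValue(1));      // 42
    }
}
```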

As I’ve said at the start, you can also have multi-dimension arrays. Here’s a quick example:

//two dimension arrays of 10×10
Int32[,] ints = new Int32[10, 10];
//put 10 in 1st position
ints[0, 0] = 10;

As you can see, it’s simple: you specify the number of elements in each dimension. Notice also the syntax used for the declaration of a multi-dimension array: the comma “specifies” the number of dimensions (if it were a three dimension array, then you’d use two commas like this Int32[,,]). We still need to look at the declaration and use of jagged arrays. In practice, a jagged array is an array of arrays:

Int32[][] ints = new Int32[2][];
ints[0] = new Int32[2];//2 elems array
ints[1] = new Int32[4];//4 elems array
//put something in one of the jagged arrays
ints[0][0] = 1;

As you can see, the [] [] syntax makes it clear that this is an array of arrays. Notice also that, unlike the multi-dimension array, each inner array can have a different number of elements (in the previous example, the first inner array has 2 elements and the second has 4).
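
The difference between the two shapes is easy to see through the members inherited from Array. The following sketch (the ArrayShapes class name is made up) prints the rank and sizes of a 10×10 multi-dimension array and of the jagged array from the previous snippet:

```csharp
using System;

static class ArrayShapes {
    static void Main() {
        var grid = new Int32[10, 10];
        var jagged = new Int32[2][];
        jagged[0] = new Int32[2];
        jagged[1] = new Int32[4];

        // One object with two dimensions and 10 * 10 elements.
        Console.WriteLine(grid.Rank);         // 2
        Console.WriteLine(grid.Length);       // 100
        Console.WriteLine(grid.GetLength(0)); // 10

        // A single-dimension array whose 2 elements are arrays themselves.
        Console.WriteLine(jagged.Rank);       // 1
        Console.WriteLine(jagged.Length);     // 2
        Console.WriteLine(jagged[1].Length);  // 4
    }
}
```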

One final note before ending: the CLR ensures that you will only access a valid position of an array. For instance, if you create an array with 20 elements, you’ll end up with an exception when you try to access an element outside of the [0,19] interval. This is a good thing because it ensures that you will never read or write memory outside of the array. There is, however, a small cost associated with this strategy. If you think this is really too much, then you can disable it by accessing the array elements through unsafe access (more details in a future post).
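
Here is a small sketch of the bounds check in action. BoundsCheckDemo and TryRead are hypothetical names used only for this illustration; the exception thrown by the runtime is IndexOutOfRangeException:

```csharp
using System;

static class BoundsCheckDemo {
    // Returns false (instead of letting the exception escape)
    // when index is outside of the array.
    public static bool TryRead(Int32[] values, int index, out Int32 value) {
        try {
            value = values[index];
            return true;
        }
        catch (IndexOutOfRangeException) {
            value = 0;
            return false;
        }
    }

    static void Main() {
        var ints = new Int32[20];
        Int32 value;
        Console.WriteLine(TryRead(ints, 19, out value)); // True: last valid index
        Console.WriteLine(TryRead(ints, 20, out value)); // False: outside [0,19]
    }
}
```

In real code you would normally compare the index against the array’s Length instead of catching the exception; the try/catch is here only to make the check visible.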

And that’s it for now. In the next posts we’ll keep looking at arrays. Stay tuned for more.

May 17

So, you know everything about text, right? – part XV

Posted in .NET, Basics, C#

In the previous posts, we took a deep dive into how we can format objects into strings. In this post, we’ll see how to obtain an object from a string (a process known as parsing). By convention, all the types that can parse a string offer a static method named Parse which (at a minimum) expects a string as a parameter. Currently, many of the types introduced by the framework are able to do parsing. For instance, the next snippet shows how to parse an integer from a string:

var intInStr = "100";
var aux = Int32.Parse( intInStr );
Console.WriteLine(aux);

Besides the “simple” Parse method, Int32 offers other overloads of this method:

public static int Parse(string s, NumberStyles style)
public static int Parse(string s, IFormatProvider provider)
public static int Parse(string s, NumberStyles style, IFormatProvider provider)

If you’ve been following the series on text and strings, then you should probably understand what these parameters do. The first parameter (string s) identifies the string which is going to be parsed. NumberStyles is a flags enum which determines the styles permitted in the numeric strings (for instance, AllowExponent indicates that the string can contain a numeric value in exponential form). Finally, IFormatProvider references an object which the Parse method can use to obtain culture-specific information. The Parse method throws an exception when the passed value doesn’t match the expected numeric string:

//to make it work, you need to pass
//at least the NumberStyles.AllowExponent
var aux = Int32.Parse( "10e2");

If you run the previous code, you’ll end up with an exception because (by default) the simple Parse call uses NumberStyles.Integer for the style parameter. The solution to the problem is simple: pass the NumberStyles.AllowExponent value to the style parameter. Don’t forget that the IFormatProvider also plays an important role in the parsing. For instance, take a look at the following code:

var aux = Double.Parse( "10,2", NumberStyles.Any, new CultureInfo("pt-PT"));

Specifying the pt-PT culture is necessary for getting the double value 10.2. If you had passed the en-US culture, then you’d end up with the value 102 because the char ‘,’ isn’t used as decimal separator in that culture (I believe that in en-US the char ‘,’ is used as a thousands separator).
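
The next sketch puts the style and the provider side by side (the ParseStylesDemo class name is made up for this example). I’m passing the invariant culture to the integer calls only to keep the example independent of the culture of the current thread:

```csharp
using System;
using System.Globalization;

static class ParseStylesDemo {
    static void Main() {
        // The default style (NumberStyles.Integer) rejects the exponent...
        try {
            Int32.Parse("10e2", CultureInfo.InvariantCulture);
        }
        catch (FormatException) {
            Console.WriteLine("10e2 rejected by the default style");
        }

        // ...but AllowExponent accepts it.
        var withExponent = Int32.Parse("10e2",
                                       NumberStyles.AllowExponent,
                                       CultureInfo.InvariantCulture);
        Console.WriteLine(withExponent); // 1000

        // Same string, different cultures, different values.
        var pt = Double.Parse("10,2", NumberStyles.Any, new CultureInfo("pt-PT"));
        var us = Double.Parse("10,2", NumberStyles.Any, new CultureInfo("en-US"));
        Console.WriteLine(pt.ToString(CultureInfo.InvariantCulture)); // 10.2
        Console.WriteLine(us.ToString(CultureInfo.InvariantCulture)); // 102
    }
}
```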

Since we’re talking about parsing, it’s important to mention that the DateTime type introduces a ParseExact method (besides the traditional Parse method). This method was added to the API of the type because several developers complained about the forgiveness of the original Parse method (after all, parsing dates isn’t really a walk in the park).
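
A minimal sketch of ParseExact, assuming we only want to accept dates written as day/month/year. The ParseExactDemo class and the ParseDayFirst helper are names invented for this example:

```csharp
using System;
using System.Globalization;

static class ParseExactDemo {
    // Accepts only strings in the dd/MM/yyyy form.
    public static DateTime ParseDayFirst(string text) {
        return DateTime.ParseExact(text, "dd/MM/yyyy", CultureInfo.InvariantCulture);
    }

    static void Main() {
        var date = ParseDayFirst("19/05/2011");
        Console.WriteLine(date.Month); // 5

        // A month/day/year string doesn't match the expected format.
        try {
            ParseDayFirst("5/19/2011");
        }
        catch (FormatException) {
            Console.WriteLine("5/19/2011 rejected");
        }
    }
}
```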

But the parsing of DateTime string values wasn’t the only thing developers complained about. Many people have also raised their voice against the exception-based approach followed by the Parse methods. The problem is that there are some apps which need to receive lots of input from the users, and if parsing ends up throwing lots of exceptions, then you’ve just degraded the performance of your app. To avoid a breaking change, MS added a new method to the recommended parsing API: the TryParse method. Here are the overloads introduced by Int32:

public static bool TryParse(string s, out int result)
public static bool TryParse(string s, NumberStyles style, IFormatProvider provider, out int result)

The method returns true when the conversion is performed successfully. In that case, result ends up with the parsed integer.
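
Here is a short sketch of the pattern. TryParseDemo and ParseOrDefault are hypothetical names, and the fallback value is just one possible way of handling bad input. When the conversion fails, TryParse returns false and no exception is thrown:

```csharp
using System;
using System.Globalization;

static class TryParseDemo {
    // Returns the parsed integer or, when text isn't a valid integer,
    // the fallback value.
    public static int ParseOrDefault(string text, int fallback) {
        int result;
        var parsed = Int32.TryParse(text,
                                    NumberStyles.Integer,
                                    CultureInfo.InvariantCulture,
                                    out result);
        return parsed ? result : fallback;
    }

    static void Main() {
        Console.WriteLine(ParseOrDefault("100", -1)); // 100
        Console.WriteLine(ParseOrDefault("abc", -1)); // -1
    }
}
```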

And I guess that’s it for now. Stay tuned for more.

May 16

In the previous post, we’ve looked at the specificities associated with the usage of the IFormattable interface. As we’ve seen, its ToString method expects a format string and an IFormatProvider instance which allows any interested party to get a reference to an object that can be used for formatting and parsing (more on this in future posts). In this post, we’ll take a look at how we can create custom formatter objects. In this area, we’ve got a couple of options:

  • we can create a custom CultureInfo object. This is a useful approach when none of the cultures defined by the framework can quite be applied to an existing scenario.
  • we can create a custom ICustomFormatter instance. This is a good option when we’re only interested in customizing the way that a specific type is formatted.

Let’s start with the first strategy: creating a custom CultureInfo object. As I’ve said, the framework introduces several predefined cultures which define several important characteristics associated with the formatting and parsing of text, dates and numbers. Even though I never had to create a custom culture, the truth is that I have some friends who did it because none of the existing cultures covered all their needs. Fortunately for us, the framework introduced the CultureAndRegionInfoBuilder class for helping us in the creation of a new culture. Currently, the process involves several steps:

  1. Create a new instance of CultureAndRegionInfoBuilder.
  2. If the new culture is based on an existing one, then use the LoadDataFromCultureInfo and LoadDataFromRegionInfo methods for initializing the data associated with an existing culture and region.
  3. Modify the properties you wish to customize.
  4. Register the new culture by invoking the Register method.

The registration process is interesting. It starts by creating a .nlp file with the information defined by the CultureAndRegionInfoBuilder, which is then stored in the %windir%\Globalization folder. After that, it updates the framework’s configuration so that it searches for cultures in the %windir%\Globalization folder instead of relying on the internal cache. As you’ve probably inferred, this process requires administrative privileges (though there is a workaround for non-admins). Here’s a small sample which creates a new culture that is based on the pt-PT culture (it simply replaces the default currency decimal separator):

var cultureBuilder = new CultureAndRegionInfoBuilder("ptTest", CultureAndRegionModifiers.None);
cultureBuilder.LoadDataFromCultureInfo(new CultureInfo("pt-PT"));
cultureBuilder.LoadDataFromRegionInfo(new RegionInfo("PT"));
cultureBuilder.NumberFormat.CurrencyDecimalSeparator = ".";
cultureBuilder.Register();

After registering the new culture, you can create new custom CultureInfo objects as you normally do, ie, by passing its name to the constructor call.

var money = 10.0;
money.ToString("C", new CultureInfo("ptTest"));//'.' as decimal separator, ex.: 10.00 €
money.ToString("C", new CultureInfo("pt-PT"));//',' as decimal separator, ex.: 10,00 €

As you can see, the ptTest culture uses the ‘.’ as a decimal separator. Before moving on, it’s important to understand that registered cultures can be used in other apps. They even survive reboots of the machine (at least, until you remove the .nlp file by calling the static Unregister method).
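
If you only need the tweak inside a single app, there is a lighter alternative. This is not the registration technique described above: the following sketch simply clones an existing culture and changes the copy, so nothing is written to disk and no other app sees it. ClonedCultureDemo and CreateDotSeparatedPortuguese are names made up for this illustration:

```csharp
using System;
using System.Globalization;

static class ClonedCultureDemo {
    // Returns a writable copy of pt-PT that uses '.' as the
    // currency decimal separator. The copy lives only in this process.
    public static CultureInfo CreateDotSeparatedPortuguese() {
        // Cached cultures are read-only, so we work on a clone.
        var culture = (CultureInfo) CultureInfo.GetCultureInfo("pt-PT").Clone();
        culture.NumberFormat.CurrencyDecimalSeparator = ".";
        return culture;
    }

    static void Main() {
        var money = 10.5;
        Console.WriteLine(money.ToString("C", CreateDotSeparatedPortuguese()));
        Console.WriteLine(money.ToString("C", CultureInfo.GetCultureInfo("pt-PT")));
    }
}
```

The trade-off is that you have to pass the cloned object around yourself: new CultureInfo("ptTest") only works for registered cultures.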

There are, however, times where you’re only interested in customizing the formatting applied to a specific type. In these cases, you can implement the ICustomFormatter interface. Typically, the type that implements this interface will also implement the IFormatProvider interface. To illustrate this technique, let’s suppose that we want to return a value in the form [XXX] when XXX is an integer value. Here’s the code I’ve used for the formatter:

using System;
using System.Threading;

class Int32Formatter : IFormatProvider, ICustomFormatter {
    public object GetFormat(Type formatType) {
        //asked for a custom formatter: return a reference to this instance
        if( formatType == typeof(ICustomFormatter)) {
            return this;
        }
        //any other format type: defer to the current culture
        return Thread.CurrentThread.CurrentCulture.GetFormat(formatType);
    }

    public string Format(string format, object arg, IFormatProvider formatProvider) {
        var val = arg as IFormattable;
        if(val == null ) {
            //does not implement the IFormattable interface (or is null):
            //fall back to the parameterless ToString
            return arg == null ? String.Empty : arg.ToString();
        }

        //let the value format itself first
        var str = val.ToString(format, formatProvider);
        //only integers get wrapped in [ ]
        if(arg.GetType() == typeof(Int32)) {
            return "[" + str + "]";
        }
        return str;
    }
}

And now, we can test our code by using an overload of the String.Format method (or by calling the AppendFormat or …):

var someValue = 10;
var someDouble = 10.0;
var str = String.Format(new Int32Formatter(),
    "Here's an int value: {0} and a double: {1}",
    someValue,
    someDouble);
Console.WriteLine(str);

Now, how does that work? It’s not that complicated…Internally, the String.Format method relies on the StringBuilder’s AppendFormat method to do all the work. This method starts by checking if a formatter was passed and if it offers an ICustomFormatter object (it does this by calling the IFormatProvider’s GetFormat method). If it does, then AppendFormat ends up calling the ICustomFormatter’s Format method.
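
If you want to see that lookup happening, here is a small sketch of a provider which records the types it’s asked for. TracingProvider is a class made up for this illustration; it offers no custom formatter and defers everything to the invariant culture, so the formatted output isn’t changed:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

class TracingProvider : IFormatProvider {
    // Every type requested through GetFormat, in order.
    public readonly List<Type> Requested = new List<Type>();

    public object GetFormat(Type formatType) {
        Requested.Add(formatType);
        // No custom formatter here: the invariant culture returns null
        // for ICustomFormatter and its own format objects otherwise.
        return CultureInfo.InvariantCulture.GetFormat(formatType);
    }
}

static class TracingProviderDemo {
    static void Main() {
        var provider = new TracingProvider();
        var str = String.Format(provider, "{0}", 10);
        Console.WriteLine(str);                   // 10
        Console.WriteLine(provider.Requested[0]); // System.ICustomFormatter
    }
}
```

The first request is always for ICustomFormatter. Any further requests (ex.: for NumberFormatInfo) come from the ToString method of the value being formatted and may vary with the runtime version.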

Notice that the Format implementation will be called for all placeholder values defined in our string. And that’s why we need to check all the values we receive in the Format method. If the current value doesn’t support the IFormattable interface (required for allowing formatting of the values according to a specific format and culture), then I simply return the result of the instance’s parameterless ToString method. On the other hand, if the object does implement the IFormattable interface, then I start by calling the IFormattable’s ToString method (passing it the format string and the format provider). After getting the formatted string, I check the type of the object. When it’s an integer, I wrap the formatted string with [ and ] and return it. If it’s not an integer, I return the previously formatted value. And there you go: a custom provider which is only responsible for customizing the way an integer is formatted in a string.

And I guess this is it for now. On the next post, I’ll take a close look at the inverse process: parsing. Stay tuned for more.

May 16

So, you know everything about text, right? – part XII

Posted in .NET, Basics, C#

Even though I’ve said that the previous post would wrap up this series, the truth is that there are still a couple of things I’d like to add to it. Today, I’ll talk a little bit about the String.Format method, which allows us to build strings from many formatted objects. Let’s start with a simple example:

var str = String.Format( "{0} was born on the {1}",
                            "Luis",
                            new DateTime( 1976, 06, 27 ) );
Console.WriteLine(str);

The Format method receives a format string (1st parameter) which identifies replaceable parameters through numbered braces. You can use as many placeholders (defined by {position}) as you need. In our example, we’re using two placeholders: the first receives a string and the second is filled with a DateTime value. Printing the string str returns “Luis was born on the 27-06-1976 00:00:00”. If, like me, you’re running the previous code in a thread which is not using the “en-US” culture, you might find the previous result surprising (btw, if you do run it with the “en-US” culture, you end up getting “Luis was born on the 6/27/1976 12:00:00 AM”).

To understand why this happens, we have to take a look at the Format method internals. Internally, the Format method delegates the important work to the StringBuilder’s AppendFormat method, which ends up calling the ToString method for each of the placeholder values. By default, the ToString method will always format the object according to the current culture applied to the thread (we’ll return to this topic in a future post). Notice that the ToString method is inherited by all types in the framework because it’s defined by the base Object class. However, its use is rather limited (ex.: it will always format the value according to the culture applied to the current thread). And that’s why the framework introduced the IFormattable interface. This interface, which is implemented by many types exposed by the framework, introduces a single method (also called ToString) which relies on a format provider (an instance of IFormatProvider passed through the second parameter) for applying a specific format (passed through the first parameter, a string) to the current object. This is a rather interesting topic and we’ll come back to it in the next post. For now, the important part is understanding that all the values that replace placeholders in a String.Format method call will have their ToString method called (and that ToString method can be the IFormattable’s ToString method, when the type implements that interface, or the “general” ToString method introduced by the base Object type).

Going back to the Format method, you should know that you can change the way formatting is applied to the placeholder values:

//customizing the way dates are presented
var str = String.Format( "{0} was born on the {1:dd/MM/yyyy}",
                            "Luis",
                            new DateTime( 1976, 06, 27 ) );
Console.WriteLine(str);

As you can see, the placeholder used for the DateTime parameter is a little more complex: besides indicating the position for the value which will be applied to that string, it also specifies the formatting that should be applied to that value (in this case, we’re using a custom string which formats the value in the form day/month/year). Whenever we use a placeholder in the form {position:format}, the method will try to call the IFormattable’s ToString method, passing it the format string. Currently, there are several format values which you can pass in the format position of a placeholder. For instance, if you’re interested in formatting numeric types, then this link will be of interest to you (for getting a general reference on formatting, check this article).

As you can see, the most important part of the work is done in the ToString method. Since we’ll be coming back to that in the next post, it’s time to wrap this post with a couple of important conclusions:

  • Whenever you need to format several objects into a string, then you should use the String.Format method. This is an efficient option since it relies on the StringBuilder’s AppendFormat method.
  • You can define the formatting of the value of a placeholder by using a placeholder in the form {position:format} (don’t forget that formatting is heavily influenced by the culture and the formatter – we’ll return to this topic in the next post).
  • Besides String.Format, there are other useful methods which rely on StringBuilder’s AppendFormat method. For instance, in console apps, there are overloads of Write and WriteLine methods which end up using a StringBuilder and its AppendFormat method for ensuring good performance in the formatting of several values.
  • Even though we didn’t show it, the truth is that there are overloads of the Format and AppendFormat methods which expect an instance of an IFormatProvider object (which is used for providing formatting information). When you pass an instance of this object, then all calls of the ToString method of the placeholder receive this instance.
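
To illustrate the last point, here is a minimal sketch of the Format overload which receives an IFormatProvider (FormatWithProviderDemo and Describe are names invented for the example). By passing the invariant culture, the result no longer depends on the culture of the current thread:

```csharp
using System;
using System.Globalization;

static class FormatWithProviderDemo {
    // Formats the same values with whatever provider is passed in.
    public static string Describe(IFormatProvider provider) {
        return String.Format(provider,
                             "{0} was born on {1:d}",
                             "Luis",
                             new DateTime(1976, 6, 27));
    }

    static void Main() {
        // The invariant short date pattern is MM/dd/yyyy.
        Console.WriteLine(Describe(CultureInfo.InvariantCulture)); // Luis was born on 06/27/1976
        // The output of a specific culture depends on its own date pattern.
        Console.WriteLine(Describe(new CultureInfo("pt-PT")));
    }
}
```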

And I guess that’s it for now. Stay tuned for more.

May 13

So, you know everything about text, right? – part XI

Posted in .NET, Basics, C#

To wrap up this series of posts about text, we’ll talk about best practices for handling string concatenations. Since string objects are immutable, concatenating lots and lots of strings might be really bad for the performance of your application. Before going on, it’s important to understand that string concatenation *might* only be problematic at runtime. In practice, this means that we don’t need to worry about literal string concatenations because the compiler is clever enough to build a final string from them. Here’s an example of what I’m saying:

//don't worry about this because the compiler is smart enough
//to build a unique string and embed it in the assembly metadata
var str = "This is a rather large "
            + "string which was broken to improve readability";

Whenever the compiler sees something like that, it will build the string “This is a rather large string which was broken to improve readability” and embed it in the metadata of the assembly. What we really want to solve are scenarios like the following:

//this is bad!
var watch = new Stopwatch( );
watch.Start(  );
var str = "";
for(var i = 0; i < 100000; i++) {
    str += ".!.";
}
watch.Stop(  );//around 15secs
Console.WriteLine(watch.ElapsedMilliseconds/1000.0);

On my machine, it took around 15 secs to concatenate 100000 strings. This is not a good thing. Since strings are immutable, each iteration ends up creating a new string which needs to allocate memory space for all the chars accumulated so far. The previous code is pure evil and in these cases, you should really use the StringBuilder class:

//this is way better!!!!
var watch = new Stopwatch( );
watch.Start(  );
var str = new StringBuilder( );
for(var i = 0; i < 100000; i++) {
    str.Append( ".!.");
}
var final = str.ToString( );
watch.Stop(  );
Console.WriteLine(watch.ElapsedMilliseconds/1000.0);

Now, I’m not going to say how much time this operation took (try it :)), but I will say that it was really really fast…Proficient developers know that we could improve the performance of the previous code by setting the total size of the buffer during the StringBuilder instantiation (notice that in the previous code it’s fairly easy to know the required size of the buffer).
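
A minimal sketch of that improvement, assuming we know the piece and the number of repetitions up front. PreSizedBuilderDemo and Repeat are names made up for this example:

```csharp
using System;
using System.Text;

static class PreSizedBuilderDemo {
    // Appends piece count times to a builder whose buffer is
    // allocated once with the final size.
    public static string Repeat(string piece, int count) {
        var builder = new StringBuilder(piece.Length * count);
        for (var i = 0; i < count; i++) {
            builder.Append(piece);
        }
        return builder.ToString();
    }

    static void Main() {
        var final = Repeat(".!.", 100000);
        Console.WriteLine(final.Length); // 300000
    }
}
```

With the capacity set in the constructor, the builder doesn’t need to grow its internal buffer while the loop is running.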

When developers that are getting started look at the previous examples, they think that they probably should be using StringBuilder instances everywhere…Here’s some code I’ve found before:

var firstName = "Luis";
var lastName = "Abreu";
var bld = new StringBuilder( );
bld.Append( firstName );
bld.Append( "-" );
bld.Append( lastName );

In the previous snippet I’ve introduced variables to simulate receiving input from the user. The first thing you should notice is that we’re just concatenating 3 strings. That’s it! In these cases, you should simply concatenate the strings:

var firstName = "Luis";
var lastName = "Abreu";
var total = firstName + "-" + lastName;

Whenever the compiler sees that you’re using the + operator to concatenate strings, it will automatically transform that into a Concat method call:

var firstName = "Luis";
var lastName = "Abreu";
var total = String.Concat( firstName, "-", lastName );

There are several overloads of the Concat method. The most interesting thing about this method is that it will start by creating a buffer big enough for copying all the strings it receives, so you’ll only have one string instantiation (unlike the initial code shown in the for loop earlier).

What I’m trying to say is that you should *only* use the StringBuilder type when you’re doing lots and lots of concatenations inside a loop (ex.: getting chars from the user). When you’re working with a small number of string variables, then you should use the String.Concat method (or the + operator, if you think it improves readability).
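
The two “cheap” cases can be sketched like this (ConcatDemo and JoinNames are hypothetical names). The reference comparison is only there to suggest that concatenated literals are folded at compile time into a single literal; I wouldn’t rely on it in real code:

```csharp
using System;

static class ConcatDemo {
    // A handful of strings: a single Concat call is enough.
    public static string JoinNames(string first, string last) {
        return String.Concat(first, "-", last);
    }

    static void Main() {
        // Both operands are literals, so the compiler builds the final string.
        const string folded = "This is " + "one literal";
        // Typically True: both expressions refer to the same interned literal.
        Console.WriteLine(Object.ReferenceEquals(folded, "This is one literal"));

        Console.WriteLine(JoinNames("Luis", "Abreu")); // Luis-Abreu
    }
}
```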

And that’s it for now. Stay tuned for more.

May 12

So, you know everything about text, right?–part X

Posted in .NET, Basics, C#

This is becoming a large series on strings…the good news is that there are still several interesting things to say about them, so this will be another “text post”. Today, we’ll take a quick look at how we can secure a string. As you know, there are lots of times where a string contains sensitive data (ex.: passwords). By default, string contents are saved in a char array maintained in memory which can be accessed by some other “naughty” code which might be snooping around the process address space of an application.

If this kind of behavior bothers you (or if you’re developing an app where sensitive data must conform to tighter security rules), then you should probably rely on the SecureString type. SecureString instances store their string contents as encrypted chars in an unmanaged memory block (so that the garbage collector isn’t aware of that memory). Currently, you can insert (instance method InsertAt), append (AppendChar), set (SetAt) or remove (RemoveAt) a char from the encrypted array maintained in that unmanaged block. To illustrate its use, let’s look at some code:

var secure = new SecureString();
while (true) {
    var keyPressed = Console.ReadKey(true);
    //stop before appending, so that the ENTER char isn't stored
    if (keyPressed.Key == ConsoleKey.Enter) break;
    secure.AppendChar(keyPressed.KeyChar);
}

In the previous snippet, we use the ReadKey method to read key presses from a console app. Each key is added to the SecureString instance until the user presses the ENTER key. Notice that the instance methods used for manipulating the chars stored in the secure string need to perform several steps which don’t really “perform” very well. The desired operation is always preceded by a decryption and followed by an encryption to ensure that the internal chars are “safe” again. As you can see, making lots of instance method calls to a SecureString instance might have a negative impact on the performance of your app.

Even though the previous snippet doesn’t show it, the SecureString class implements the IDisposable interface. By doing that, it lets you easily destroy the unmanaged buffer when you’re done with a SecureString instance (btw, the unmanaged buffer is zeroed before being released). And that’s why you should always dispose of SecureString instances when you’re done with them!
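
A minimal sketch of that advice, with a using block taking care of the Dispose call. SecureStringLifetimeDemo and CountChars are names made up for this illustration, and filling the instance from a regular string is only done to keep the example short (in a real app, that string would already be sitting in managed memory, which defeats the purpose):

```csharp
using System;
using System.Security;

static class SecureStringLifetimeDemo {
    // Copies text into a SecureString and returns the number of stored chars.
    // The using block guarantees that the unmanaged buffer is released.
    public static int CountChars(string text) {
        using (var secure = new SecureString()) {
            foreach (var c in text) {
                secure.AppendChar(c);
            }
            // No more changes allowed from this point on.
            secure.MakeReadOnly();
            return secure.Length;
        }
    }

    static void Main() {
        Console.WriteLine(CountChars("abc")); // 3
    }
}
```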

If you look at the API of the SecureString class, you’ll notice that it doesn’t really provide you with an easy way to recover a string instance from it. And that’s by design…and it’s really a good thing :) If, by any chance, you think that you need to interact with the SecureString string, then don’t!

If you really really really…really need to do that, then the solution is to use the Marshal class. It also means writing unsafe code, so get ready for seeing some pointers in action (and compiling your project with the /unsafe option):

//PLEASE: don't do this!!!
unsafe static String GetString(SecureString str) {
    StringBuilder bld = new StringBuilder();
    Char* ptr = null;
    try {
        ptr = (Char*) Marshal.SecureStringToCoTaskMemUnicode(str);
        var pos = 0;
        Char current;
                
        while( (current = ptr[pos++] ) != 0) {
            bld.Append(current);
        }
    }
    finally {
        if(ptr != null) {
            Marshal.ZeroFreeCoTaskMemUnicode((IntPtr)ptr);
        }
    }
    return bld.ToString();
}

 

Nasty…to be honest, I really didn’t miss having to use all those pointers…but that’s what is needed to read the chars stored by a SecureString instance. Once again, don’t do this in real world projects!

Unfortunately, SecureString support is limited in the current API of the classes defined by the .NET framework. You can use them with certificates (X509Certificate class), when constructing an event log (EventLogSession class) or when using the PasswordBox control…and that’s pretty much it. I’m hoping to see them used in more places in future releases of the framework, but for now, that’s all we’ve got (I probably missed one or two classes, but I guess that’s all).

And I think this sums it up quite nicely. Stay tuned for more!

May 03

So, you know everything about text, right?–part VIII

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right?–part VIII

As we’ve seen, all chars are represented by 16-bit Unicode values. If you’re a Win32 programmer and you’ve been lucky enough to go “managed”, then I bet nobody is as happy as you because this means that you no longer have to write that lovely code for converting between MBCS and Unicode, right? Unfortunately, there are still times when we do need to encode and decode strings. For instance, if we need to send a file to a specific client, we might need to encode the string. If you don’t know anything about encodings, then this primer by Joel Spolsky is a fantastic read!

By default, if we don’t specify an encoder, encoding operations end up using the UTF-8 encoder. With UTF-8, characters can be encoded with 1, 2, 3 or 4 bytes. Since characters below 0x0080 are encoded with a single byte, this type of encoding tends to work well with chars used in the USA. European languages also tend to use chars between 0x0080 and 0x07FF, which require the use of 2 bytes. East Asian language characters require 3 bytes and surrogate pairs will always be encoded with 4 bytes.

Even though UTF-8 is a popular encoding, it’s not that efficient when you need to encode lots of characters above 0x07FF. In those cases, using UTF-16 or UTF-32 might be a better option. With UTF-16, each 16-bit code value requires 2 bytes (so a surrogate pair ends up taking 4 bytes). In practice, this means that you won’t get any compression at all (like you do when using UTF-8 with chars below 0x0080), but the operation should be fast (after all, this is a “direct copy” of a .NET char because they’re represented with 2 bytes too!). UTF-32 encodes all chars as 4 bytes. Even though it uses more space, it simplifies the algorithm used for traversing the chars because you don’t have to worry about surrogate pairs.

.NET does expose two other predefined encoders: UTF-7 and ASCII. UTF-7 encodes chars using only 7-bit values and it should only be used if you have legacy systems which require this format. ASCII encodes a char into an ASCII character (no surprise here!) and you need to be careful because you might end up losing chars when you use it (chars greater than 0x007F can’t be converted and, by default, are replaced by a ? during the encoding).
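If you want to check those sizes for yourself, here’s a small sketch which relies on the predefined Encoding instances. The helper names (ByteCounts, AsciiRoundTrip) and the sample strings are mine and are only used for illustration:

```csharp
using System;
using System.Text;

static class EncodingSizes {
    // Number of bytes each encoding needs for the given text.
    public static int[] ByteCounts(string text) {
        return new[] {
            Encoding.UTF8.GetByteCount(text),
            Encoding.Unicode.GetByteCount(text), // UTF-16 (little endian)
            Encoding.UTF32.GetByteCount(text)
        };
    }

    // Encodes to ASCII and decodes back to show what gets lost.
    public static string AsciiRoundTrip(string text) {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
    }

    static void Main() {
        // 'a', 'ç', a Japanese char and a surrogate pair (U+20000)
        var samples = new[] { "a", "\u00E7", "\u308B", "\uD840\uDC00" };
        foreach (var s in samples) {
            var counts = ByteCounts(s);
            Console.WriteLine("UTF-8: {0}, UTF-16: {1}, UTF-32: {2}",
                counts[0], counts[1], counts[2]);
        }
        // prints 1/2/4, 2/2/4, 3/2/4 and 4/4/4
        Console.WriteLine(AsciiRoundTrip("a\u00E7\u00E3o")); // a??o
    }
}
```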

Besides these encoders, you should also know that you can encode any char to a specific code page (if you do, then keep in mind that you might end up losing chars if they can’t be represented in that code page). In practice, you should always work with UTF-16 or UTF-8. The only excuse to use one of the other encoders is if you have to work with legacy systems. And I guess this covers the theory. In the next post, we’ll take a look at some code. Stay tuned for more!

May 02

So, you know everything about text, right?–part VII

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right?–part VII

If you’re a reader of this blog, then you probably know that I’m Portuguese. Aside from accented chars and the notorious ç, there really aren’t any issues associated with the fact that .NET stores chars in 16-bit memory spaces. In other words, I’m a lucky bastard :)

Before going on, a disclaimer: generally, I prefer to blog about areas which I’ve used in my daily activities. Unicode surrogates aren’t really one of those things. However, since I’m writing about text, I believe that the series wouldn’t really be complete without mentioning surrogates. If you do have experience in this area and you do detect a nasty error or an erroneous assumption, then please do use the comment section for correcting me ;) Having said that, let’s proceed…

If I had to write text that uses characters outside the Basic Multilingual Plane (some rarer CJK ideographs, for instance), I wouldn’t be so lucky because 16 bits aren’t enough to represent them…in these cases, two 16-bit codes are combined to represent a single Unicode char. In these scenarios, it’s usual to say that the Unicode char is represented by a high surrogate (the first 16-bit code value) and a low surrogate (the second 16-bit code value). If you do need to work with surrogates, then you’ll probably need to resort to the StringInfo class to iterate through the Unicode chars (typically, you’ll refer to each one as a text element or grapheme; a text element can also be a base char followed by combining chars, which is what the next snippet uses). The easiest way to use a StringInfo object is to pass it a string during instantiation. The following snippet illustrates the typical use of this class to enumerate its text elements:

var sb = new StringBuilder();
var s = "a\u0304\u308bc";
var si = new StringInfo(s);
for( var i = 0; i < si.LengthInTextElements; i++ ) {
    sb.AppendFormat("element at {0} is {1}\n",
        i,
        si.SubstringByTextElements(i, 1));
}
MessageBox.Show(sb.ToString());//console won't show it correctly!

It’s important to notice that the StringInfo class is defined in the System.Globalization namespace. After importing that namespace, you can use the LengthInTextElements property to check the number of text elements. Once you know the current number of text elements, you can use the SubstringByTextElements method to extract the desired portion of text elements.
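The string used in the previous snippet relies on a combining char. For completeness, here’s a small sketch which does the same check with a real surrogate pair (the U+20000 ideograph and the TextElementCount helper are my own choices for illustration):

```csharp
using System;
using System.Globalization;

static class SurrogateDemo {
    // U+20000 (a CJK ideograph) needs a high and a low surrogate in UTF-16.
    public const string Ideograph = "\uD840\uDC00";

    public static int TextElementCount(string s) {
        return new StringInfo(s).LengthInTextElements;
    }

    static void Main() {
        Console.WriteLine(Ideograph.Length);                   // 2 Char values
        Console.WriteLine(TextElementCount(Ideograph));        // 1 text element
        Console.WriteLine(Char.IsHighSurrogate(Ideograph[0])); // True
        Console.WriteLine(Char.IsLowSurrogate(Ideograph[1]));  // True
        Console.WriteLine("U+{0:X}", Char.ConvertToUtf32(Ideograph, 0)); // U+20000
    }
}
```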

If you want, you can also get a TextElementEnumerator instance from the GetTextElementEnumerator method: after that, it’s really easy to iterate through the text elements. Here’s a snippet which illustrates this strategy:

var sb = new StringBuilder();
var s = "a\u0304\u308bc";
var unicodeEnum = StringInfo.GetTextElementEnumerator(s);
while( unicodeEnum.MoveNext()) {
    sb.AppendFormat("element at {0} is {1}\n",
        unicodeEnum.ElementIndex,
        unicodeEnum.GetTextElement());
}
MessageBox.Show(sb.ToString());//console won't show it correctly!

Finally, you can also use the ParseCombiningCharacters method to obtain an Int32 array. The length of the array specifies the number of text elements and each array’s element identifies the index of the string where the first code for each text element can be found. Here’s a small snippet which shows how to use this method:

var sb = new StringBuilder();
var s = "a\u0304\u308bc";
var textElements = StringInfo.ParseCombiningCharacters(s);
for( var i = 0; i < textElements.Length;i++) {
    sb.AppendFormat("char {0} starts at pos {1}\n",
                    i,
                    textElements[i]);
}
Console.WriteLine(sb.ToString());//console won't show it correctly!

Not really as interesting as the first two approaches, but still useful…And I guess this wraps it up for today! Stay tuned for more.

Apr 26

So, you know everything about text, right?–part VI

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right?–part VI

In this post, we’ll talk a little bit about string interning. As I’ve pointed out before, strings are immutable. So, you might expect the following snippet to waste memory because, without any special treatment, we’d end up with two different instances that, for all practical purposes, represent the same string:

var str1 = "hi";
var str2 = "hi";

From a memory usage point of view, wouldn’t it be great if we could make both variables reference the same String CLR object? The answer to this lies in string interning. When the CLR is initialized, it will automatically create a private hash table whose keys are strings and whose values are references to String objects maintained in the managed heap. This technique might lead to some performance improvements when you know *for sure* that your app is supposed to work with lots of “equivalent” strings.

Currently, the String class introduces two static methods related to string interning:

var internedString = String.Intern("hi");
var str = String.IsInterned("hi");//checks if "hi" is interned

Both methods receive a string. The first (Intern) checks the private CLR table for a match. If an identical string already exists, it returns a reference to that string. When that doesn’t happen, it performs a copy of the passed string, adds it to the private table and returns that instance. The IsInterned method might not work as you’re expecting…like the Intern method, it will also take a string which is used to perform a look up in the private CLR’s hash table. If there’s a matching entry, it will return a reference to that string. If that isn’t the case, then it will simply return null.
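To see both methods in action, here’s a small sketch. It uses a Guid to get a string that (almost certainly) isn’t in the table yet; the Copy helper is mine and only exists to build a second instance with the same chars at runtime:

```csharp
using System;

static class InterningDemo {
    // Builds a fresh String instance at runtime (never a literal).
    public static string Copy(string s) {
        return new string(s.ToCharArray());
    }

    static void Main() {
        var original = Guid.NewGuid().ToString(); // unique, so not interned yet
        var copy = Copy(original);

        Console.WriteLine(Object.ReferenceEquals(original, copy)); // False
        Console.WriteLine(String.IsInterned(original) == null);    // True

        var first = String.Intern(original);
        var second = String.Intern(copy);
        Console.WriteLine(Object.ReferenceEquals(first, second));  // True
        Console.WriteLine(String.IsInterned(copy) != null);        // True
    }
}
```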

If you’re an experienced developer, you’re probably seeing a big problem with string interning: there’s no way to free the memory used by the strings maintained in the private hash table…well, to be honest, there is one (but that will probably be too drastic for your app): you need to unload the default AppDomain and that will only happen when you kill your app’s process. See? I told you it would be a bit drastic for most apps :)

By now, you’re probably wondering if  the CLR performs string interning by default. And the answer is yes, but *only* for all literal strings defined in the assembly metadata. In theory, you should be able to control this behavior. Even though .NET supports the CompilationRelaxationsAttribute since version 2.0, the truth is that the CLR v4.0 will ignore the use of that attribute. You can test this by building the following code with the [CompilationRelaxations(CompilationRelaxations.NoStringInterning)] applied to it (hint: you don’t  really need to add it because the C# compiler automatically adds the attribute for you):

var str1 = "hi";
var str2 = "hi";
var sameString = Object.ReferenceEquals(str1, str2);

To be honest, string interning seems really great in theory, but I’m still not sure about its use in the real world. Even though I’m not the most experienced developer in the world, the truth is that I never had to  use it explicitly in my apps. MS seems to think that it might impact your code in a bad way or it wouldn’t have introduced the CompilationRelaxationsAttribute for allowing you to control it (even though it seems like the CLR won’t respect the use of that attribute in most of the scenarios). I’d say that if you do need to work with lots of strings, then you should consider it…but don’t forget to measure to see if it’s really improving your app. And that’s it for now. Stay tuned for more!

Apr 25

In the previous post I’ve said that we needed a small detour into cultures to conclude the linguistic string comparison section. In .NET, the CultureInfo type holds all the information regarding a specific culture. For example, any instance of CultureInfo will give you the name or calendar of a specific culture. In .NET, each culture is identified by a unique name which identifies the language/country pair (if you follow this link, you’ll get all the info about how those pairs are built). In practice, this means that “pt-PT” refers to the Portuguese culture in Portugal (which is quite different from the one used, for ex., in Brazil). Here’s an example of how you might end up creating an instance of this type:

var culture = new CultureInfo("pt-PT");

Every thread has two properties which reference culture objects: CurrentUICulture and CurrentCulture. The first is used to obtain resources which are presented to the user (ex.: this object is used to load the appropriate resource file, that is, if you’re using resource files. If you’re not, shame on you!). By default, any created thread will always reference an object compatible with the language of the installed OS (with MUI, you can change the current culture through the Regional and Language Options applet in the control panel).

The second culture property (CurrentCulture) is used for all the other things which CurrentUICulture isn’t used for (ex.: number formatting, string comparison, etc.). The initial value of this property is influenced by the value selected in the Regional and Language Options applet of the control panel. It’s common for both properties to reference the same CultureInfo object. This isn’t obligatory, of course…for instance, nothing prevents us from adapting the user interface to a language while formatting info according to some other culture (a good example of this is a web site which adapts its buttons and labels for different languages, or cultures, and will always format the values according to the “en-US” culture because it needs to bill in dollars). To achieve this, you’d have to set those properties to the adequate CultureInfo objects. But we’re digressing…
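Before moving on, here’s a minimal sketch of that “UI in one language, formatting in another” scenario. The FormatPrice helper and the chosen cultures are just assumptions for the example:

```csharp
using System;
using System.Globalization;
using System.Threading;

static class CultureSplitDemo {
    // Formats using whatever CurrentCulture the calling thread has.
    public static string FormatPrice(decimal price) {
        return price.ToString("C");
    }

    static void Main() {
        var thread = Thread.CurrentThread;
        thread.CurrentUICulture = new CultureInfo("pt-PT"); // resources (labels, messages)
        thread.CurrentCulture = new CultureInfo("en-US");   // formatting, comparisons

        Console.WriteLine(thread.CurrentUICulture.Name); // pt-PT
        Console.WriteLine(FormatPrice(1234.5m));         // $1,234.50
    }
}
```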

For our string discussion, what matters is understanding that the CurrentCulture refers to a CultureInfo object that influences the comparison operations performed over strings. And to do this, the CultureInfo object uses a CompareInfo object which knows how to sort characters. In my mother language (which, btw, is Portuguese) there really isn’t any interesting gotcha (at least, that I can remember). However, that is not the case with German, where ß has the same value as ss. In practice, this means that (and please pardon my lack of German knowledge, so that might not be the correct way to write football) the following snippet might catch you by surprise:

var str1 = "fussbal";
var str2 = "fu\u00DFbal";
Console.WriteLine("{0} : {1}", str1, str2);//print it
Console.WriteLine( String.Compare(str1, str2, StringComparison.Ordinal) == 0 );//false
Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE"); //change culture
Console.WriteLine(String.Compare(str1, str2, StringComparison.CurrentCulture ) == 0);//true

 

As you can see, the comparison does return true after I’ve changed the CultureInfo associated with the current thread’s CurrentCulture property. The experienced reader knows that the previous code can be simplified because the Compare method will always perform a character expansion before comparing the strings (in our example, that means that ß will be replaced by ss). So, you don’t really need to change the CurrentCulture or pass the StringComparison.CurrentCulture to the Compare method. Anyway, doing that makes the intent of the code clear and you should always strive to do that.

In one of the previous paragraphs, I’ve mentioned the CompareInfo class: this class is used internally for performing the comparison between the strings. If you need more control, then you’ll be happy to know that nothing prevents you from using that class directly:

var str1 = "fussbal";
var str2 = "fu\u00DFbal";
var culture = new CultureInfo("de-DE");
Console.WriteLine(culture.CompareInfo.Compare(str1, str2) == 0);//true

 

You probably won’t be doing this often, but now you know that it exists :) Notice also that there are several overloads of this method which allow you to specify an offset, a length or a CompareOptions value for influencing the returned result. Since we’re talking about CompareInfo, you should also notice that it offers several interesting methods: IndexOf, LastIndexOf, IsPrefix, IsSuffix, etc. These methods give you more control than you get by default when using the similar methods of the String class.
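Here’s a quick sketch of some of those overloads. The helper names and the “en-US” culture are my own picks for the example; the CompareInfo members are the real ones:

```csharp
using System;
using System.Globalization;

static class CompareInfoDemo {
    static readonly CompareInfo Comparer = new CultureInfo("en-US").CompareInfo;

    public static bool SameIgnoringCase(string a, string b) {
        return Comparer.Compare(a, b, CompareOptions.IgnoreCase) == 0;
    }

    public static int FindIgnoringCase(string source, string value) {
        return Comparer.IndexOf(source, value, CompareOptions.IgnoreCase);
    }

    static void Main() {
        Console.WriteLine(SameIgnoringCase("HELLO", "hello"));       // True
        Console.WriteLine(FindIgnoringCase("Hello World", "WORLD")); // 6
        Console.WriteLine(Comparer.IsPrefix("Hello World", "hello",
            CompareOptions.IgnoreCase));                             // True
        // compares "abc" (offset 2, length 3) against the whole second string
        Console.WriteLine(Comparer.Compare("xxabc", 2, 3, "abc", 0, 3) == 0); // True
    }
}
```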

And I guess this sums it up: linguistic string comparisons rely on CultureInfo objects which end up delegating that work to the CompareInfo class. There’s still more to say about strings, so stay tuned for more!

Apr 25

So, you know everything about strings, right?–part IV

Posted in .NET, Basics, C#       Comments Off on So, you know everything about strings, right?–part IV

String comparison is one of the most common operations you’ll need to perform in your daily tasks. Besides checking for equality, you’ll also end up comparing strings whenever you need to sort them. The String type offers several methods for performing comparison operations:

  • Equals: this instance method checks two strings for equality, according to the StringComparison enum value that it receives. There’s also a static version of this method which lets you use the operation without having to check for null.
  • Compare: the type introduces several static overloads of this method which can be used to order two strings (which is what you need for sorting).
  • StartsWith: String introduces several overloads of this instance method. You can use this method to check if the current string starts with another that it receives through a parameter.
  • EndsWith: once again, there are several overloads of this method. You can use them to see if the current string ends with another passed in through a parameter.

Most of these methods can receive a StringComparison value through a parameter. This value lets you influence the comparison operation by specifying if the comparison should be case sensitive or insensitive and if the current culture should be used in the current operation.

Some of the overloads of the previous methods can also receive a value from the CompareOptions enum. Since this is a flag enumeration, you can combine several of its values. Most of the comparison operations can be influenced through a CultureInfo object. That’s why there are several overloads which let you specify the culture that should be used on the comparison operation.

In practice, you can divide comparison operations into two big groups: programmatic string comparisons and linguistic string comparisons. Programmatic string comparisons include all string comparisons related to…you guessed it: programming! For instance, whenever you compare paths, registry keys, URLs, etc., you’re running a programmatic string comparison. The fastest way (and probably the best) of doing this is to run an ordinal comparison (case sensitive or case insensitive, depending on what you’re comparing). Whenever you do this, you’re running a fast comparison because the culture info isn’t taken into account by the method. In practice, this means passing the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase value to the method. You shouldn’t use the InvariantCulture and InvariantCultureIgnoreCase values because they are more costly (from a time/performance point of view) than using the previous values.
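A quick sketch of what a programmatic comparison might look like (the SamePath and IsHttps helpers and the sample values are hypothetical):

```csharp
using System;

static class OrdinalDemo {
    // Windows paths are case insensitive, so ignore case but not culture.
    public static bool SamePath(string a, string b) {
        return String.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static bool IsHttps(string url) {
        return url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
    }

    static void Main() {
        Console.WriteLine(SamePath(@"C:\Folder\File.txt", @"c:\folder\FILE.TXT")); // True
        Console.WriteLine(String.Equals("key", "KEY", StringComparison.Ordinal));  // False
        Console.WriteLine(IsHttps("HTTPS://example.com"));                         // True
    }
}
```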

There are, however, several times when you do need to compare strings entered by the user or which are shown to them. In these cases, we’re talking about linguistic string comparisons and they should always take the current culture into account. If you’re sorting strings, you should also perform these operations in a case sensitive way to ensure that two strings which only differ by case aren’t considered equal (if they are, then sorting them might return different orderings in several sort operations and that might really confuse the user). You perform a linguistic string comparison by passing the StringComparison.CurrentCulture or StringComparison.CurrentCultureIgnoreCase value.
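To make the difference between the two groups a bit more concrete, here’s a small sketch which sorts the same array with a linguistic (case sensitive) comparer and with an ordinal one. I’m fixing the culture to “en-US” so that the result doesn’t depend on your machine; the Sorted helper is mine:

```csharp
using System;
using System.Globalization;

static class SortDemo {
    public static string[] Sorted(string[] items, StringComparer comparer) {
        var copy = (string[])items.Clone();
        Array.Sort(copy, comparer);
        return copy;
    }

    static void Main() {
        var fruits = new[] { "cherry", "Banana", "apple" };
        var linguistic = StringComparer.Create(new CultureInfo("en-US"), false);

        Console.WriteLine(String.Join(", ", Sorted(fruits, linguistic)));
        // apple, Banana, cherry
        Console.WriteLine(String.Join(", ", Sorted(fruits, StringComparer.Ordinal)));
        // Banana, apple, cherry
    }
}
```

The ordinal sort puts Banana first because it only looks at the code values (and B comes before a), which is rarely what a user expects to see.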

As I’ve mentioned, linguistic string operations depend on the culture. And that’s why we need to take a little detour into cultures before going on. Since this is becoming a rather large post, I’ll leave the details about cultures for a future post. Stay tuned for more!

Apr 24

In the latest post, we’ve seen that strings are immutable. At the time, I’ve mentioned that this brings several advantages, but there are also a couple of gotchas. For instance, concatenating strings can be an expensive operation, especially when you have lots of strings. To solve this kind of problem, we need to resort to the StringBuilder type. The idea is simple: you create a new instance of the type, add several strings through one of its methods and then retrieve the final string through its ToString method (which is inherited from Object). Let’s start from the beginning…

When you  create a new instance of the StringBuilder, you can use one of the several constructors which let you:

  • specify the maximum number of chars that can be kept by the StringBuilder instance;
  • indicate the default size of the array of chars used by the StringBuilder instance. Notice that that  array might grow when you add strings or chars and its available space  isn’t enough (in that case, the instance will double the current array’s size).
  • pass a string which is used to initialize the internal array of chars held by the type.
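The options in the previous list can be sketched like this (the Overflows helper is hypothetical and is only there to show what happens when the maximum number of chars is exceeded):

```csharp
using System;
using System.Text;

static class BuilderLimitsDemo {
    // Returns true when appending text would exceed the builder's MaxCapacity.
    public static bool Overflows(StringBuilder builder, string text) {
        try {
            builder.Append(text);
            return false;
        }
        catch (ArgumentOutOfRangeException) {
            return true;
        }
    }

    static void Main() {
        var limited = new StringBuilder(4, 8); // capacity 4, at most 8 chars
        Console.WriteLine(limited.MaxCapacity);            // 8
        Console.WriteLine(Overflows(limited, "12345678")); // False (8 chars fit)
        Console.WriteLine(Overflows(limited, "9"));        // True

        var seeded = new StringBuilder("Hello", 32); // initial text + capacity
        Console.WriteLine(seeded.Length);                  // 5
    }
}
```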

You can mix several of those items during construction because the type offers several overloads which let you specify those values (for instance, you can specify the maximum number of chars and the current capacity of the internal array through the StringBuilder( Int32 capacity, Int32 maxCapacity) constructor). The next snippet presents the simplest code you’ll need to instantiate a StringBuilder instance (which, btw, shows its most common use):

var str = new StringBuilder();
Console.WriteLine(str.Capacity);//16
Console.WriteLine(str.MaxCapacity);//Int32.MaxValue

As you can see, the default constructor starts with a 16 chars array and limits the maximum size of that internal array to Int32.MaxValue. After creating an instance, there are a couple of properties/methods which let you change the internal StringBuilder’s array:

var str = new StringBuilder();
Console.WriteLine(str.Length);//number of chars in the array: 0
str.Append("Hello");
Console.WriteLine(str.Length);//5
Console.WriteLine(str[0]); //get char at position 0

You can check the number of chars in the array through the Length property. You can also get or set a single char through the indexed Chars property. The Append method is probably the one you’ll use most often in the day-to-day operations. As you’ve probably inferred from the previous snippet, you can use it to append an object to the internal array (as you’re probably expecting, there are several overloads of this method). Besides Append, you can also use Insert, AppendFormat and AppendLine to add more chars to the internal array.

You’re probably expecting to be able to remove chars from the internal array. If that is the case, you’re correct: you can remove chars by calling the Clear (clears the internal buffer used by the StringBuilder instance) and Remove (removes a range of chars from the array) methods. Finally, there’s also the Replace method which is responsible for replacing all instances of a char with another char or all instances of a string with another string (yes, once again, there are several overloads of this method) in the internal buffer.
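Here’s a small sketch which puts some of these methods together (the Run helper and the sample strings are just for illustration):

```csharp
using System;
using System.Text;

static class BuilderEditDemo {
    public static string Run() {
        var sb = new StringBuilder("Hello World");
        sb.Insert(0, ">> ");           // ">> Hello World"
        sb.Remove(0, 3);               // "Hello World"
        sb.Replace('o', '0');          // "Hell0 W0rld"
        sb.AppendFormat(" ({0})", 42); // "Hell0 W0rld (42)"
        return sb.ToString();
    }

    static void Main() {
        Console.WriteLine(Run()); // Hell0 W0rld (42)

        var sb = new StringBuilder("temp");
        sb.Clear();                   // empties the buffer, keeps the instance
        Console.WriteLine(sb.Length); // 0
    }
}
```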

One interesting thing regarding these methods is that most of them (if not all) return a reference to the instance itself. In practice, this means that you can chain several method calls:

str.Append("Hello")
    .Replace("o", "0!");

After concatenating everything and performing all the changes you need, you can get a string by calling its ToString method:

var finalStr = str.ToString();

By default, most types inherit Object’s ToString method, which simply returns the full name of the current object’s type. The StringBuilder type overrides the ToString method because, in this case, it makes more sense to return a string with the chars of the encapsulated array than the name of the type. Before ending, there’s a small gotcha which makes some operations more painful than they should be: there isn’t a complete parity between the methods exposed by String and StringBuilder. For instance, there’s no PadLeft method.
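If you really miss one of those methods, you can always write it yourself. The following sketch shows one possible way of doing it with an extension method; notice that this PadLeft is a hypothetical helper of mine (it’s not part of the framework):

```csharp
using System;
using System.Text;

static class StringBuilderExtensions {
    // Hypothetical helper (not part of the framework): mimics String.PadLeft.
    public static StringBuilder PadLeft(this StringBuilder builder,
                                        int totalWidth, char paddingChar) {
        var missing = totalWidth - builder.Length;
        if (missing > 0) {
            builder.Insert(0, new string(paddingChar, missing));
        }
        return builder; // returning the instance keeps call chaining possible
    }
}

static class PadLeftDemo {
    static void Main() {
        var sb = new StringBuilder("42").PadLeft(5, '0').Append("!");
        Console.WriteLine(sb.ToString()); // 00042!
    }
}
```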

It’s really a pity that the StringBuilder doesn’t expose all the methods defined by String because that means 1.)  doing extra work or 2.) having to go to the String, perform the desired operation and back to StringBuilder instance for continuing with  the string manipulation work. And I guess this is it for now. Stay tuned for more!

Mar 30

So, you know everything about text, right?– part II

Posted in Basics, C#, CLR       Comments Off on So, you know everything about text, right?– part II

In the previous post, we’ve started looking at how to work with text in C# and we’ve run a rather superficial analysis over the Char type. In this post, we’ll start looking at the String type which is probably what you’ll be using most of the time when you need to work with text.

What is a string? In .NET, a string can be seen as an immutable sequence of characters. Programmatically, it’s represented through the String type which is sealed and extends the Object type directly (in other words, it’s a reference type). Interestingly, Strings are also considered a primitive type in C# and this means that you can create new Strings through literals:

var aString = "Hello, there";

This is the preferred way to instantiate a new String. The type offers several constructors which let you create a new String from an unmanaged array of chars (char*) or from an unmanaged array of 8-bit signed integers (sbyte*). And no, there’s no constructor that receives a string as an argument, though there’s one which creates a new String from an array of Char.

Notice that using the preferred way of creating new strings (ie, through a literal) doesn’t really result in creating a “new” instance through the newobj IL instruction. In these cases, the compiler embeds the string in the metadata so that it can be loaded at runtime (through the ldstr IL instruction).

Strings enjoy special treatment in several languages. For instance, it’s possible to concatenate strings at compile time or at runtime. Here’s an example where the C# compiler is smart enough to concatenate two Strings at compile time:

var aString = "Hello," + " there";

If you’ve had the luck to write some code in C or C++, then you’ll be right at home with the string escape sequences supported in C#:

var aString = "Hello,\tthere";

In the previous snippet, we’ve resorted to \t to introduce a tab in a string. In case you’re wondering, you can escape the \ char used in escape sequences by doubling it:

var path = "C:\\folder";

If you have lots of \ to escape, then you should be using verbatim strings:

var path = @"C:\folder";

Both snippets produce exactly the same result: you end up with a C:\folder string.

Before ending this initial post about strings, there’s one small detail I’ve mentioned at the beginning and which is *really* important. It’s probably the most important thing you should know about strings and I wouldn’t really feel well without writing about it: strings are *immutable*. Once you create a string, there’s no way to modify it. No, you can’t change a char from it without building a new String instance. No, you can’t make it shorter or longer either!
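A tiny sketch of what that means in practice (the sample values are mine): every “modifying” method hands you back a new String and leaves the original alone.

```csharp
using System;

static class ImmutabilityDemo {
    static void Main() {
        var original = "hello";
        var upper = original.ToUpperInvariant(); // builds a new String
        var replaced = original.Replace('h', 'j');

        Console.WriteLine(original); // hello (untouched)
        Console.WriteLine(upper);    // HELLO
        Console.WriteLine(replaced); // jello
        Console.WriteLine(Object.ReferenceEquals(original, upper)); // False
    }
}
```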

This might be a surprise, but it does bring a couple of advantages too. For instance, since they’re immutable, you don’t have to worry about synchronization in multithreaded code (IMO, this is a big big thing!). So, you need to do a lot of char manipulations? Probably need to concatenate lots of strings at runtime? If that is your case, then you should be using StringBuilder (we’ll be back to this in a future post).

And this is it for now. Stay tuned for more!

Mar 28

So, you know everything about text, right?– part I

Posted in Basics, C#, CLR       Comments Off on So, you know everything about text, right?– part I

In .NET, characters are always represented by 16-bit Unicode values. Programmatically, they’re represented through instances of the System.Char type. Here’s an example of a char represented in C#:

var myChar = 'a';

The Char type offers several helper static methods which do several useful operations. For instance, you can call the IsLetter method to check if a char is a letter:

var isLetter = Char.IsLetter('a');

Besides the IsLetter method, you can also use the IsDigit, IsWhiteSpace, IsUpper, IsLower, IsPunctuation, IsLetterOrDigit, IsControl, IsNumber, IsSeparator, IsSurrogate, IsLowSurrogate, IsHighSurrogate or IsSymbol methods. All these methods have overloads which receive a string and a position that points to the char that needs to be checked. You can get the same results by calling the GetUnicodeCategory method, which returns a value of the UnicodeCategory enumeration. According to the docs, you shouldn’t use this method but CharUnicodeInfo.GetUnicodeCategory instead. A small detour: surrogates are an interesting topic (though out of the scope of this post). Basically, they’re needed to allow the representation of the supplementary characters outside the BMP (Basic Multilingual Plane) with UTF-16. If you’re interested in learning more, I’d recommend getting started here.
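Here’s a short sketch with a few of those checks (the CategoryOf helper is mine; it simply wraps the CharUnicodeInfo.GetUnicodeCategory method):

```csharp
using System;
using System.Globalization;

static class CharKindsDemo {
    public static UnicodeCategory CategoryOf(char c) {
        return CharUnicodeInfo.GetUnicodeCategory(c);
    }

    static void Main() {
        Console.WriteLine(Char.IsDigit("a1", 1));   // True: checks the char at index 1
        Console.WriteLine(Char.IsWhiteSpace('\t')); // True
        Console.WriteLine(Char.IsPunctuation('!')); // True
        Console.WriteLine(CategoryOf('a'));         // LowercaseLetter
        Console.WriteLine(CategoryOf('7'));         // DecimalDigitNumber
    }
}
```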

Besides checking the “kind” of a character, you can also convert it to its lower or upper equivalent in a culture-agnostic way through the ToLowerInvariant and ToUpperInvariant methods (if you want to take the culture into consideration, then you should call the ToLower and ToUpper methods).

The char type does offer some instance methods too (btw, the previous ones are all static). You can compare two chars through the Equals method or through the CompareTo method (because Char implements the IComparable<char> interface). Besides comparing, you can also find a couple of methods which let you transform a char (or chars) into strings or get chars from an existing string. There’s still another couple of methods which are capable of building chars out of one or more integers (which, as you might have guessed, represent code values). Finally, the static GetNumericValue method returns the numeric equivalent of the char it receives.

Before ending this post, there’s still time to mention that the CLR allows you to convert from char into a numeric type (and vice-versa). Explicit casts are the easiest way to do that. You can also perform this operation through one of the methods of the Convert type or by using the IConvertible interface (that is implemented by Char). And I guess this sums it up quite nicely. Stay tuned for more about text.
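For reference, here’s a small sketch which puts the conversions mentioned in this post side by side (the sample values are mine):

```csharp
using System;

static class CharConversionDemo {
    static void Main() {
        Console.WriteLine((int)'a');                              // 97
        Console.WriteLine((char)98);                              // b
        Console.WriteLine(Convert.ToInt32('a'));                  // 97
        Console.WriteLine(((IConvertible)'a').ToInt32(null));     // 97
        Console.WriteLine(Char.GetNumericValue('7'));             // 7
        Console.WriteLine(Char.ToUpperInvariant('a'));            // A
        Console.WriteLine(Char.ConvertFromUtf32(0x20000).Length); // 2 (a surrogate pair)
    }
}
```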

Mar 26

Last considerations about generics

Posted in Basics, C#, CLR       Comments Off on Last considerations about generics

As we’ve seen, constraints will help in writing “type safe generic code” (not sure if this designation really exists. If it doesn’t, then I’ve invented it :)). But even with generics, there are still a couple of scenarios which might catch you by surprise. Let’s start with casts and with a simple example:

class Generic<T> {
    public void DoSomethin(T something) {
        var aux = (String) something;
    }
}

The previous class won’t compile because there’s no way for the compiler to know that it’s possible to convert something into a String. At least, not when it’s compiling the open generic class! Notice that you can’t even apply a constraint to solve this problem. There is, however, a workaround (which isn’t really type safe):

class Generic<T> {
    public void DoSomethin(T something) {
        var aux = (String)(Object) something;
    }
}

Since any object can be cast to Object, the compiler won’t complain about that first cast. The other cast (String) is now possible, though it might throw an exception at runtime. In fact, there’s a better way of doing the previous cast:

class Generic<T> {
    public void DoSomethin(T something) {
        var aux = something as String;
    }
}

Notice, however, that you’ll only be able to do this whenever you’re converting something into a reference type.
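To make the difference between the two approaches concrete, here’s a small sketch (GenericCasts, HardCast and SoftCast are names I’ve picked for the example):

```csharp
using System;

public static class GenericCasts {
    // Double cast: compiles, but throws InvalidCastException at runtime
    // when something isn't really a String.
    public static String HardCast<T>(T something) {
        return (String)(Object) something;
    }
    // as operator: returns null instead of throwing when the conversion fails.
    public static String SoftCast<T>(T something) {
        return something as String;
    }
}
```

When you pass "abc", both methods return "abc". When you pass 10, HardCast throws an InvalidCastException and SoftCast simply returns null.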

Since generic type arguments can be replaced by any concrete type, you might be wondering how you can initialize a “generic” variable to its default value. The problem here is specifying its default value. I mean, if you’re talking about reference types, then it’s ok to set it to null. However, doing it with a value type results in a compile-time error. To solve this problem, Microsoft introduced the default keyword:

class Generic<T> {
    public void DoSomethin(T something) {
        T aux = default(T);
    }
}

Now, the compiler is happy. When T is replaced by a reference type, aux is initialized with null. On the other hand, if T is replaced by a value type, then aux references some memory space (big enough to save a value of that type) with all its bits initialized to 0. Btw, there’s also an interesting gotcha regarding the use of null: you can’t use it to initialize a variable of an unconstrained generic type, but you can use it with the operators == and !=. Here’s an example:

class Generic<T> {
    public void DoSomethin(T something) {
        if(something == null ) {
            //do something
        }
        else {
            //do something else
        }
    }
}

When T is replaced by a value type, the JIT compiler won’t emit the native code for the if part because a value type is never null. Notice that if we had constrained T to a struct, then yes, we’d be getting a compile error on the if clause.
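Here’s a minimal sketch which puts default(T) and the null comparison side by side (GenericDefaults and its methods are hypothetical names used only for illustration):

```csharp
using System;

public static class GenericDefaults {
    // default(T) is null for reference types and "all bits zero" for value types.
    public static T GetDefault<T>() {
        return default(T);
    }
    // Comparing an unconstrained T against null compiles. For value types
    // (other than Nullable<T>) the comparison is simply always false.
    public static Boolean IsNull<T>(T something) {
        return something == null;
    }
}
```

GetDefault<Int32>() returns 0 and GetDefault<String>() returns null. IsNull(0) is false, while IsNull<String>(null) is true.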

It’s also important to understand that comparing two variables of the same generic type results in an error if that generic type isn’t constrained to a reference type:

class Generic<T> {
    public void DoSomethin(T something, T somethingElse) {
        if( something == somethingElse) {
            //some code
        }
    }
}

If T were constrained to a reference type, the code would have compiled without any errors. However, constraining T to a value type will always result in an error. Since you cannot constrain T to a specific value type (a value type is always sealed, and we’ve seen that sealed types can’t be used as constraints because it’s simpler and more efficient to write code that uses that type directly), you’re limited to saying that it’s a value type. And this information isn’t enough to make the compiler do its work and emit the correct code for the comparison.
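One way around this limitation, which is my own suggestion and not something the == operator gives you, is to delegate the comparison to EqualityComparer<T>.Default. The EqualityHelper class below is just a sketch with a made-up name:

```csharp
using System;
using System.Collections.Generic;

public class EqualityHelper<T> {
    // Works for any T: uses IEquatable<T> when T implements it and
    // falls back to Object.Equals otherwise. Nulls are handled too.
    public Boolean AreEqual(T something, T somethingElse) {
        return EqualityComparer<T>.Default.Equals(something, somethingElse);
    }
}
```

Keep in mind that this performs an equality check according to the type’s Equals implementation, which isn’t necessarily the same thing the == operator would do for a reference type (reference comparison by default).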

A final note: you can’t use generic type variables as operands (ie, you can’t use them with the operators +, -, *, etc.), making it impossible to write expressions which work for any numeric type. This is a tremendous problem, especially for those guys that work in the financial world…
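A common workaround (again, a sketch of my own and not a language feature) is to let the caller supply the operation as a delegate. GenericMath and Sum are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class GenericMath {
    // The caller says what "zero" and "add" mean for T,
    // since the compiler can't apply + to two T values.
    public static T Sum<T>(IEnumerable<T> items, T zero, Func<T, T, T> add) {
        var total = zero;
        foreach (var item in items) {
            total = add(total, item);
        }
        return total;
    }
}
```

With this in place, GenericMath.Sum(new[] { 1, 2, 3 }, 0, (a, b) => a + b) returns 6 and the same method works with Decimal or Double values. The price is a delegate invocation per item.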

And that’s it for now! Stay tuned for more.

Mar 23

Generic and constructor constraints

Posted in Basics, C#, CLR

In this post, we’ll talk about one last kind of constraint: the constructor constraint. Whenever you apply a constructor constraint to a generic type argument, the compiler ensures that it can only be replaced by a non-abstract type which exposes a public parameterless (default) constructor. Constructor constraints are specified through the new keyword, as you can see from the following snippet:

public class AnotherGenericConstraint<T> where T:new() {
    public T GetNew() {
        return new T();
    }
}

Without constructor constraints, there really wouldn’t be a way for the compiler to allow you to instantiate a generic type argument to which the struct primary constraint *hasn’t* been applied. Btw, and since we’re talking about value types, it’s an error to specify both a struct and a constructor constraint for the same generic type argument:

public class AnotherGenericConstraint<T>
    where T:struct, new() { //compile error
    public T GetNew() {
        return new T();
    }
}

A final note: currently, there’s no way to specify the number of parameters a constructor may receive. And it seems like that won’t change in the future, meaning that we’re stuck with parameterless constructors. And I guess this is all about generic constraints. Stay tuned for more.
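If you really need to build a T from some arguments, one possible workaround (my own sketch, with a made-up Factory class) is to receive a delegate which knows how to create the instance:

```csharp
using System;

public class Factory<T> {
    private readonly Func<String, T> _create;

    // The delegate plays the role of the "constructor with one String parameter"
    // which we can't express through a constraint.
    public Factory(Func<String, T> create) {
        if (create == null) {
            throw new ArgumentNullException("create");
        }
        _create = create;
    }

    public T GetNew(String argument) {
        return _create(argument);
    }
}
```

For instance, new Factory<Version>(text => new Version(text)).GetNew("1.2") gives you a Version whose Major is 1 and whose Minor is 2.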

Mar 23

Generics and secondary constraints

Posted in Basics, C#, CLR

Besides primary constraints, you can also constrain a generic type parameter through secondary constraints. Unlike primary constraints, you can apply zero or more secondary constraints to a generic type argument. Interface constraints are the most common kind of secondary constraint you’ll apply in your code. They allow the compiler to ensure that a generic type argument can only be replaced by a type which implements the interface specified by the constraint. If you want, you can specify several interface constraints. In these cases, you can only replace the generic type argument by a concrete type which implements all those interfaces. The next snippet illustrates the use of this kind of constraint:

public interface IDoSomething{}
public interface IDoSomethingElse{}
public class DoEverything:IDoSomething,IDoSomethingElse {
}
public class DoOneThing:IDoSomething {
}
public class MyGeneric<T> where T:IDoSomething, IDoSomethingElse {
        
}

And here’s some code that instantiates the class:

var aux1 = new MyGeneric<DoEverything>();//ok!
var aux2 = new MyGeneric<DoOneThing>();//oops

Besides interfaces, there’s another kind of secondary constraint, which is known as naked type constraint. This secondary constraint allows you to specify the existence of a relationship between two generic type arguments. Here’s a quick example:

public class AnotherGeneric<T, TBase> where T:TBase {
        
}

And here’s some code that shows how you can (and can’t) instantiate it:

public class A{}
public class B:A{}
var ok = new AnotherGeneric<B, A>();
var oopss = new AnotherGeneric<A, B>();

ok compiles without an error, but that doesn’t happen with oopss because the compiler knows it can’t implicitly convert from A to B (though it knows that it can convert from B to A).
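Naked type constraints are more useful when the code actually relies on that conversion. Here’s a sketch (ListConverter and ConvertAll are names chosen for this example) which copies a list of T into a list of TBase:

```csharp
using System;
using System.Collections.Generic;

public static class ListConverter {
    public static List<TBase> ConvertAll<T, TBase>(IList<T> list) where T : TBase {
        var result = new List<TBase>(list.Count);
        foreach (T item in list) {
            // The implicit conversion from T to TBase is only allowed
            // because of the where T : TBase constraint.
            result.Add(item);
        }
        return result;
    }
}
```

Calling ListConverter.ConvertAll<String, Object>(new List<String> { "a", "b" }) returns a List<Object> with two items, while ConvertAll<Object, String>(...) doesn’t compile.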

And I guess that’s it for now. Stay tuned for more.

Mar 21

Generics and primary constraints

Posted in Basics, C#, CLR

In the previous post, we’ve started looking at constraints. Primary constraints are one type of constraints. Currently, you can specify zero or one primary constraint when you introduce a generic type argument. Whenever you constrain a generic type argument to a non-sealed class, you’re using a type constraint. Here’s an example:

public class Test {
}
public class TestingConstraints<T> where T:Test {
}

When the compiler finds a primary constraint of this type, it knows that T can only be replaced by Test or by another type derived from Test. In the previous example, this means that all public members introduced by Test can be used from within the members of TestingConstraints. Btw, you can’t use a sealed type as a primary constraint. And it’s kind of logical, if you think about it. Let’s suppose S is sealed. If S is sealed, then you can’t extend it (ie, you cannot create new derived types from it). That being the case, and if primary constraints could reference sealed types, then you’d be building a generic type whose type argument could only be replaced by *one* specific type. Well, we don’t really need generics to do that, right? I mean, the final result would be the same as creating a non-generic type which would reference directly the sealed type S! In other words, the generic type would end up being non-generic and that just doesn’t make sense!

Primary constraints are optional when introducing a generic type argument. If you don’t specify one, then Object is used by default. Notice that the following types cannot be used as a primary constraint: Object, Array, Delegate, MulticastDelegate, ValueType, Enum and Void.

Besides referencing a non-sealed type, there are also two other special primary constraints: class and struct. When the compiler finds the class constraint, it knows that the generic type argument will only be replaced by a reference type. In practice, this means that the  generic type argument can be replaced by any class, interface, delegate or array type. On the other hand, when the compiler finds the struct constraint, it understands that the generic type argument will be replaced by a value type. In this case, the generic type argument can be replaced by any value or enum type.

There are some important assumptions the compiler can make from these special primary constraints. Here’s a small example:

public class GenericValueType<T> where T:struct {
    public T Create() {
        return new T( );
    }
}
public class GenericReferenceType<T> where T: class {
    public T DoSomething() {
        T aux = null;
        //some code
        return aux;
    }
}

The previous code compiles without an error. Whenever we constrain a generic type argument to a struct, the compiler knows that newing up T is ok because all value types have a default public constructor (and that might not happen with reference types). If you concentrate on GenericReferenceType, then you’ll notice that aux is initialized with the null value. You can only assign null to reference type variables. Since you’ve constrained T to class, the compiler knows that this assignment is doable and won’t complain about it (try removing the constraint and see what happens; there is a way to go around it, but I’ll leave it for a future post).

Notice that there’s a small gotcha regarding primary constraints associated with the struct keyword: you cannot replace the generic type argument with the Nullable<T> type. Nullable<T> is a value type (ie, struct), but it gets special treatment from the compiler and the CLR. It’s because of that that you cannot replace T with Nullable<T> on GenericValueType<T>. Nullable types are an interesting topic and we’ll return to them in future posts.

And I guess this sums it up for primary constraints. In the next post, we’ll go over secondary constraints. Stay tuned for more.

Mar 21

Generics: getting started with constraints

Posted in Basics, C#, CLR

Now that we have basic knowledge about generics, we can proceed and talk about constraints. When we write “generic” code, the compiler must ensure that that code will work for any type which might be used in place of the generic type argument. For instance, take a look at the following code:

public class Test {
    void PrintInfo<T>(T genericObject) {
        Console.WriteLine(genericObject.ToString(  ));
    }
}

How can the compiler guarantee that the previous code will work for any type? (btw, let’s ignore the fact that genericObject might be null). In the previous snippet, it’s really easy because we know that Object is the base class of all objects and ToString is one of the methods which is introduced by it. And what if we had the following code:

public class Test {
    void PrintInfo<T>(T first, T second)  {
        //compare both
        var comparison = first.CompareTo( second );
        if( comparison > 0 ) {
            Console.WriteLine("second smaller than first");
        }
        else if( comparison < 0 ){
            Console.WriteLine("first smaller than second");
        }
        else {
            Console.WriteLine("Equals");
        }
    }
}

The idea behind the previous snippet is simple: we’re expecting that T implements the IComparable<T> interface. If that happens, then we can call the CompareTo method on the first instance in order to compare it with the second. As you’ve no doubt noticed from the previous snippet, the compiler isn’t really happy with the CompareTo call. After all, implementing the IComparable<T> interface is really optional and the compiler can’t really generate IL code from the previous method that works for any type.

And that’s why we have constraints. With constraints, we can limit the number of types that can be specified for a generic type argument. By adding a constraint, we can make sure that the generic type argument will only be replaced by types which fall under certain conditions. So, let’s fix the previous snippet so that it compiles:

public class Test {
    void PrintInfo<T>(T first, T second) where T:IComparable<T> {
        //compare both
        var comparison = first.CompareTo( second );
        if( comparison > 0 ) {
            Console.WriteLine("second smaller than first");
        }
        else if( comparison < 0 ){
            Console.WriteLine("first smaller than second");
        }
        else {
            Console.WriteLine("Equals");
        }
    }
}

A constraint is always specified through the where keyword and, in the previous example, it tells the compiler that T can only be replaced by a type which implements the IComparable<T> interface (trying to pass a type which doesn’t implement that interface results in a compiler error). With this small change, we can treat T like an object which implements IComparable<T>. Notice that constraints can be applied where you’re allowed to introduce generic types (ie, you can add constraints at class, interface, delegate or method level).
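The same constraint works for any reusable comparison logic. As a small sketch (Comparisons and Max are names I’ve made up), here’s a method which returns the bigger of two values:

```csharp
using System;

public static class Comparisons {
    // The constraint lets us call CompareTo on first without any cast.
    public static T Max<T>(T first, T second) where T : IComparable<T> {
        return first.CompareTo(second) >= 0 ? first : second;
    }
}
```

Comparisons.Max(3, 7) returns 7 because Int32 implements IComparable<Int32>. On the other hand, Comparisons.Max(new Object(), new Object()) is rejected by the compiler because Object doesn’t implement IComparable<Object>.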

There are several types of constraints, which are grouped into several categories. In the next post, we’ll take a look at primary constraints. Stay tuned for more.

Mar 19

Generic interfaces and delegates

Posted in Basics, C#, CLR

In the previous post, I’ve introduced the concept of generics. In this post, we’ll keep looking at generics and we’ll talk a little bit about how we can create generic interfaces and delegates. Without generic interfaces, manipulating a value type through its interface reference would result in boxing (and less compile-time type safety). The same goes for delegates: with generics, it’s possible to create a delegate which allows a value type instance to be passed back to a callback method in a type-safe way without any boxing.

There’s not much to say about the syntax used for creating generic interfaces and delegates. So, here’s an example of a generic interface:

public interface IDoSomething<T> {
    void DoItNow(T item);
    T Whatgetdate();
}

And here’s an example of a generic delegate (btw, it’s defined by the framework):

public delegate void Action<T>(T item);

Typically, the compiler will transform a delegate into a class which looks like this:

public class Action<T> : MulticastDelegate {
    public Action(Object obj, IntPtr method);
    public virtual void Invoke(T item);
    public virtual IAsyncResult BeginInvoke(T item, AsyncCallback cb, Object obj);
    public virtual void EndInvoke(IAsyncResult result);
}

(And we’ll see why in future posts :))

From .NET 4.0 onward, the type arguments of a generic delegate or interface can be classified as contravariant or covariant. In previous versions, generic arguments were always invariant, ie, the generic type parameter couldn’t be changed after being defined. In C#, contravariant type arguments are indicated through the in keyword. It means that the generic type parameter can change from a class to another derived from it. On the other hand, covariant type arguments are indicated in C# through the out keyword and, in this case, a generic type argument can change from a class to one of its base classes. An example might help you understand what’s going on here. Suppose we start with this (again, defined by the framework):

public delegate TResult Func<in T, out TResult>(T arg);

In this case, T is contravariant and TResult is covariant. So, if you have the following classes:

class Base {}
class Derived : Base {}

Now,  if you still haven’t forgotten that Object is the base class of all objects, then from .NET 4.0 onward, you can simply do this:

Func<Object, Derived> fun1 = null;//not important here
Func<Base, Base> fun2 = fun1;

Notice that there’s no cast there! This is not the easiest example to understand what’s going on, but I really don’t have time for more (sorry). What matters is understanding that fun1 references a method which expects an object and returns an instance of Derived. On the other hand, fun2 expects a method that receives a Base and returns another Base instance. Since you can pass a Base to a method which expects an Object and since you can treat a Derived instance as if it was a Base instance (because Derived extends Base), then the previous code is safe.
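If you want to see the assignment doing real work, here’s a runnable variation on the same idea (VarianceDemo, MakeDerived and Run are hypothetical names; Base and Derived are the same classes shown above):

```csharp
using System;

public class Base { }
public class Derived : Base { }

public static class VarianceDemo {
    // Accepts anything (Object) and always hands back a Derived.
    public static Derived MakeDerived(Object input) {
        return new Derived();
    }

    public static Base Run() {
        Func<Object, Derived> fun1 = MakeDerived;
        // No cast: T is contravariant (Object -> Base)
        // and TResult is covariant (Derived -> Base).
        Func<Base, Base> fun2 = fun1;
        return fun2(new Base());
    }
}
```

The object returned by Run is typed as Base, but at runtime it’s the Derived instance created by MakeDerived.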

Interestingly, the compiler will only be happy with contravariance and covariance if it can infer that there is a reference conversion between the types being used. That means that variance won’t work for value types because it would require boxing. In practice, this means that this won’t compile:

void DoSomething(IEnumerable<Object> items) { }
DoSomething(new List<Int32>());//ooops:  compiler error

It’s a pity, but that’s just the way it is. Before ending, there’s still time to say that you should specify the variance of a type argument, whenever possible, when creating generic delegates and interfaces. This doesn’t introduce any nasty side effects and enables the use of the delegate or interface in more scenarios. If you want to know more about variance, then I recommend reading Eric Lippert’s excellent series on the topic. And that’s it for now. Stay tuned for more.
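For completeness, here’s a sketch which contrasts a reference type list (where covariance kicks in) with a value type list (where you have to box the items yourself). VarianceAndValueTypes and CountItems are invented names, and using LINQ’s Cast<Object>() is just one possible way of doing the boxing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarianceAndValueTypes {
    public static Int32 CountItems(IEnumerable<Object> items) {
        return items.Count();
    }

    public static Int32 Run() {
        var strings = new List<String> { "a", "b" };
        var numbers = new List<Int32> { 1, 2, 3 };

        // ok: String is a reference type, so IEnumerable<String> converts to IEnumerable<Object>
        var fromStrings = CountItems(strings);
        // CountItems(numbers); // compile error: Int32 is a value type
        // explicit boxing of each item makes the call possible
        var fromNumbers = CountItems(numbers.Cast<Object>());

        return fromStrings + fromNumbers;
    }
}
```

Run returns 5 (2 strings plus 3 boxed integers).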

Mar 14

Generics: open vs closed types

Posted in Basics, C#, CLR

Before going on, does anyone know how to improve the performance of Windows Live Writer? It’s painfully slow after I’ve updated it to the latest release and it’s upsetting me a lot. I’ve tried to disable everything, but the damn app doesn’t show any improvements…damn…ok, enough bitching…

In the previous post, I’ve introduced generics. In this post, we’ll keep looking at generics and I’ll present the concept of open vs. closed type. Before going on, let’s recover the MyList snippet I’ve introduced in the previous post:

class MyList<T> {
    public void Add(T item) {
    }
    public void Remove(T item) {
    }
    public void Sort(IComparer<T> comparer) { }
}

And then there was some code which used MyList with ints:

var intList = new MyList<Int32>();
intList.Add(10);
intList.Remove(20);

And now, we’re ready to proceed :)

As you probably know, the CLR uses internal data structures for every type used by an application. These data structures are generally known as type objects. Generics are also types and that’s why it shouldn’t surprise you to find out that they too have their own type objects. In other words, MyList<T> is a type and you can get its type object. However, there’s an important difference between MyList<T> and, say, String: you cannot instantiate a generic type without specifying a data type for its type parameter. That’s why we had to pass Int32 when we instantiated MyList in the previous snippet.

Btw, a type with generic type parameters is called an open type. Whenever code  references a generic type and replaces its generic type argument with a specific type, we end up with what is known as a closed type (notice that closed types are only obtained when all generic type arguments are replaced with concrete types; if that doesn’t happen, then you’re just creating a new open type). An interesting way to see the difference between open and closed types is to print the information returned by the typeof operator to the console:

Console.WriteLine(typeof(MyList<Int32>));
Console.WriteLine(typeof(MyList<>));

The previous snippet ends up printing the following:

ConsoleApplication1.MyList`1[System.Int32]
ConsoleApplication1.MyList`1[T]

The astute reader will probably notice the backtick followed by a number…the number represents the type’s arity, which indicates the number of type parameters of that type. In our example, MyList has arity 1 because it only has one type parameter. Another interesting conclusion is that each closed type has its own static fields, which are shared only by the instances of that closed type (in our case, if MyList<T> introduced a static field, then MyList<Int32> is a closed type with its own static fields which aren’t shared with, say, MyList<String>).
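The static field behavior is easy to check with a small experiment. Counter<T> and its Created field are hypothetical names used only for this sketch:

```csharp
using System;

public class Counter<T> {
    // One copy of this field per closed type (Counter<Int32>, Counter<String>, ...).
    public static Int32 Created;

    public Counter() {
        Created++;
    }
}
```

If you create two Counter<Int32> instances and one Counter<String> instance, Counter<Int32>.Created goes up by 2 while Counter<String>.Created only goes up by 1.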

Being a type, it’s also possible to use a generic type (open or closed) as a base type or to build a new generic type which extends an existing type. Before ending, time for one last gotcha: I’ve noticed that some developers prefer to do this:

class IntList : MyList<Int32> { }

instead of instantiating MyList<Int32> directly like I did in the previous snippets. Please, don’t do this. The problem with this approach is that IntList is a different type from MyList<Int32>:

Console.WriteLine(typeof(IntList) == typeof(MyList<Int32>));

The previous snippet prints False because we’re talking about types with different identities! If you prefer the last approach because it saves a few keystrokes, then don’t forget that you can use the var approach or even create an alias through the using keyword. And that’s it for now. Stay tuned for more.
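Here’s what the alias approach might look like. To keep the sketch self-contained, I’m aliasing the framework’s List<Int32> instead of MyList<Int32> (AliasDemo is a made-up name); notice that the type on the right side of the alias must be fully qualified:

```csharp
using System;
using IntList = System.Collections.Generic.List<System.Int32>;

public static class AliasDemo {
    public static Boolean AliasIsSameType() {
        var list = new IntList { 10, 20 };
        // The alias is just another name for the closed type,
        // so there's no new type identity involved.
        return list.GetType() == typeof(System.Collections.Generic.List<Int32>);
    }
}
```

Unlike the derived IntList class, the alias doesn’t introduce a new type, so AliasIsSameType returns true. The downside is that the alias is only visible in the file (or namespace block) where it is declared.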

Mar 14

Getting started with generics

Posted in Basics, C#, CLR

With generics, the CLR offers us another way to ensure code reuse. If you’re a C++ developer, you might be tempted to see generics as some sort of C++ templates. Even though there are certain similarities, the truth is that there are several important differences. For instance, in C++ the source code must be available for the developer instantiating the template. That is not the case with generics in .NET (there are other differences, many captured here). Before going on, I believe it’s a good time to show some code. When generics were introduced, they solved a really bad problem: how to encapsulate an algorithm and make it generic and type safe at the same time. The best way to understand it is to look at a quick example:

class MyList<T> {
    public void Add(T item) {
    }
    public void Remove(T item) {
    }
    public void Sort(IComparer<T> comparer) { }
}

In the previous snippet, MyList is a class which works with any data type (notice the <T> right after the class definition). T is called a type parameter and you can see it as a name which can be used anywhere a data type is supposed to be used. Since the type parameter was introduced by the class itself (notice that <T> is declared right after the class’ name), it can be used for fields, method parameters and return values. You can even use it for local variables within the class’ methods. After creating a generic type, you can redistribute it and let other developers reuse it with a concrete type. For instance, here’s how you can reuse the previous class for storing lists of integers:

var intList = new MyList<Int32>();
intList.Add(10);
intList.Remove(20);

If you tried to pass a string to intList’s Add method, you’d get a compile error. So, in .NET, generics are type safe because the compiler will always ensure that only objects compatible with the specified type argument can be used where objects of that type are expected. If you’ve only started using .NET in the last years, then you probably haven’t noticed the performance improvement gained through their introduction. Before generics, generalized algorithms resorted to the Object type. Unfortunately, that meant that using those classes with value types would always result in boxing operations. This was really bad and you’d also need to use lots of casts to access the values or objects saved by those classes. Thank god we’ve got generics!
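To recall what the “before generics” world looked like, here’s a sketch which sums the same numbers with the old ArrayList and with List<Int32> (BoxingDemo and its method names are made up for the example):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo {
    public static Int32 SumNonGeneric() {
        var list = new ArrayList();
        list.Add(10);                 // the Int32 is boxed into an Object
        list.Add(20);
        var total = 0;
        foreach (Object item in list) {
            total += (Int32) item;    // cast needed (unboxing)
        }
        return total;
    }

    public static Int32 SumGeneric() {
        var list = new List<Int32>();
        list.Add(10);                 // no boxing
        list.Add(20);
        var total = 0;
        foreach (Int32 item in list) {
            total += item;            // no cast
        }
        return total;
    }
}
```

Both methods return 30, but only the first one pays for boxing and unboxing. It also compiles happily if you add a String to the ArrayList, and only fails at runtime on the cast.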

Currently, the framework introduces several utility classes built with generics. For instance, the System.Collections.Generic, System.Collections.ObjectModel and System.Collections.Concurrent namespaces introduce several generic collection classes which should always be used whenever you need to work with collections and lists of elements. And yes, nothing prevents you from creating your own generic types. Currently, you can create generic reference and value types, but you cannot create generic enumerated types. Notice that you can even create generic interfaces, delegates and methods (useful for those cases where only the method encapsulates a reusable algorithm). Creating a generic type is an interesting process, but we’ll leave that analysis for the next post. Stay tuned.

Feb 28

Generic properties: you can’t do that!

Posted in Basics, C#, CLR

One of the things some people expect to be able to do is create generic properties. After all, properties will always generate getter and setter methods and nothing prevents you from creating generic methods. So why can’t I do something like this:

class Test{
    public T MyProp<T>{get;set;}
}

Well, it’s really a conceptual problem, you see…in theory, a property represents a characteristic of an object. Making it generic would mean that that characteristic would be able to change, but this really doesn’t play well with the theory, right? Anyway, what this means is that if you need to add some generic behavior into a class, then you  should do that by adding methods. Btw, don’t confuse the previous code with reusing the generic parameter defined by a class:

//compiles as expected :)
class Test<T>{
    public T MyProp{get;set;}   
}

In this last snippet, we’ve created a new generic type (we’ll come back to generics in the next posts)…notice that the type of the property isn’t able to change after we create a new instance of a concrete type:

var test = new Test<Int32>();
test.MyProp = 10; //always int
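Going back to the advice of using methods for generic behavior, here’s a minimal sketch of what that might look like (the Settings class and its Get/Set methods are invented for this example):

```csharp
using System;
using System.Collections.Generic;

public class Settings {
    private readonly Dictionary<String, Object> _values = new Dictionary<String, Object>();

    // Generic methods are allowed, even though generic properties aren't.
    public void Set<T>(String key, T value) {
        _values[key] = value;
    }

    public T Get<T>(String key) {
        // throws if the key is missing or if the stored value isn't a T
        return (T) _values[key];
    }
}
```

After settings.Set("port", 8080), the call settings.Get<Int32>("port") returns 8080. Notice that this design stores everything as Object, so value types are boxed.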

I guess it’s time to say something about properties…to be honest, I really don’t like them. They look like fields, but they are methods. This means that some of the assumptions you end up making when accessing fields aren’t true with properties. For instance, a field is always read/write (that might not happen with a property). Besides that, a property cannot be passed as a ref or out parameter and accessing one might also cause side effects (something which never happens with fields).

Over the years, I’ve seen them being abused by developers…hell, I’ve even abused them myself! Currently, I tend to stay away from them. In my opinion, people using them will end up with what are known as anemic models (models in which there’s almost no behavior…). Currently, I only use properties in messages and for objects which are supposed to feed UI forms (since the data binding process only works with them).

And that’s it for now. Stay tuned for more.

Feb 20

More on properties: parameterful properties

Posted in Basics, C#

It’s been a long time since I’ve posted in this blog…and there are a couple of good reasons for that. For starters, my PC is gone…the disk died and I won’t be replacing it soon. That means that now I’m using a 2 year old netbook for doing work at home and sometimes it gets pretty frustrating (especially when I need to use VS). Besides that, I’ve also been pretty busy working on my next books. Yes, that’s right: books. The HTML book I’ve mentioned before is almost ready and I’m also actively working on a JavaScript book which covers the ECMAScript5 specification (again, in Portuguese, for FCA). As you can see, I’ve been really busy in my free time. Anyways, I believe that things are more stable now and I think I’ll be able to start writing more frequently in the next months. The idea is to keep writing about some C#/CLR basic concepts, a little bit of JavaScript and (probably) pick up some new framework and start digging into it…

Before going on, I’d like to express my condolences to everyone who lost someone on the 20th February floods which hit us a year ago. Yes, it’s been a year, but I believe that this date will never be forgotten by any of us that were here at the time…

In the last technical post, I was talking about properties. And today, we’ll keep looking at them and we’ll talk about parameterful properties. Parameterful properties (a name which I’ve borrowed from Jeff Richter’s excellent CLR via C#) are properties whose get methods accept one or more parameters and whose set methods accept two or more parameters. In C#, these properties are known as indexers and are exposed as array-like properties. The next snippet shows how one can define a parameterful property and use it from C#:

//for demo purposes only…
public class IntArray {
    private int[] _arr;
    public IntArray(Int32 numberOfItems) {
        if( numberOfItems <= 0 ) {
            throw new ArgumentOutOfRangeException();
        }
        _arr = new Int32[numberOfItems];
    }

    public Int32 this[Int32 position] {
        get {
            if( position < 0 || position >= _arr.Length) {
                throw new ArgumentOutOfRangeException();
            }
            return _arr[position];
        }
        set {
            if (position < 0 || position >= _arr.Length) {
                throw new ArgumentOutOfRangeException();
            }
            _arr[position] = value;
        }
    }
}

var ints = new IntArray(10);
ints[0] = 10;
Console.WriteLine(ints[0]);

In the previous example (built only for demo purposes), the parameterful property expects only one parameter in the getter and two in the setter (the first parameter, passed to both the getter and the setter, identifies the position and the second parameter, passed only to the setter and introduced through the “hidden” value parameter, indicates the value that is supposed to be put into that position). As you can see, this is used as a special name for a parameterful property. In practice, this means that you cannot use static parameterful properties in C# (even though the CLR does support that). The previous snippet shows some interesting recommendations too: for instance, notice that we’re throwing an exception when someone passes an index outside of the expected interval.

Another interesting feature of parameterful properties is overloading: unlike parameterless properties, you can overload them by using different parameters. From the CLR point of view, parameterful properties will also give place to setter and getter methods. In the previous example, that means that the code we’ve written will be transformed into something which looks like this:

public class IntArray {
    public Int32 get_Item(Int32 position){/*previous code here*/}
    public void set_Item(Int32 position, Int32 value){ /*previous code here*/}
}

As you can see, the compiler will automatically pick the name Item for a parameterful property and will prepend the getter and setter methods with the get_ and set_ prefix.

In C#, we never use the name Item when consuming a  parameterful property (the [] operator is used instead). However, if you’re writing a library that will be consumed from other CLR languages, then you can customize the name of this property by using the IndexerNameAttribute:

[IndexerName("Integer")]
public Int32 this[Int32 position] {/*previous code here*/}

By doing this, the compiler will automatically generate a pair of methods named get_Integer and set_Integer, allowing other languages (ex.: VB.NET) to access this property through the Integer name. Notice that the String type uses this attribute to let you access a char in a string from languages which don’t use the [] operator to interact with parameterful properties. Since you don’t use a name to refer to an indexer in C#, you’ll only be able to introduce one property of this “type” in your C# code (though, as I’ve mentioned previously, you can overload it). This behavior might introduce some problems when you’re trying to consume in C# a type written in another language which defines more than one parameterful property. For that type to be consumed from C#, it must indicate the name of the default parameterful property through the DefaultMemberAttribute (notice that the C# compiler does this automatically for your C# types and it does take into account the use of the IndexerNameAttribute). And yes, that will be the only parameterful property that C# code will be able to access…
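The next sketch tries to tie these pieces together: an overloaded indexer, a custom indexer name and a helper which reads the DefaultMemberAttribute through reflection. Inventory, the Entry name and IndexerInfo are all made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public class Inventory {
    private readonly List<String> _names = new List<String> { "apple", "pear" };

    // All the indexers of a type must share the same IndexerName.
    [IndexerName("Entry")]
    public String this[Int32 position] {
        get { return _names[position]; }
    }

    // Overload: same indexer, different parameter type.
    [IndexerName("Entry")]
    public Int32 this[String name] {
        get { return _names.IndexOf(name); }
    }
}

public static class IndexerInfo {
    // Returns the name stored in DefaultMemberAttribute (or null when there's none).
    public static String DefaultMemberName(Type type) {
        var attr = (DefaultMemberAttribute) Attribute.GetCustomAttribute(
            type, typeof(DefaultMemberAttribute));
        return attr == null ? null : attr.MemberName;
    }
}
```

With an Inventory instance, inventory[1] returns "pear" and inventory["pear"] returns 1. IndexerInfo.DefaultMemberName(typeof(Inventory)) returns "Entry", which confirms that the compiler picked up the name set through IndexerNameAttribute.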

btw, and before you ask, languages which don’t support parameterful properties can access them through direct getter and setter method calls. And that’s it for now. Stay tuned for more.

Jan 03

Properties are one of the types of members you can define in a class. It might seem strange, but a property only allows us to call a method in a simplified way. In other words, you can see them as sugar for invoking methods (which typically interact with a private field). Nonetheless, they’re important and most programming languages offer first class support for them (including C# and VB.NET). The CLR allows us to define two types of properties: parameterless and parameterful properties. In this post, we’ll concentrate on parameterless properties (and we’ll leave parameterful properties for a future post). So, let’s get started…

As you know, most objects have state and that state is saved through instance fields. For instance, if we think about a Student class which has info about the name and address of someone, then we’d probably end up building a class which looks like this:

public class Student {
    public String Name;
    public String Address;
}

And then, you could consume it like this:

var std = new Student();
std.Name = "Luis";
std.Address = "Funchal";
Console.WriteLine( std.Name + "-" +std.Address );

There’s nothing wrong with the previous code. However, there are a couple of observations that might prevent you from using the code as-is:

  • Many argue that exposing fields is not a good idea because it violates one of the tenets of OO programming (data encapsulation).
  • There are times where you might need to validate the values that are being set to a field. Publicly exposing a field means that anyone can set that field to any value and there’s nothing you can do about it.

The solution to this problem is simple: make your fields private and add a couple of methods which allow you to get or set the values of those fields:

public class Student {
    private String _name;
    private String _address;
    public void SetName(String name) {
        //you could perform validation here
        _name = name;
    }
    public String GetName() {
        return _name;
    }
    public void SetAddress(String address) {
        //you could perform validation here
        _address = address;
    }
    public String GetAddress() {
        return _address;
    }
}

And then, you’d need to change the consuming code so that it uses the methods to interact indirectly with the fields:

var std = new Student();
std.SetName( "Luis" );
std.SetAddress( "Funchal" );
Console.WriteLine( std.GetName() + "-" +std.GetAddress() );

This new approach solves the previous problems, but it will also make you write more code and you’ll need to use the new “access” methods for interacting with the fields. Since Microsoft saw these disadvantages as problems, they ended up introducing the concept of (parameterless) property. Here’s a new version of our class that relies on properties:

public class Student {
    private String _name;
    private String _address;
    public String Name {
        get { return _name; }
        set { _name = value; }
    }
    public String Address {
        get { return _address; }
        set { _address = value; }
    }
}

And here are the changes made to the consuming code:

var std = new Student();
std.Name = "Luis";
std.Address = "Funchal";
Console.WriteLine( std.Name + "-" +std.Address );

(Notice that all these snippets end up printing the same results.)

Defining a property is simple: you can specify a get and a set method, which encapsulate the code for reading and setting the value of that property. As you might expect, get and set are both optional (though you do need to define at least a set or a get when defining a new non-abstract property): it all depends on whether you intend to allow read (get) or write (set) access to a specific property. You’ve surely noticed the use of the value parameter from within the set method. This parameter carries the value assigned to the property and, in the previous example, it was simply copied into the private backing field of the property.
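Here’s a small sketch of what that buys you in practice: a setter that validates the value parameter before copying it into the backing field, and a property with no setter at all. The validation rule and the CreatedAt property are assumptions made up for this example:

```csharp
using System;

public class Student {
    private String _name;
    private readonly DateTime _createdAt = DateTime.Now;

    // Read/write property: the setter validates before touching the field.
    public String Name {
        get { return _name; }
        set {
            if (String.IsNullOrEmpty(value)) {
                throw new ArgumentException("A name is required.", "value");
            }
            _name = value;
        }
    }

    // Read-only property (hypothetical): no set accessor, so callers can't assign to it.
    public DateTime CreatedAt {
        get { return _createdAt; }
    }
}

public static class Program {
    public static void Main() {
        var std = new Student();
        std.Name = "Luis";
        Console.WriteLine(std.Name);
        try {
            std.Name = ""; // rejected by the validation code in the setter
        }
        catch (ArgumentException ex) {
            Console.WriteLine("Rejected: " + ex.Message);
        }
        // std.CreatedAt = DateTime.Now; // would not compile: there's no setter
    }
}
```

Notice that the consuming code still looks like plain field access; the validation is invisible until it fails.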

Even though it’s not mandatory, most properties end up manipulating one or more private fields of the class where they’re defined. When that happens, the property is said to have a backing field. So, adding a property in C# results in adding a pair of (get and set) accessor methods (depending on whether you want to allow read and write access in your property definition) and in adding a property definition to that class’ metadata. The getters and setters introduced in the property definition are transformed into methods (prefixed with get_ or set_) which are invoked whenever you read or write a value into that property. Notice that the CLR relies only on these methods for accessing the property and performing the “current” operation. Nonetheless, other tools can access the metadata information for getting more information about the members of a specific class.
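A quick way to see both pieces (the property metadata and the accessor methods) is to ask reflection for them. This is just a sketch built around the Student class from the previous snippets, trimmed down to a single property:

```csharp
using System;
using System.Reflection;

public class Student {
    private String _name;
    public String Name {
        get { return _name; }
        set { _name = value; }
    }
}

public static class Program {
    public static void Main() {
        // The property definition lives in the metadata...
        PropertyInfo property = typeof(Student).GetProperty("Name");

        // ...and it points to the two generated accessor methods.
        Console.WriteLine(property.GetGetMethod().Name); // get_Name
        Console.WriteLine(property.GetSetMethod().Name); // set_Name

        // Calling the setter method directly has the same effect as std.Name = "Luis".
        var std = new Student();
        property.GetSetMethod().Invoke(std, new Object[] { "Luis" });
        Console.WriteLine(std.Name);
    }
}
```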

When we’re creating simple read/write properties like the ones presented in the previous example, we can reduce the amount of typing by relying on automatic properties. Here’s the revised version of our class that uses this type of property for introducing the Name and Address properties:

public class Student {
    public String Name { get; set;}
    public String Address { get; set; }
}

Whenever you create a new property and don’t define the body of the get and set methods, the compiler will automatically introduce a backing field and will implement those accessor methods for you. These methods will simply return the value of the backing field (get) or update its value to a new one (set).

Notice that creating an automatic property is not the same thing as adding a field. With properties, the calling code will always be redirected to the get or set method (instead of accessing the field directly). The advantage of using automatic properties is that you can change the implementation of the property in the future (for instance, you might need to add validation to the values passed to the set method) and you won’t have to recompile the consuming code (if it’s in a different assembly).

There are some disadvantages regarding the use of automatically implemented properties. For starters, you cannot initialize an automatic property during its declaration, which means you need to put that initialization code into a constructor (C# 6 later added initializers and getter-only automatic properties, so this limitation and the last one on this list only apply to earlier versions of the language). Their use is discouraged if you’re performing any field-based serialization/deserialization of that class because you have no control over the name of the backing field and that is what gets serialized. Finally, you should keep in mind that when creating this type of property in C#, you’ll need to define both a get and a set method (after all, what would be the use of having an automatic property with only a setter if you have no way to retrieve the value?).
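The constructor-based initialization mentioned above can be sketched like this. The default values are made up for the example, and the private setter on Address is just one way of getting a property that consumers can only read while still declaring both accessors:

```csharp
using System;

public class Student {
    public String Name { get; set; }

    // Both accessors are declared, but only the class itself can call the setter.
    public String Address { get; private set; }

    public Student() {
        // Initialization goes through the properties, from the constructor.
        Name = "(unknown)";
        Address = "Funchal";
    }
}

public static class Program {
    public static void Main() {
        var std = new Student();
        Console.WriteLine(std.Name + "-" + std.Address);
        std.Name = "Luis";
        // std.Address = "Lisboa"; // would not compile: the setter is private
        Console.WriteLine(std.Name + "-" + std.Address);
    }
}
```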

And that’s it for now. Stay tuned for more on properties…

Nov 23

In the previous two posts, I’ve presented the basics (and some gotchas) associated with the way you declare events. In this post, I’ll present an alternative way for exposing events which is useful when you’re creating a class which has lots of events. In order to understand how this strategy works, we need to make a small detour and see what the compiler does when it finds an event field. Here’s the code we’ve been using for exposing the StudentNameChanged event:

public class Student {
      public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
}

Whenever the compiler finds this definition, it ends up generating the following code (ripped with .NET Reflector from the compiled assembly):

public class Student {
    private EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged{
        add {
            EventHandler<StudentNameChangedEventArgs> handler;
            EventHandler<StudentNameChangedEventArgs> handler2;
            EventHandler<StudentNameChangedEventArgs> handler3;
            bool flag;
            handler = this.StudentNameChanged;
        Label_0007:
            handler2 = handler;
            handler3 = (EventHandler<StudentNameChangedEventArgs>) Delegate.Combine(handler2, value);
            handler = Interlocked.CompareExchange<EventHandler<StudentNameChangedEventArgs>>(&this.StudentNameChanged, handler3, handler2);
            if (((handler == handler2) == 0) != null) {
                goto Label_0007;
            }
            return;
        }
        remove {
            EventHandler<StudentNameChangedEventArgs> handler;
            EventHandler<StudentNameChangedEventArgs> handler2;
            EventHandler<StudentNameChangedEventArgs> handler3;
            bool flag;
            handler = this.StudentNameChanged;
        Label_0007:
            handler2 = handler;
            handler3 = (EventHandler<StudentNameChangedEventArgs>) Delegate.Remove(handler2, value);
            handler = Interlocked.CompareExchange<EventHandler<StudentNameChangedEventArgs>>(&this.StudentNameChanged, handler3, handler2);
            if (((handler == handler2) == 0) != null) {
                goto Label_0007;
            }
            return;
        }
    }
}

At first sight, the code might look more complex than it really is. As you can see, an event is transformed into a delegate field and the event property ends up generating two methods (add and remove; btw, in the end, don’t forget that you’ll get two methods named add_StudentNameChanged and remove_StudentNameChanged). The add method is used for subscribing to an event, while the remove method is called for cancelling a previous subscription. In order to ensure proper working, the code generated for the add and remove methods relies on the CompareExchange method to solve the problems that might arise when our class is used in a multithreaded application (note: the goto label shown can be seen as a do-while loop which keeps trying to store the updated delegate chain until no other thread has changed the field in the meantime). Besides that, you’ll surely notice the use of the Combine and Remove static methods used for adding and removing event handlers (I’ll also have a couple of posts about delegates, so I won’t get into this right now).
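If the goto-based code is hard to follow, the same retry logic can be written by hand as a do-while loop. What follows is a readable approximation and not the exact code the compiler emits; Publisher, _changed and Raise are hypothetical names, and a plain EventHandler is used to keep the sketch short:

```csharp
using System;
using System.Threading;

// Hypothetical type; _changed plays the role of the compiler-generated delegate field.
public class Publisher {
    private EventHandler _changed;

    public event EventHandler Changed {
        add {
            EventHandler seen;
            EventHandler current = _changed;
            do {
                seen = current;
                var combined = (EventHandler)Delegate.Combine(seen, value);
                // The new chain is stored only if the field still holds what we read.
                current = Interlocked.CompareExchange(ref _changed, combined, seen);
            } while (!Object.ReferenceEquals(current, seen)); // another thread got in first: retry
        }
        remove {
            EventHandler seen;
            EventHandler current = _changed;
            do {
                seen = current;
                var reduced = (EventHandler)Delegate.Remove(seen, value);
                current = Interlocked.CompareExchange(ref _changed, reduced, seen);
            } while (!Object.ReferenceEquals(current, seen));
        }
    }

    public void Raise() {
        var handler = _changed;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class Program {
    public static void Main() {
        var publisher = new Publisher();
        EventHandler handler = (sender, e) => Console.WriteLine("changed");
        publisher.Changed += handler;
        publisher.Raise();
        publisher.Changed -= handler;
        publisher.Raise(); // nothing is printed this time
    }
}
```

The comparison uses reference equality on purpose: the loop only cares about whether the field was swapped by someone else, not about whether two delegate chains happen to be equivalent.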

Now that we know what happens when we define an event, we can see how we can improve our previous event definition by replacing it with an explicit event implementation where the add and remove methods are explicitly defined. Before showing this, it’s important to understand why we need to use a more efficient approach for classes that expose lots of events. The best way to understand why we need this strategy is to think about classes that wrap GUI controls. If you look at the Control class, you’ll notice that it exposes lots and lots of events. If that class exposed its events by using the first approach, we’d end up with lots and lots of fields and that means a lot of memory (for events which might not even be handled by the dev that is using a control).

To solve this memory usage problem, we need to expose events in a more efficient way. To achieve this, we need to add a custom dictionary to our class and then implement our events explicitly through the add and remove methods. Here’s some code that shows how to do this:

public class Student {
    private String _name;
    public String Name {
        get { return _name; }
        set {
            if (value == _name) return;
            var oldName = _name;
            _name = value;
            OnNameChanged( new StudentNameChangedEventArgs(oldName, _name) );
        }
    }
    private static Object _key = new Object(  );
    private Object _locker = new Object(  );
    private EventHandlerList _events = new EventHandlerList(  );
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged {
        add {
            lock(_locker) {
                _events.AddHandler( _key, value );
            }
        }
        remove {
            lock(_locker) {
                _events.RemoveHandler( _key, value );
            }
        }
    }
    protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
        lock(_locker) {
            var handler = (EventHandler<StudentNameChangedEventArgs>)_events[_key];
            if(handler != null ) {
                handler( this, e );
            }
        }
    }
}

As you can see, we’ve added a couple of fields to our class. Besides the EventHandlerList instance, I’ve also added an object used as a key for identifying the StudentNameChanged event in the events custom dictionary (_events field) and another object used for locking purposes (to ensure proper usage of our class in a multithreading environment). Btw, I’ve ended up using the EventHandlerList class since it’s used for all major UI classes introduced by the .NET framework. If you want, you can build your own custom dictionary which takes care of all the goo related with multithreading and invoking the delegates that handle the event (I’ll leave that for you as an exercise).
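The memory argument becomes clearer with more than one event. In the following sketch (Widget, its events and the PerformXXX methods are hypothetical), two events share a single EventHandlerList field, so an instance with no subscribers only pays for the list and not for one delegate field per event. Unlike the previous snippet, this version copies the delegate inside the lock and invokes it after leaving the lock; that’s a design choice of this sketch, made so that a slow handler doesn’t block other threads that are subscribing:

```csharp
using System;
using System.ComponentModel;

// Hypothetical control-like type: two events, one EventHandlerList field.
public class Widget {
    private static readonly Object ClickKey = new Object();
    private static readonly Object CloseKey = new Object();
    private readonly Object _locker = new Object();
    private readonly EventHandlerList _events = new EventHandlerList();

    public event EventHandler Click {
        add { lock (_locker) { _events.AddHandler(ClickKey, value); } }
        remove { lock (_locker) { _events.RemoveHandler(ClickKey, value); } }
    }

    public event EventHandler Close {
        add { lock (_locker) { _events.AddHandler(CloseKey, value); } }
        remove { lock (_locker) { _events.RemoveHandler(CloseKey, value); } }
    }

    public void PerformClick() { Raise(ClickKey); }
    public void PerformClose() { Raise(CloseKey); }

    private void Raise(Object key) {
        EventHandler handler;
        lock (_locker) {
            handler = (EventHandler)_events[key];
        }
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class Program {
    public static void Main() {
        var widget = new Widget();
        widget.Click += (sender, e) => Console.WriteLine("clicked");
        widget.PerformClick(); // the Click subscriber runs
        widget.PerformClose(); // no Close subscribers: nothing happens
    }
}
```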

And I guess this sums it up quite nicely. There might still be a couple of things to say about events, but I think that these last three posts cover most of the important details nicely, so I’ll end this series here. However, there are still a lot of things to talk about regarding .NET and the CLR, so stay tuned for more.

Nov 23

In one of the previous posts, we’ve looked at the basics associated with .NET events. As promised, we’ll start improving our initial code and today we’ll talk about two topics:

  1. lambdas aren’t always your friends.
  2. we live in a multithreaded world.

Let’s start with number 1… In the previous code, I’ve used something like this to handle the event:

var std = new Student( );
std.StudentNameChanged +=
    ( sender, e ) => Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );

Before going on, you should know that I do use this approach in 90% of the scenarios. The problem with it is that you cannot cancel a previous subscription by using this approach. Here’s some code which I’ve seen people use in the past:

var std = new Student( );
std.StudentNameChanged +=
    ( sender, e ) => Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );
std.Name = "Luis";
std.StudentNameChanged -=
    ( sender, e ) => Console.WriteLine( "{0} — {1}",e.OldName,e.NewName );
std.Name = "Luis2";

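The idea of the previous code is simple (though utterly wrong): we subscribe to an event through a lambda and then we try to cancel the subscription by passing the same lambda expression. Well, the problem is that the second lambda expression is different from the first: in practice, each lambda expression ends up producing its own delegate, so the -= operation doesn’t find the delegate that was registered, the original handler stays subscribed and the second name change still gets printed. In this case, the easiest approach is to create a method compatible with the event’s type and then use that to subscribe/cancel the event handler: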

static  void PrintName(Object sender, StudentNameChangedEventArgs e) {
    Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );
}

And then, we can simply subscribe/cancel the event like we did in the old days:

var std = new Student( );
std.StudentNameChanged += PrintName;
std.Name = "Luis";
std.StudentNameChanged -= PrintName;
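If you’d rather keep the lambda, a different option (not the one used above) is to store the delegate in a variable and pass that same variable to both += and -=. Here’s a minimal sketch; the Student class below is a simplified stand-in that uses a plain EventHandler and a NameChanged event, both assumptions made to keep the example short:

```csharp
using System;

// Simplified stand-in for the Student class (plain EventHandler, no custom args).
public class Student {
    private String _name;
    public event EventHandler NameChanged;

    public String Name {
        get { return _name; }
        set {
            _name = value;
            var handler = NameChanged;
            if (handler != null) handler(this, EventArgs.Empty);
        }
    }
}

public static class Program {
    public static void Main() {
        var std = new Student();

        // Keep a reference to the delegate so that the same instance can be removed later.
        EventHandler handler = (sender, e) =>
            Console.WriteLine("Name is now " + ((Student)sender).Name);

        std.NameChanged += handler;
        std.Name = "Luis";   // the handler runs
        std.NameChanged -= handler;
        std.Name = "Luis2";  // the handler no longer runs
    }
}
```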

With 1 tackled, let’s proceed to 2. It’s safe to assume that multithreading is here to stay and that means writing safer code. Let’s recover the code we used to fire the event:

protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
    if( StudentNameChanged != null ) {
        StudentNameChanged( this, e );
    }
}

Everyone who has had to write multithreaded programs will automatically cringe while reading the previous code. The problem is that between the test and the event execution, the thread can be stopped while another thread removes the existing delegate chain from the event field. And that’s why you’ll typically see the previous code re-written like this:

protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
    var aux = StudentNameChanged;
    if( aux != null ) {
        aux( this, e );
    }
}

The idea is simple: since StudentNameChanged is copied into aux, aux will always reference the same delegate chain that existed at the time of the copy. From that point on, we’ll be using aux for testing and firing the event and we can be sure that aux’s value won’t change between the test and the execution. Since delegates are immutable, then we’re safe, right?

Unfortunately, we’re not… I used code similar to this for a long time until I learnt that the compiler may optimize the previous code (though it currently doesn’t do it) and simply drop the aux reference. When that happens, we end up with the initial code, which exhibits the race condition I’ve mentioned before. Bottom line, we need safer code. In these scenarios, the best option I’ve seen is presented in the excellent CLR via C#, by Jeffrey Richter, and consists of using the CompareExchange method:

protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
    var aux = Interlocked.CompareExchange( ref StudentNameChanged, null, null );
    if( aux != null ) {
        aux( this, e );
    }
}

In the previous snippet, CompareExchange will only change the StudentNameChanged event to null *when* it’s null (in other words, it will never change its value if that value is not null). The advantage of using this method is that it will always return a reference to the StudentNameChanged event in a thread safe way. With this small performance hit (yep, there’s a small cost associated with using this method), we’re really safe. As I’ve said before, the compiler doesn’t currently perform the optimization which might break our second version of the code, so you might keep using that approach. Anyway, if you’re writing long-lived code, then you probably should play it safe and go with the more robust version.
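The “replace null with null” trick is easier to trust after seeing it run outside of an event. The following sketch uses a plain static delegate field (_handlers is a hypothetical name) and calls CompareExchange before and after assigning a handler to it:

```csharp
using System;
using System.Threading;

public static class Program {
    private static EventHandler _handlers;

    public static void Main() {
        // The field is null: the comparand matches, so null is "replaced" by null.
        var first = Interlocked.CompareExchange(ref _handlers, null, null);
        Console.WriteLine(first == null); // True

        _handlers = (sender, e) => Console.WriteLine("fired");

        // The field is not null: the comparand doesn't match, nothing is written
        // and the current delegate is returned.
        var second = Interlocked.CompareExchange(ref _handlers, null, null);
        Console.WriteLine(Object.ReferenceEquals(second, _handlers)); // True
        second(null, EventArgs.Empty); // prints "fired"
    }
}
```

In both cases the field ends up with the value it already had, which is why the call can be used as a thread safe read.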

In the next post, we’ll still keep looking at events and see how we can improve event declaration for classes that expose lots of events.

Nov 16

Getting started with events

Posted in Basics, C#

I guess we all know about events, right? Even so, I’ve decided to write a couple of posts about it and today I’ll be talking about some basic stuff associated with event definition and usage. So, what is an event? An event allows a type to notify other objects about something special which happened. If you’re a .NET developer, you know that events are everywhere. For instance, if you look at the Windows Forms’ controls, you’ll see events everywhere (ex.: who hasn’t handled the Click event generated by a Button?).

When a type exposes an event, it means that:

  • it allows another type to register a method which receives future notifications.
  • it allows another type to cancel a previous subscription.
  • it is responsible for notifying all the previous registered methods.

The CLR event model is based on delegates (which allow you to call methods in a type safe way; I guess I’ll return to delegates in future posts). Let’s start with a simple example which assumes we have a simple Student type which generates an event in response to a change of its Name property (I’ll call it the StudentNameChanged event). Do notice that in the real world I’d simply implement the INotifyPropertyChanged interface to signal this change. Since I want to present all the steps associated with creating an event, I’ll go with my own custom event…

When you expose an event, you must start by deciding if you’ll need to pass custom data to the methods that handle the event. In this case, I’ve decided to pass the old and new name values. In practice, this means that I’ll need to create a new type, derived from EventArgs (this is a convention), which exposes two properties: OldName and NewName.

public class StudentNameChangedEventArgs:EventArgs {
    public String OldName { get; private set; }
    public String NewName { get; private set; }

    public StudentNameChangedEventArgs( string oldName, string newName ) {
        OldName = oldName;
        NewName = newName;
    }
}

As I’ve said, using EventArgs as base is only a convention which you should follow. Nothing prevents you from passing a non-EventArgs type to a method that handles an event (though you’re probably going against what’s expected, which is not a good thing). Now, we’re ready to define the event member. The easiest way to do this is to add a public field to our class:

public class Student {
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
}

As you can see, an event is always declared through the event keyword, followed by the expected delegate type. In this case, and since our event args class inherits from EventArgs, we can reuse the EventHandler<T> type. After adding the field, it’s also expected to find a virtual protected method which is responsible for firing the event. Here’s the class’ complete code:

public class Student {
    private String _name;
    public String Name {
        get { return _name; }
        set {
            if (value == _name) return;
            var oldName = _name;
            _name = value;
            OnNameChanged( new StudentNameChangedEventArgs(oldName, _name) );
        }
    }
    protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
        if( StudentNameChanged != null ) {
            StudentNameChanged( this, e );
        }
    }
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged;
}

The OnNameChanged method starts by checking the StudentNameChanged event field. When it’s not null, it will call all interested parties by passing a reference to the Student instance responsible for the event and the custom EventArgs parameter it received. The previous snippet also shows how the event gets generated. As you can see, it will always be generated from the setter of the Name property.

Now, let’s see how we can consume this event from our C# code:

var std = new Student( );
std.StudentNameChanged +=
    ( sender, e ) => Console.WriteLine( "{0} — {1}", e.OldName, e.NewName );
std.Name = "Luis";
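For reference, here are the pieces of this post assembled into a single console program that should compile as-is. The LogChange method is a hypothetical second subscriber, added only to show that every registered method gets notified; everything else mirrors the snippets above:

```csharp
using System;

public class StudentNameChangedEventArgs : EventArgs {
    public String OldName { get; private set; }
    public String NewName { get; private set; }

    public StudentNameChangedEventArgs(String oldName, String newName) {
        OldName = oldName;
        NewName = newName;
    }
}

public class Student {
    private String _name;
    public event EventHandler<StudentNameChangedEventArgs> StudentNameChanged;

    public String Name {
        get { return _name; }
        set {
            if (value == _name) return;
            var oldName = _name;
            _name = value;
            OnNameChanged(new StudentNameChangedEventArgs(oldName, _name));
        }
    }

    protected virtual void OnNameChanged(StudentNameChangedEventArgs e) {
        if (StudentNameChanged != null) {
            StudentNameChanged(this, e);
        }
    }
}

public static class Program {
    // Hypothetical second subscriber, added only to show multicast delivery.
    private static void LogChange(Object sender, StudentNameChangedEventArgs e) {
        Console.WriteLine("log: name changed to " + e.NewName);
    }

    public static void Main() {
        var std = new Student();
        std.StudentNameChanged +=
            (sender, e) => Console.WriteLine("{0} -> {1}", e.OldName, e.NewName);
        std.StudentNameChanged += LogChange;

        std.Name = "Luis";   // both subscribers are notified, in subscription order
        std.Name = "Luis";   // same value: the setter returns early, no event
        std.Name = "Luis2";  // both subscribers are notified again
    }
}
```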

Experienced developers will probably detect several things which can be improved in our previous snippets. For instance, using lambda expressions is great, but only if you don’t need to cancel the subscription. Anyway, I’ll leave these improvements to the next post of the series. Stay tuned for more.