LA.NET [EN]

C# Archive

May 22

Arrays in .NET– part VI

Posted in .NET, Basics, C#

I ended the previous post saying that there was still one extra mile we could go to improve the performance associated with arrays in .NET. This extra mile has a name: stackalloc. The stackalloc keyword can only be used in unsafe code and it’s responsible for allocating a block of memory on the stack. In practice, stackalloc creates a single-dimension zero-based array of value type elements (whose fields’ types are not allowed to be reference types).

There are a couple of things you should keep in mind if you decide to use this option:

  • since the memory block is allocated on the stack, that memory will automatically be reclaimed at the end of the method where it was allocated.
  • you won’t be able to pass this memory block to most of the methods exposed by the types introduced by the framework.

Having said this, it’s time to show a quick sample:

unsafe {
    //allocate array in the stack
    //big enough for 10 ints
    Int32* ints = stackalloc Int32[10];
    //fill it with 10 ints
    for (var i = 0; i < 10; i++) {
        ints[i] = i;
    }
    //now do something with ints
    //probably pass it along
    for (var i = 0; i < 10; i++) {
        Console.WriteLine(ints[i]);
    }
}

As you can see, it’s not that complicated. The stackalloc keyword is used in place of the new keyword and we end up with a memory block in the stack which is big enough for storing 10 integers.

In the real world, you’ll (probably) use the stackalloc keyword in scenarios where performance is a must or when you need to use interop to communicate with unmanaged code. In fact, I think that interop is one of those scenarios where stackalloc is a good option. By using it, you don’t have to allocate an array in the heap (leading to less GC pressure), you don’t have to pin memory (don’t forget that this is a must for passing a reference from a managed object to unmanaged code) and you get automatic clean up on the method’s exit.
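To make the "pass it along" idea a bit more concrete, here is a minimal sketch where the stack block is handed to another method through a pointer. StackAllocSample, Sum and SumOfFirst are hypothetical names I made up for the illustration, and the code still has to be compiled with the /unsafe option:

```csharp
using System;

static class StackAllocSample {
    // Hypothetical helper: sums 'count' ints starting at 'items'.
    // It only sees a raw pointer, so it doesn't care where the memory lives.
    static unsafe Int32 Sum(Int32* items, Int32 count) {
        var total = 0;
        for (var i = 0; i < count; i++) {
            total += items[i];
        }
        return total;
    }

    public static unsafe Int32 SumOfFirst(Int32 count) {
        // the block lives on the stack and is reclaimed when this method returns
        Int32* ints = stackalloc Int32[count];
        for (var i = 0; i < count; i++) {
            ints[i] = i;
        }
        // no pinning needed: stack memory is never moved by the GC
        return Sum(ints, count);
    }
}
```

Calling StackAllocSample.SumOfFirst(10) adds the values 0 to 9. Keep count small: the stack is a limited resource and the pointer must never outlive the method that allocated the block.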

Since we’re talking about arrays allocated in the stack, there’s still one more option: fixed size buffers. We’ll take a look at them in the next post! And that’s it for now. Stay tuned for more.

May 22

Arrays in .NET–part V

Posted in .NET, Basics, C#

In the previous post, we looked at non-zero based arrays and I ended the post by talking a little bit about performance. As I said, non-zero based arrays are not really something you want to use regularly. Besides that, we also saw that rectangular (multi-dimension) arrays aren’t as performant as jagged arrays because the JIT won’t, for instance, hoist the index checking outside of a loop. Since this index hoisting *does* make a difference, how can we improve the performance of our code? Well, there is a way to improve the performance, at the cost of safety: I’m talking about using unsafe code for enumerating the items of an array!

When you use this strategy, you’re performing a direct memory access. Notice also that these accesses won’t throw exceptions when you access an “invalid position” (like it happens when you use the “traditional approach”). Instead, you might end up with memory corruption…

In practice, using unsafe code means that the assembly that contains this code must be granted full trust or have the security permission’s skip verification option turned on. After this basic intro, it’s time to see some code. The next snippet introduces two methods which go through all the items of a rectangular array by using a safe and an unsafe approach:

//requires: using System; using System.Diagnostics;
const Int32 itemsCount = 10000;
static void Main(string[] args) {
    var ints = new Int32[itemsCount, itemsCount];
    var stopwatch = Stopwatch.StartNew();
    SafeAccess(ints);
    Console.WriteLine("time elapsed: " + stopwatch.ElapsedMilliseconds);
    stopwatch = Stopwatch.StartNew();
    UnsafeAccess(ints);
    Console.WriteLine("time elapsed: " + stopwatch.ElapsedMilliseconds);
}
static void SafeAccess(Int32[,] ints) {
    var totalCount = 0;
    for (int i = 0; i < itemsCount; i++) {
        for (int j = 0; j < itemsCount; j++) {
            totalCount += ints[i, j];
        }
    }
}
unsafe static void UnsafeAccess(Int32[,] ints){
    var totalCount = 0;
    fixed (Int32* ptr = ints) {
        for (int i = 0; i < itemsCount; i++) {
            var basePos = i * itemsCount;
            for (Int32 j = 0; j < itemsCount; j++) {
                totalCount += ptr[basePos + j];
            }
        }
    }
}

There’s not much to say about the SafeAccess method. There are, however, several interesting observations about the UnsafeAccess method:

  • the method is marked with the unsafe keyword because it uses pointers to access the items of the array (don’t forget that you need to compile this code with the /unsafe option).
  • ptr is a pointer which points to the memory address of the first item of the array.
  • We’re fixing (or pinning) the array (by using the fixed statement) to prevent the GC from relocating a movable variable (since ints references a managed array, it could get moved during a GC compaction operation).
  • We need to perform the calculations required to access the items of the array. Even though we’ve got a rectangular array, the truth is that it is allocated as a single block of items, where one row is appended to the end of the previous one.
  • Finally, notice that we get the value of each item by using an “array index syntax” (similar to the one used in traditional C# code and to the one you can use with unmanaged C or C++).
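Since the rectangular array is laid out as a single block, the index calculations can also be dropped altogether. The following is only a sketch of that idea (UnsafeWalk and its Sum method are names I’m assuming for the example, they’re not part of the measured code): it walks the block with a moving pointer and relies on the array’s Length property, which returns the total number of items in all dimensions.

```csharp
using System;

static class UnsafeWalk {
    // Sums every item of a rectangular array with a single loop.
    public static unsafe Int64 Sum(Int32[,] ints) {
        Int64 total = 0;
        // Length is the total number of items across all the dimensions
        var count = ints.Length;
        fixed (Int32* first = ints) {
            // 'first' is read-only inside the fixed block, so walk with a copy
            Int32* current = first;
            Int32* end = first + count;
            while (current < end) {
                total += *current;
                current++;
            }
        }
        return total;
    }
}
```

As before, nothing stops you from walking past end, so the loop condition is the only thing standing between you and a memory corruption.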

In my machine, running the previous code resulted in the following output:

[Image: safevsunsafe]

As you can see, there’s a small difference which can make all the difference in the world for those apps where you do need that extra ounce of performance… Before ending, a couple of observations:

  • as I said, this code needs full trust and there might be some places where your code won’t have it.
  • This strategy can only be used for arrays which hold primitive types, enums or structs whose fields’ type is one of the previously mentioned types.

If you want, there’s still an extra step for improving the performance of code that interacts with arrays, but that will be the topic of the next post. Stay tuned for more.

May 20

Arrays in .NET – part IV

Posted in .NET, Basics, C#

Today I’m going to do something which I really don’t like: I’m going to show you how to create non-zero lower bound arrays (aka, non-zero-based arrays). Before going on, I must say that I find it hard to justify the creation of these types of arrays. Anyways, since I’m covering arrays, I guess I must really show some code that illustrates their usage in C#. The Array class offers a static CreateInstance method which we can use to create arrays. There are several overloads, but we’re interested in the one which allows us to specify the lower bound and number of elements in each dimension. The following snippet shows how to create a single-dimension array which can hold 2 elements on positions 10 and 11:

var ints = Array.CreateInstance(
    typeof(Int32), //type
    new[] { 2 }, //number of elements in each dimension
    new[] { 10 }); //first index in each dimension
ints.SetValue( 10, 10 );
ints.SetValue( 100, 11 );
Console.WriteLine( ints.GetValue( 11 ) );

As you can see, the CreateInstance method receives three parameters:

  • The first identifies the type of each item that will be stored in the array.
  • The second is an array which indicates the number of elements stored in each dimension. Since we only have one dimension, we pass only the number of items for that single dimension.
  • The third parameter identifies the starting index (ie, the lower bound) of each dimension (in this case, the array starts at index 10).

Internally, the CreateInstance method will allocate enough memory for the array, keep the information about the bounds and dimensions and return an Array instance. As you can see, returning an Array instance means using the instance GetValue and SetValue methods for reading and writing values to our array…at least, that’s what we need to do in C#. Interestingly, we can use the traditional syntax for accessing items of a multi-dimension non-zero-based array:

var ints = (Int32[,]) Array.CreateInstance(
    typeof(Int32), //type
    new[] { 2,4 }, //number of elements in each dimension
    new[] { 10, 20 }); //first index in each dimension
ints[10, 20] = 10;//put value in first position
Console.WriteLine(ints[10,20]);
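The reason why the cast works for the multi-dimension array but GetValue and SetValue are needed for the single-dimension one is that a single-dimension array with a non-zero lower bound isn’t an Int32[] for the CLR. Here’s a small sketch that checks it (NonZeroBasedSample and IsVector are hypothetical names used only for this illustration):

```csharp
using System;

static class NonZeroBasedSample {
    // Returns true when a single-dimension array created with the
    // given lower bound is compatible with the C# Int32[] type.
    public static Boolean IsVector(Int32 lowerBound) {
        var array = Array.CreateInstance(
            typeof(Int32),         //type
            new[] { 2 },           //number of elements
            new[] { lowerBound }); //first index
        return array is Int32[];
    }
}
```

IsVector(0) returns true and IsVector(10) returns false, which means that trying to cast the array created in the first snippet to Int32[] ends up in an InvalidCastException.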

If you’re creating arrays where you don’t control the lower and upper bound indexes, then you should probably resort to the GetUpperBound and GetLowerBound methods. In the next snippet, I use these methods for enumerating all the items stored in the ints array:

//GetUpperBound returns the last valid index, so it must be included (<=)
for( Int32 i = ints.GetLowerBound( 0 ),
        firstDimUpper = ints.GetUpperBound( 0 );
        i <= firstDimUpper;
        i++ ) {
    for(Int32 j = ints.GetLowerBound( 1 ),
        secondDimUpper = ints.GetUpperBound( 1 );
        j <= secondDimUpper;
        j++ ) {
        Console.WriteLine(ints[i,j]);
    }
}

As I said, using non-zero-based arrays is one of those things you shouldn’t really do. For starters, they have worse performance than the traditional zero-based arrays, and there are several reasons that justify this behavior. In fact, if you’re after performance, then you should use only single-dimension zero-based arrays. There are special IL instructions that the compiler uses for accessing items in these arrays that will make the JIT emit optimized code. When you’re using a single-dimension zero-based array, the JIT can hoist the index checking code out of a loop, making that validation run only once, at the beginning of that loop.

This hoisting optimization doesn’t happen with arrays that aren’t single-dimension and zero-based (in practice, this means that each array item access will only be allowed after checking if the specified index is valid). Besides that, the JIT will have to subtract the array’s lower bound from the specified index to get the item you’re requesting (notice that this is also done for multi-dimension zero-based arrays). These issues make me believe that using non-zero based arrays in .NET isn’t really something you want to do often. And I have avoided them in these last years…

And I guess that’s it for now. Stay tuned for more.

May 19

Arrays in .NET – part III

Posted in .NET, Basics, C#

Now that you know the types of arrays you can create and how to simplify their initialization, it’s time to see what we can do with them. As I’ve said previously, all arrays inherit from the base Array type. In practice, this means that all array instances we create automatically inherit all the properties and methods defined by the base Array class. This is great because it makes working with arrays easy. Here’s an example:

var ints = new []{1, 2};
var numElements = ints.Length;
var shallowCopy = new Int32[2];
ints.CopyTo( shallowCopy, 0 );

As you’ve probably noticed, you can use Length and CopyTo only because they are defined by the Array type (which is automatically used as the base type of our Int32 array). But there’s more: the Array class introduces several utility methods which you can use to find an element or to sort an array. The next snippet illustrates the use of the static IndexOf method for getting the position of the first occurrence of an element in an array:

var ints = new []{1, 2, 30, 40};
var pos = Array.IndexOf( ints, 2 );//returns 1
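Sorting and searching follow the same pattern. The next sketch (ArrayUtilities and SortAndFind are names I’ve picked just for the example) combines the static Array.Sort and Array.BinarySearch methods; don’t forget that BinarySearch only gives meaningful results on a sorted array:

```csharp
using System;

static class ArrayUtilities {
    // Sorts the array in place and returns the index of 'value'
    // (or a negative number when the value isn't there).
    public static Int32 SortAndFind(Int32[] items, Int32 value) {
        Array.Sort(items);
        return Array.BinarySearch(items, value);
    }
}
```

For instance, ArrayUtilities.SortAndFind(new[] { 40, 2, 30, 1 }, 30) sorts the items into 1, 2, 30, 40 and returns 2. When the value isn’t found, the returned number is negative.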

The Array type does expose several static methods, so you probably should take a moment or two to look at its docs… But there’s still more! All arrays implicitly implement the IEnumerable, ICollection and IList interfaces. Unfortunately, the Array base class doesn’t implement the generic versions of these interfaces. However, the CLR does implement these generic interfaces for all the types in the hierarchy of all single-dimension zero-lower bound arrays that hold *reference* types. So, when we have something like this:

var strs = new[] {"hi", "there"};

The CLR will not only implement IEnumerable<String>, ICollection<String> and IList<String> on the “type String[]”, but it will also implement the generic versions of those interfaces for all of String’s base types (in this case, there’s only Object). Notice that this doesn’t happen for value types (in those cases, you’ll get the generic versions of the interfaces, but only for that concrete value type) because reference and value items use different memory layouts. Notice also that implementing these interfaces is really important for allowing arrays to be used in LINQ expressions…
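If you want to see this behavior in action, a quick runtime check is enough. This is just a sketch (ArrayInterfaces and IsObjectList are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

static class ArrayInterfaces {
    // Checks whether an array can be seen as a list of Object items.
    public static Boolean IsObjectList(Array array) {
        return array is IList<Object>;
    }
}
```

Passing new[] { "hi", "there" } returns true because String is a reference type and Object is one of its base types. Passing new[] { 1, 2 } returns false: the Int32 array only gets the generic interfaces for Int32 itself.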

And that’s it for now. In the next post, I’ll keep my previous promise and we’ll see how to create non-zero-lower bound arrays. Stay tuned!

May 19

Arrays in .NET – part II

Posted in .NET, Basics, C#

In the first post of the series, I talked about the several types of arrays you can define in your .NET apps. What I didn’t mention at the time is that C# allows you to create and initialize an array with a single instruction:

var ints = new Int32[] {1, 2, 3};

When the compiler finds code like the one shown in the previous snippet, it will automatically create an array big enough for storing all the items specified in the initialization list (in this case, the array’s size will be 3) and it will automatically store all the items specified in the initialization list in the corresponding index (ex.: value 1 will be stored at index 0). Notice that the type of ints isn’t specified: it’s inferred from the right hand expression. In fact, we could even save a few key strokes by omitting the Int32 type from the initialization list:

var ints = new [] {1, 2, 3};

When the compiler analyses the previous code, it will try to find the best type among the expressions defined in the initialization list (ie, a type to which all of them can be implicitly converted). In the previous example, we only have Int32 literal values, so the compiler will automatically use the Int32 type. If you don’t explicitly set the type of the elements in the array like we did in the previous snippet, then you have to take some care. For instance, you shouldn’t mix reference and value types:

var ints = new [] {"1", 2, 3};//error CS0826: No best type found for implicitly-typed array

This error might confuse novice users at first because we all know that all types inherit (directly or indirectly) from Object, right? So, why doesn’t the compiler simply infer that we want an array of Object elements? Well, I think it could do that. However, that would result in a boxing operation for storing the integer value. Since boxing is a somewhat “dangerous” operation (it might introduce some subtle bugs), the compiler opted for not doing this type of inference. So, if you really do want to mix reference and value types, you need to be specific about the type of each element that is going to be stored in the array:

var ints = new Object[] {"1", 2, 3};
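Mixing types doesn’t always end in an error: it works whenever one of the types in the list can receive all the others through an implicit conversion. The next sketch shows a couple of cases (BestTypeSample and its methods are names I’m assuming for the illustration):

```csharp
using System;

static class BestTypeSample {
    public static Type MixedNumbers() {
        // Int32 converts implicitly to Double, so Double is the best type
        var numbers = new[] { 1, 2.5 };
        return numbers.GetType();
    }

    public static Type StringsAndNull() {
        // null converts to String, so String is the best type
        var names = new[] { "Luis", null };
        return names.GetType();
    }
}
```

The first method returns typeof(Double[]) and the second returns typeof(String[]). In both cases no boxing is involved, which fits the explanation given above for the CS0826 error.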

Btw, and since we’re talking about shortcuts for initializing arrays, there’s still one final option for cutting down the number of key strokes used:

Int32[] ints = {1, 2, 3};

Unfortunately, this strategy forces us to specify the type of the variable on the left (notice the use of Int32[]). Since I’m a “var lover”, I really don’t use this syntax a lot. Btw, I’m still not convinced that forcing a compiler error on something like this is a good idea:

var ints = {1, 2, 3};//error CS0820: Cannot initialize an implicitly-typed local variable with an array initializer

I can understand the decision made about the boxing issue I’ve mentioned before, but why (oh why???) can’t the compiler infer the type only from the expression used on the right side of the assignment?

Before ending, it’s also important to understand that the “shortcuts” associated with the reduced “newing” expressions are really useful when we need to work with anonymous types:

var ints = new[] {
    new {Name = "Luis", Age = 35},
    new {Name = "Jose", Age = 20}
};

Without being able to infer the type from each of the expressions defined by the elements of the array, there would be no way for us to create a new array (because we don’t really know the name that will be given to the anonymous type created by the compiler).

And I’d say that’s all for now! Stay tuned for more…

May 19

Cultures, cultures and still more cultures…

Posted in .NET, Basics, C#

In one of my latest posts on text and strings, reader John Meyer asks a couple of interesting questions:

Can you discuss any differences between these 2 ways of getting CultureInfo objects?

var culture1 = new CultureInfo("en-US");
var culture2 = CultureInfo.GetCultureInfo("en-US");

left one out of the previous comment, sorry:

var culture3 = CultureInfo.CreateSpecificCulture("en-US");

Before answering the question, I believe that it’s important to understand that cultures are grouped into three types: invariant, neutral and specific cultures. An invariant culture is, as its name says, invariant. You can think of this culture as something which… is neither neutral nor specific :). The invariant culture’s name is the empty string (“”) and, by default, it’s associated with the English language (but not with a specific country or region). You should only use this culture when you don’t require culture dependent results (ex.: persisting data that is not displayed to the users). So, you probably shouldn’t use this culture for operations that provide feedback to the user. The following code shows how to get a reference to an invariant culture object:

//both return an InvariantCulture object
var invariant1 = new CultureInfo( "" );
var invariant2 = CultureInfo.InvariantCulture;

In the previous posts of the text series, I believe that all the examples I’ve shown relied only on specific culture CultureInfo objects. A specific culture is always associated with a language and a country or region. For instance, the following snippet creates a culture for the Portuguese language in Portugal:

var culture = new CultureInfo( "pt-PT" );

As you can see, specific cultures are always identified by a pair of codes which identify the language (“pt”) and the country or region (“PT”). Finally, we need to define the neutral culture concept. A neutral culture is a culture that is only associated with a language. In practice, that means that we only pass the language part during initialization of the object. In the following snippet, we’re creating a neutral culture object for the Portuguese language:

var culture = new CultureInfo( "pt" );

Since we’re discussing the existing CultureInfo classifications, there are a couple of interesting points before we start analyzing the previous methods:

  • The predefined cultures form a hierarchy, where a specific culture (ex.: “pt-PT”) is parented by a neutral culture (ex.: “pt”). The invariant culture is the parent of all neutral cultures. After getting a reference to a CultureInfo object, you can navigate through its hierarchy through its Parent property. Oh, and one more thing: the InvariantCulture parent is…drum roll…the InvariantCulture 🙂 (I’m mentioning this here because initially I thought it should be null…).
  • There are some operations which don’t *quite* work with neutral cultures. For instance, formatting a date will only work as expected when you’re using a specific culture. The next snippet shows what happens when you use the English neutral culture to format a date in the short date format:

var neutral1 = new CultureInfo( "en" );
Console.WriteLine(String.Format(neutral1, "{0:d}", DateTime.Now)); //5/19/2011
Console.WriteLine(String.Format(new CultureInfo( "en-US" ), "{0:d}", DateTime.Now));//5/19/2011
Console.WriteLine(String.Format(new CultureInfo( "en-GB" ), "{0:d}", DateTime.Now));//19/05/2011

Yes, it compiles and runs…but have you noticed that the returned result isn’t really the one you’d expect if you live in the UK?
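The hierarchy mentioned in the first point is easy to inspect through the Parent property. Here’s a minimal sketch which collects the names of a culture and of all its parents (CultureHierarchy and ParentChain are hypothetical names, and I’m assuming that the machine has the culture data for the requested culture):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CultureHierarchy {
    // Returns the names of the culture and of all its ancestors.
    public static List<String> ParentChain(CultureInfo culture) {
        var names = new List<String>();
        var current = culture;
        while (true) {
            names.Add(current.Name);
            // the invariant culture is its own parent, so we must stop here
            if (current.Equals(CultureInfo.InvariantCulture)) {
                break;
            }
            current = current.Parent;
        }
        return names;
    }
}
```

For new CultureInfo("pt-PT"), the returned list holds “pt-PT”, “pt” and the empty string (the name of the invariant culture). Without the explicit stop condition, the loop would never end because the invariant culture’s Parent never returns null.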

Ok, so I guess we’re ready to go… Let’s start by understanding the difference between using the constructor and using the static GetCultureInfo method:

var culture1 = CultureInfo.GetCultureInfo( "pt-Pt" );
var culture2 = CultureInfo.GetCultureInfo( "pt-PT" );
Console.WriteLine(Object.ReferenceEquals(culture1, culture2));//true
var culture3 = new CultureInfo( "pt" );
var culture4 = new CultureInfo( "pt" );
Console.WriteLine(Object.ReferenceEquals(culture3, culture4));//false

I think the previous snippet shows what’s happening…whenever you use a constructor, you end up creating a new object. When you use the static GetCultureInfo method, you’ll end up getting a cached version of a CultureInfo object (if one exists from a previous call). Notice that I’ve used the static Object.ReferenceEquals method to make sure that I was getting the same instance from both method calls. It’s easy to understand that if you need the same CultureInfo object in several places, then you should probably use the GetCultureInfo method. Now, we’re left with the CreateSpecificCulture method… Take a look at the following snippet:

var neutral1 = CultureInfo.CreateSpecificCulture( "pt" );
var neutral2 = CultureInfo.GetCultureInfo( "pt" );

And now, a quick look at what the watch window shows during a debugging session:

[Image: culture]

As you can see, we use the same neutral culture *string*, but we end up with two different culture “types”. neutral1 references a *specific* CultureInfo instance and neutral2 does, indeed, “point” to a *neutral* CultureInfo object (you can easily test this by accessing the IsNeutralCulture property of the CultureInfo objects). As you can see, CreateSpecificCulture provides us with a way to get a reference to a specific culture object from a neutral culture string (btw, if you pass it a *specific* culture string, you’ll get the corresponding specific culture object). Notice that you don’t have any control over the specific culture returned. You only know that it will return a specific culture and there are some cases where that behavior might end up confusing your users. I’d say that you can use this method, but only sparingly…
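The IsNeutralCulture test mentioned above can be written in a couple of lines. This is only a sketch (CultureKinds and its methods are names I’m assuming for the example):

```csharp
using System;
using System.Globalization;

static class CultureKinds {
    // true when the (cached) culture with this name is a neutral one
    public static Boolean IsNeutral(String name) {
        return CultureInfo.GetCultureInfo(name).IsNeutralCulture;
    }

    // true when CreateSpecificCulture hands back a neutral culture
    public static Boolean SpecificIsNeutral(String name) {
        return CultureInfo.CreateSpecificCulture(name).IsNeutralCulture;
    }
}
```

IsNeutral("pt") returns true, while SpecificIsNeutral("pt") returns false. I’m deliberately not checking the name of the specific culture returned for “pt” because, as I’ve said, that’s something you don’t control.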

And I guess that’s it for now. Stay tuned for more.

May 18

Arrays in .NET – part I

Posted in .NET, Basics, C#

Currently, the CLR supports single-dimension arrays, multi-dimension arrays and jagged arrays (aka, arrays of arrays). Arrays are always implicitly derived from the Array reference type. Creating a single-dimension array is simple, as you can see from the next snippet:

var ints = new Int32[10];

ints references an array which can hold 10 Int32 elements (ie, 10 integers). Since I was initializing the variable in the declaration, I used the var keyword to reduce the typing. If I only wanted to declare a variable, then I could have simply written the following:

Int32[] ints;
ints = new Int32[10];

In this case, ints is a variable capable of referencing an array of integers. Notice that ints is null until the execution of the second line. It’s also important to keep in mind that ints refers to a memory space which is allocated in the managed heap (and that means that arrays are garbage collected) because all arrays implicitly extend the Array base type.

After making sure that ints references an array, you write and read elements from the array by specifying a position (which is known as the index). The next example shows how to do that:

//write value 2 to the first position
ints[0] = 2;
//print 1st value of the array
Console.WriteLine(ints[0]);

CLS compliant arrays are always zero based. In other words, the first element of an array is always placed on position 0. Requiring all arrays to be zero based allows easy sharing of arrays between CLS languages. Since these single-dimension arrays are the most common type of array used in applications, forcing them to be zero-based has allowed MS to optimize its runtime for their use. Notice that I said CLS, not CLR. In fact, the CLR allows you to create non zero based arrays, but you should be prepared for a small performance hit (I’ll return to this topic in a future post).

As I’ve said at the start, you can also have multi-dimension arrays. Here’s a quick example:

//two dimension arrays of 10×10
Int32[,] ints = new Int32[10, 10];
//put 10 in 1st position
ints[0, 0] = 10;

As you can see, it’s simple: you specify the number of elements in each dimension. Notice also the syntax used for the declaration of a multi-dimension array: the comma “specifies” the number of dimensions (if it were a three dimension array, then you’d use two commas like this Int32[,,]). We still need to look at the declaration and use of jagged arrays. In practice, a jagged array is an array of arrays:

Int32[][] ints = new Int32[2][];
ints[0] = new Int32[2];//2 elems array
ints[1] = new Int32[4];//4 elems array
//put something in one of the jagged arrays
ints[0][0] = 1;

As you can see, the [] [] syntax makes it clear that this is an array of arrays. Notice also that, unlike the multi-dimension array, each “dimension” can have a different number of elements (in the previous example, there are 2 elements in the “first dimension” and 4 on the second).
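This difference also shows up when you need to know how many items you’ve got. Here’s a small sketch (ArrayShapes and its methods are hypothetical names used only for illustration): in a rectangular array, you ask each dimension for its size through GetLength; in a jagged array, you need to ask each inner array for its Length.

```csharp
using System;

static class ArrayShapes {
    // Every inner array has its own length, so we add them one by one.
    public static Int32 CountJagged(Int32[][] rows) {
        var total = 0;
        foreach (var row in rows) {
            total += row.Length;
        }
        return total;
    }

    // All the "rows" have the same size, so a multiplication is enough.
    public static Int32 CountRectangular(Int32[,] items) {
        return items.GetLength(0) * items.GetLength(1);
    }
}
```

With the jagged array of the previous snippet, CountJagged returns 6 (2 + 4). With the 10x10 array shown earlier, CountRectangular returns 100 (which is also the value of its Length property).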

One final note before ending: the CLR ensures that you will only access a valid position of an array. For instance, if you created an array with 20 elements, you’ll end up with an exception when you try to access an element outside of the [0,19] interval. This is a good thing because it ensures that you will always read an element from the array. There is, however, a small cost associated with this strategy. If you think this is really too much, then you can disable it by accessing the array elements through unsafe access (more details in a future post).
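The exception you get in these cases is an IndexOutOfRangeException. The following sketch wraps the access in a helper so that you can see the check in action (BoundsCheckSample and TryRead are names I’ve made up; in real code you’d compare the index with Length instead of catching the exception):

```csharp
using System;

static class BoundsCheckSample {
    // Returns false (instead of throwing) when the index is invalid.
    public static Boolean TryRead(Int32[] items, Int32 index, out Int32 value) {
        try {
            value = items[index];
            return true;
        }
        catch (IndexOutOfRangeException) {
            // the CLR refused to read outside the array
            value = 0;
            return false;
        }
    }
}
```

With an array of 20 elements, reading index 19 succeeds, while reading index 20 or -1 returns false.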

And that’s it for now. In the next posts we’ll keep looking at arrays. Stay tuned for more.

May 18

Zebra code again…

Posted in C#

After I posted my initial code, I went ahead and did some refactoring so that it would also support some of the ZPL commands (here’s the code). Notice that this isn’t really the latest version of the project (you’re probably looking at a version which is, at least, 4 years old) but it has enough code for getting you started. Why haven’t I posted this earlier? Well, I don’t remember…the truth is that I found it yesterday while searching for the code after receiving an email from a reader…

May 17

So, you know everything about text, right? – part XV

Posted in .NET, Basics, C#

In the previous posts, we took a deep dive into how we can format objects into strings. In this post, we’ll see how to obtain an object from a string (a process known as parsing). By convention, all the types that can parse a string offer a static method named Parse which (at a minimum) expects a string as a parameter. Currently, many of the types introduced by the framework are able to do parsing. For instance, the next snippet shows how to parse an integer from a string:

var intInStr = "100";
var aux = Int32.Parse( intInStr );
Console.WriteLine(aux);

Besides the “simple” Parse method, Int32 offers other overloads of this method:

public static int Parse(string s, NumberStyles style)
public static int Parse(string s, IFormatProvider provider)
public static int Parse(string s, NumberStyles style, IFormatProvider provider)

If you’ve been following the series on text and strings, then you should probably understand what these parameters do. The first parameter (string s) identifies the string which is going to be parsed. NumberStyles is a flags enum which determines the styles permitted in the numeric strings (for instance, AllowExponent indicates that the string can contain a numeric value in exponential form). Finally, IFormatProvider references an object which the Parse method can use to obtain culture specific information. The Parse method throws an exception when the passed value doesn’t match the expected numeric string:

//to make it work, you need to pass
//at least the NumberStyles.AllowExponent
var aux = Int32.Parse( "10e2");

If you run the previous code, you’ll end up with an exception because (by default) the simple Parse call uses NumberStyles.Integer for the style parameter. The solution to the problem is simple: pass the NumberStyles.AllowExponent value to the style parameter. Don’t forget that the IFormatProvider also plays an important role in the parsing. For instance, take a look at the following code:

var aux = Double.Parse( "10,2", NumberStyles.Any, new CultureInfo("pt-PT"));

Specifying the pt-PT culture is necessary for getting the double value 10.2. If you had passed the en-US culture, then you’d end up with the value 102 because the char ‘,’ isn’t used as decimal separator in that culture (I believe that in en-US the char ‘,’ is used as a thousands separator).
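Going back to the “10e2” string, here’s a minimal sketch of the fix mentioned above (ExponentParsing and ParseWithExponent are hypothetical names; I’m also passing the invariant culture so that the result doesn’t depend on the machine’s settings):

```csharp
using System;
using System.Globalization;

static class ExponentParsing {
    // Parses integers written in exponential form (ex.: "10e2").
    public static Int32 ParseWithExponent(String text) {
        return Int32.Parse(
            text,
            NumberStyles.AllowExponent,
            CultureInfo.InvariantCulture);
    }
}
```

ExponentParsing.ParseWithExponent("10e2") returns 1000. Notice that I’m passing only the AllowExponent flag, so other things that are accepted by default (like leading white space or a sign) will be rejected unless you combine it with other NumberStyles values.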

Since we’re talking about parsing, it’s important to mention that the DateTime type introduces a ParseExact method (besides the traditional Parse method). This method was added to the API of the type because several developers complained about the forgiveness of the original Parse method (after all, parsing dates isn’t really a walk in the park).
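Here’s a quick sketch of ParseExact in action (ExactDateParsing and ParseDayFirst are names I’m assuming for the example, and the "dd/MM/yyyy" format string is just one possible choice):

```csharp
using System;
using System.Globalization;

static class ExactDateParsing {
    // Only accepts dates written as day/month/year (ex.: "19/05/2011").
    public static DateTime ParseDayFirst(String text) {
        return DateTime.ParseExact(
            text,
            "dd/MM/yyyy",
            CultureInfo.InvariantCulture);
    }
}
```

ExactDateParsing.ParseDayFirst("19/05/2011") returns May 19, 2011. A string which doesn’t match the specified format results in a FormatException.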

But the parsing of DateTime string values wasn’t the only thing developers complained about. Many people have also raised their voice against the implementation followed by the Parse methods. The problem is that there are some apps which need to receive lots of input from the users and if parsing ends up throwing lots of exceptions, then you’ve just degraded the performance of your app. To avoid a breaking change, MS added a new method for the parsing API recommendation: the TryParse method. Here are the overloads introduced by Int32:

public static bool TryParse(string s, out int result)
public static bool TryParse(string s, NumberStyles style, IFormatProvider provider, out int result)

The method returns true when the conversion is performed successfully. In that case, result ends up with the parsed integer. When the conversion fails, the method returns false (without throwing) and result is set to 0.
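A typical use of TryParse looks like the next sketch (SafeParsing and ParseOrDefault are hypothetical names; the fallback value is something I’ve added for the illustration):

```csharp
using System;
using System.Globalization;

static class SafeParsing {
    // Returns the parsed value or 'fallback' when the text isn't a valid integer.
    public static Int32 ParseOrDefault(String text, Int32 fallback) {
        Int32 result;
        var parsed = Int32.TryParse(
            text,
            NumberStyles.Integer,
            CultureInfo.InvariantCulture,
            out result);
        return parsed ? result : fallback;
    }
}
```

SafeParsing.ParseOrDefault("100", -1) returns 100, while "abc" and "10e2" both return -1 and no exception is thrown along the way.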

And I guess that’s it for now. Stay tuned for more.

May 16

In the previous post, we’ve looked at the specificities associated with the usage of the IFormattable interface. As we’ve seen, its ToString method expects a format string and an IFormatProvider instance which allows any interested party to get a reference to an object that can be used for formatting and parsing (more on this in future posts). In this post, we’ll take a look at how we can create custom formatter objects. In this area, we’ve got a couple of options:

  • we can create a custom CultureInfo object. This is a useful approach when none of the cultures defined by the framework can quite be applied to an existing scenario.
  • we can create a custom ICustomFormatter instance. This is a good option when we’re only interested in customizing the way that a specific type is formatted.

Let’s start with the first strategy: creating a custom CultureInfo object. As I’ve said, the framework introduces several predefined cultures which define several important characteristics associated with text, dates and numbers formatting and parsing. Even though I never had to create a custom culture, the truth is that I have some friends who did it because none of the existing cultures covered all their needs. Fortunately for us, the framework introduced the CultureAndRegionInfoBuilder class for helping us in the creation of a new culture. Currently, the process involves several steps:

  1. Create a new instance of CultureAndRegionInfoBuilder.
  2. If the new culture is based on an existing one, then use the LoadDataFromCultureInfo and LoadDataFromRegionInfo methods for initializing the data associated with an existing culture and region.
  3. Modify the properties you wish to customize.
  4. Register the new culture by invoking the Register method.

The registration process is interesting. It starts by creating a .nlp file with the information defined by the CultureAndRegionInfoBuilder which is then stored in the %windir%\Globalization folder. After that, it updates the framework’s configuration so that it searches for cultures in the %windir%\Globalization folder instead of relying on the internal cache. As you’ve probably inferred, this process requires administrative privileges (though there is a workaround for non-admins). Here’s a small sample which creates a new culture that is based on the PT culture (it simply replaces the default decimal separator):

var cultureBuilder = new CultureAndRegionInfoBuilder("ptTest", CultureAndRegionModifiers.None);
cultureBuilder.LoadDataFromCultureInfo(new CultureInfo("pt-PT"));
cultureBuilder.LoadDataFromRegionInfo(new RegionInfo("PT"));
cultureBuilder.NumberFormat.CurrencyDecimalSeparator = ".";
cultureBuilder.Register();

After registering the new culture, you can create new custom CultureInfo objects as you normally do, i.e., by passing its name to the constructor call.

var money = 10.0;
money.ToString("C", new CultureInfo("ptTest"));//'.' as decimal separator (ex.: 10.00 €)
money.ToString("C", new CultureInfo("pt-PT"));//',' as decimal separator (ex.: 10,00 €)

As you can see, the ptTest culture uses the ‘.’ as a decimal separator. Before moving on, it’s important to understand that registered cultures can be used in other apps. They even survive reboots of the machine (at least, until you remove the .nlp file by calling the static Unregister method).

There are, however, times where you’re only interested in customizing the formatting applied to a specific type. In these cases, you can implement the ICustomFormatter interface. Typically, the type that implements this interface will also implement the IFormatProvider interface. To illustrate this technique, let’s suppose that we want to return a value in the form [XXX] when XXX is an integer value. Here’s the code I’ve used for the formatter:

class Int32Formatter:IFormatProvider, ICustomFormatter {
    public object GetFormat(Type formatType) {
        //if an ICustomFormatter is requested, return a reference to this
        if( formatType == typeof(ICustomFormatter)) {
            return this;
        }
        //any other format type: delegate to the current culture
        return Thread.CurrentThread.CurrentCulture.GetFormat(formatType);
    }

    public string Format(string format, object arg, IFormatProvider formatProvider) {
        var val = arg as IFormattable;
        if(val == null ) {
            //does not implement the IFormattable interface
            //call inherited ToString
            return arg.ToString();
        }

        var str = val.ToString(format, formatProvider);
        if(arg.GetType() == typeof(Int32)) {
            return "[" + str + "]";
        }
        return str;
    }
}

And now, we can test our code by using an overload of the String.Format method (or by calling the AppendFormat or …):

var someValue = 10;
var someDouble = 10.0;
var str = String.Format(new Int32Formatter(),
    "Here's an int value: {0} and a double: {1}",
    someValue,
    someDouble);
Console.WriteLine(str);

Now, how does that work? It’s not that complicated… Internally, the String.Format method relies on the StringBuilder’s AppendFormat method to do all the work. This method starts by checking if a formatter was passed and if it offers an ICustomFormatter object (it does this by calling the IFormatProvider’s GetFormat method). If it does, then AppendFormat ends up calling the ICustomFormatter’s Format method.

Notice that the Format implementation will be called for all placeholder values defined in our string. And that’s why we need to check all the values we receive in the Format method. If the current value doesn’t support the IFormattable interface (required for allowing formatting of the values according to a specific format and culture), then I simply return the result of the parameterless ToString instance method. On the other hand, if the object does implement the IFormattable interface, then I start by calling the IFormattable’s ToString method (and passing it the format string and the format provider). After getting the formatted string, I check for the type of the object. When it’s an integer, I wrap the formatted string with [ and ] and return it. If it’s not an integer, I return the previously formatted value. And there you go: a custom provider which is only responsible for customizing the way an integer is formatted in a string.

And I guess this is it for now. In the next post, I’ll take a close look at the inverse process: parsing. Stay tuned for more.

May 16

So, you know everything about text, right? – part XII

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right? – part XII

Even though I’ve said that the previous post would wrap up this series, the truth is that there are still a couple of things I’d like to add to it. Today, I’ll talk a little bit about the String.Format method which allows us to build strings from many formatted objects. Let’s start with a simple example:

var str = String.Format( "{0} was born on the {1}",
                            "Luis",
                            new DateTime( 1976, 06, 27 ) );
Console.WriteLine(str);

The Format method receives a format string (1st parameter) which identifies replaceable parameters through numbered braces. You can use as many placeholders (defined by {position}) as you need. In our example, we’re using two placeholders: the first receives a string and the second is filled with a DateTime value. Printing the string str returns “Luis was born on the 27-06-1976 00:00:00”. If, like me, you’re running the previous code in a thread which is not using the “en-US” culture, you might find the previous result surprising (btw, if you do run it with the “en-US” culture, you end up getting “Luis was born on the 6/27/1976 12:00:00 AM”).

To understand why this happens, we have to take a look at the Format method internals. Internally, the Format method delegates the important work to the StringBuilder’s AppendFormat method, which ends up calling the ToString method for each of the placeholders’ values. By default, the ToString method will always format the object according to the current culture applied to the thread (we’ll return to this topic in a future post). Notice that the ToString method is inherited by all types in the framework because it’s defined by the base Object class. However, its use is rather limited (ex.: it will always format the value according to the culture applied to the current thread). And that’s why the framework introduced the IFormattable interface. This interface, which is implemented by many types exposed by the framework, introduces a single method (also called ToString) which relies on a format provider (an instance of the IFormatProvider passed through the second parameter) for applying a specific format (passed through the first parameter, a string) to the current object. This is a rather interesting topic and we’ll come back to it in the next post. For now, the important part is understanding that all the values that replace placeholders in a String.Format method call will have their ToString method called (and that ToString method can be the IFormattable’s ToString method, when the type implements that interface, or the “general” ToString method introduced by the base Object type).

Going back to the Format method, you should know that you can change the way formatting is applied to the placeholder values:

//customizing the way dates are presented
var str = String.Format( "{0} was born on the {1:dd/MM/yyyy}",
                            "Luis",
                            new DateTime( 1976, 06, 27 ) );
Console.WriteLine(str);

As you can see, the placeholder used for the DateTime parameter is a little more complex: besides indicating the position for the value which will be applied to that string, it also specifies the formatting that should be applied to that value (in this case, we’re using a custom string which formats the value in the form day/month/year). Whenever we use a placeholder in the form {position:format}, the method will try to call the IFormattable’s ToString method, passing it the format string. Currently, there are several format values which you can pass in the format position of a placeholder. For instance, if you’re interested in formatting numeric types, then this link will be of interest to you (for getting a general reference on formatting, then check this article).

As you can see, the most important part of the work is done in the ToString method. Since we’ll be coming back to that in the next post, it’s time to wrap this post with a couple of important conclusions:

  • Whenever you need to format several objects into a string, then you should use the String.Format method. This is an efficient option since it relies on the StringBuilder’s AppendFormat method.
  • You can define the formatting of the value of a placeholder by using a placeholder in the form {position:format} (don’t forget that formatting is heavily influenced by the culture and the formatter – we’ll return to this topic in the next post).
  • Besides String.Format, there are other useful methods which rely on StringBuilder’s AppendFormat method. For instance, in console apps, there are overloads of Write and WriteLine methods which end up using a StringBuilder and its AppendFormat method for ensuring good performance in the formatting of several values.
  • Even though we didn’t show it, the truth is that there are overloads of the Format and AppendFormat methods which expect an instance of an IFormatProvider object (which is used for providing formatting information). When you pass an instance of this object, then all calls of the ToString method of the placeholder receive this instance.
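The last conclusion can be illustrated with a short sketch. The cultures and values used here are just assumptions I've picked for the demo; the point is that the provider passed to Format (or AppendFormat) is handed over to the ToString call of each placeholder value:

```csharp
using System;
using System.Globalization;
using System.Text;

static class FormatProviderSample {
    static void Main() {
        var date = new DateTime(1976, 6, 27);
        var amount = 1234.5;

        //same format string, different providers
        var us = String.Format(new CultureInfo("en-US"),
            "{0:d} - {1:N2}", date, amount);
        var de = String.Format(new CultureInfo("de-DE"),
            "{0:d} - {1:N2}", date, amount);
        Console.WriteLine(us);//6/27/1976 - 1,234.50
        Console.WriteLine(de);//27.06.1976 - 1.234,50

        //AppendFormat offers a similar overload
        var builder = new StringBuilder();
        builder.AppendFormat(CultureInfo.InvariantCulture, "{0:yyyy-MM-dd}", date);
        Console.WriteLine(builder.ToString());//1976-06-27
    }
}
```

Passing the provider explicitly means the result no longer depends on the culture of the thread that happens to run the code.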

And I guess that’s it for now. Stay tuned for more.

May 13

So, you know everything about text, right? – part XI

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right? – part XI

To wrap up this series of posts about text, we’ll talk about best practices for handling string concatenations. Since string objects are immutable, concatenating lots and lots of strings might be really bad for the performance of your application. Before going on, it’s important to understand that string concatenation *might* only be problematic at runtime. In practice, this means that we don’t need to worry about literal string concatenations because the compiler is clever enough to build a final string from them. Here’s an example of what I’m saying:

//don't worry about this because the compiler is smart enough
//to build a unique string and embed it in the assembly metadata
var str = "This is a rather large "
            + "string which was broken to improve readability";

Whenever the compiler sees something like that, it will build the string “This is a rather large string which was broken to improve readability” and embed it in the metadata of the assembly. What we really want to solve are scenarios like the following:

//this is bad!
var watch = new Stopwatch( );
watch.Start(  );
var str = "";
for(var i = 0; i < 100000; i++) {
    str += ".!.";
}
watch.Stop(  );//around 15secs
Console.WriteLine(watch.ElapsedMilliseconds/1000.0);

On my machine, it took around 15 secs to concatenate 100000 strings. This is not a good thing. Since strings are immutable, each iteration ends up creating a new string which needs to allocate memory space for all the chars accumulated so far. The previous code is pure evil and in these cases, you should really use the StringBuilder class:

//this is way better!!!!
var watch = new Stopwatch( );
watch.Start(  );
var str = new StringBuilder( );
for(var i = 0; i < 100000; i++) {
    str.Append( ".!.");
}
var final = str.ToString( );
watch.Stop(  );
Console.WriteLine(watch.ElapsedMilliseconds/1000.0);

Now, I’m not going to say how much time this operation took (try it:)), but I will say that it was really really fast… Proficient developers know that we could improve the performance of the previous code by setting the total size of the buffer during the StringBuilder instantiation (notice that in the previous code it’s fairly easy to know the required size of the buffer).
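As a rough sketch of that last suggestion (the constant names are mine, introduced only for the demo), the required capacity can be computed up front and passed to the StringBuilder constructor, so that the internal buffer doesn't have to grow during the loop:

```csharp
using System;
using System.Text;

static class CapacitySample {
    static void Main() {
        const string piece = ".!.";
        const int iterations = 100000;

        //the final length is known up front: reserve the buffer once
        var builder = new StringBuilder(piece.Length * iterations);
        for (var i = 0; i < iterations; i++) {
            builder.Append(piece);
        }

        var final = builder.ToString();
        Console.WriteLine(final.Length);//300000
    }
}
```

As always, measure before and after: the gain depends on how many times the buffer would have been resized without the initial capacity.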

When developers that are getting started look at the previous examples, they think that they probably should be using StringBuilder instances everywhere…Here’s some code I’ve found before:

var firstName = "Luis";
var lastName = "Abreu";
var bld = new StringBuilder( );
bld.Append( firstName );
bld.Append( "-" );
bld.Append( lastName );

In the previous snippet I’ve introduced variables to simulate receiving input from the user. The first thing you should notice is that we’re just concatenating 3 strings. That’s it! In these cases, you should simply concatenate the strings:

var firstName = "Luis";
var lastName = "Abreu";
var total = firstName + "-" + lastName;

Whenever the compiler sees that you’re using the + operator to concatenate strings, it will automatically transform that into a Concat method call:

var firstName = "Luis";
var lastName = "Abreu";
var total = String.Concat( firstName, "-", lastName );

There are several overloads of the Concat method. The most interesting thing about this method is that it will start by creating a buffer big enough for copying all the strings it receives, so you’ll only have one string instantiation (unlike the initial code shown in the for loop earlier).

What I’m trying to say is that you should *only* use the StringBuilder type when you’re doing lots and lots of concatenations inside a loop (ex.: getting chars from the user). When you’re working with a small number of string variables, then you should use the String.Concat method (or the + operator, if you think it improves readability).

And that’s it for now. Stay tuned for more.

May 12

So, you know everything about text, right?–part X

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right?–part X

This is becoming a large series on strings…the good news is that there are still several interesting things to say about them, so this will be another “text post”. Today, we’ll take a quick look at how we can secure a string. As you know, there are lots of times where a string contains sensitive data (ex.: passwords). By default, string contents are saved in a char array maintained in memory which can be accessed by some other “naughty” code which might be snooping around the process address space of an application.

If this kind of behavior bothers you (or if you’re developing an app where sensitive data must conform to tighter security rules), then you should probably rely on the SecureString type. SecureString instances store their string contents as encrypted chars on an unmanaged memory block (so that the garbage collector isn’t aware of that memory). Currently, you can insert (instance method InsertAt), append (AppendChar), set (SetAt) or remove (RemoveAt) a char from the encrypted array maintained in that unmanaged block. To illustrate its use, let’s look at some code:

var secure = new SecureString();
while (true) {
    var keyPressed = Console.ReadKey(true);
    if (keyPressed.Key == ConsoleKey.Enter) {
        break;//don't store the ENTER char
    }
    secure.AppendChar(keyPressed.KeyChar);
}

In the previous snippet, we use the ReadKey method to read key presses from a console app. Each key is added to the SecureString instance until the user presses the ENTER key. Notice that the instance methods used for manipulating the chars stored in the secure string need to perform several steps which don’t really “perform” very well. The desired operation is always preceded by a decryption and followed by an encryption to ensure that the internal chars are “safe” again. As you can see, making lots of instance method calls to a SecureString instance might have a negative impact on the performance of your app.

Even though the previous snippet doesn’t show it, the SecureString class implements the IDisposable interface. By doing that, you can easily destroy the unmanaged buffer when you’re done with a SecureString instance (btw, the unmanaged buffer is zeroed before being released). And that’s why you should always dispose of SecureString instances when you’re done with them!
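A simple way to guarantee that disposal is wrapping the instance in a using block. The following is only a sketch: the chars come from a hardcoded literal (which defeats the purpose in real code) just to keep the sample self-contained:

```csharp
using System;
using System.Security;

static class SecureStringSample {
    static void Main() {
        using (var secure = new SecureString()) {
            //hypothetical input: real code would receive the chars one by one
            foreach (var c in "demo") {
                secure.AppendChar(c);
            }
            //no more changes allowed after this call
            secure.MakeReadOnly();
            Console.WriteLine(secure.Length);//4
        }
        //at this point, Dispose has already released the internal buffer
    }
}
```

Calling MakeReadOnly is optional, but it's a cheap way to ensure that nobody changes the contents after they've been collected.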

If you look at the API of the SecureString class, you’ll notice that it doesn’t really provide you with an easy way to recover a string instance from it. And that’s by design…and it’s really a good thing :) If, by any chance, you think that you need to interact with the SecureString string, then don’t!

If you really really really…really need to do that, then the solution is to use the Marshal class. It also means writing unsafe code, so get ready for seeing some pointers in action (and compiling your project with the /unsafe option):

//PLEASE: don't do this!!!
unsafe static String GetString(SecureString str) {
    StringBuilder bld = new StringBuilder();
    Char* ptr = null;
    try {
        ptr = (Char*) Marshal.SecureStringToCoTaskMemUnicode(str);
        var pos = 0;
        Char current;
                
        while( (current = ptr[pos++] ) != 0) {
            bld.Append(current);
        }
    }
    finally {
        if(ptr != null) {
            Marshal.ZeroFreeCoTaskMemUnicode((IntPtr)ptr);
        }
    }
    return bld.ToString();
}

 

Nasty…to be honest, I really didn’t miss having to use all those pointers…but that’s what is needed to read the chars stored by a SecureString instance. Once again, don’t do this in real world projects!
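For completeness, here's a sketch of an alternative which doesn't require the /unsafe option. It relies on other members of the Marshal class (SecureStringToGlobalAllocUnicode, PtrToStringUni and ZeroFreeGlobalAllocUnicode) and the same warning applies, since the plain text ends up in a regular managed string. The class and helper names are mine:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

static class SecureStringReader {
    //PLEASE: same caveat as before, don't do this in real world projects
    static String GetString(SecureString str) {
        var ptr = IntPtr.Zero;
        try {
            //decrypts the contents into an unmanaged buffer
            ptr = Marshal.SecureStringToGlobalAllocUnicode(str);
            return Marshal.PtrToStringUni(ptr);
        }
        finally {
            if (ptr != IntPtr.Zero) {
                //zeroes the unmanaged buffer before freeing it
                Marshal.ZeroFreeGlobalAllocUnicode(ptr);
            }
        }
    }

    static void Main() {
        using (var secure = new SecureString()) {
            foreach (var c in "demo") {//hypothetical input
                secure.AppendChar(c);
            }
            Console.WriteLine(GetString(secure));//demo
        }
    }
}
```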

Unfortunately, SecureString support is limited in the current API of the classes defined by the .NET framework. You can use them with certificates (X509Certificate class), when constructing an event log (EventLogSession class) or when using the PasswordBox control…and that’s pretty much it… I’m hoping to see them used in more places in future releases of the framework, but for now, that’s all we’ve got (I probably missed one or two classes, but I guess that’s all).

And I think this sums it up quite nicely. Stay tuned for more!

May 03

So, you know everything about text, right?–part VIII

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right?–part VIII

As we’ve seen, all chars are represented by 16 bit Unicode values. If you’re a Win32 programmer and you’ve been lucky enough to go “managed”, then I bet nobody is as happy as you because this means that you no longer have to write that lovely code for converting between MBCS and Unicode, right? Unfortunately, there are still times when we do need to encode and decode strings. For instance, if we need to send a file to a specific client, we might need to encode the string. If you don’t know anything about encodings, then this primer by Joel Spolsky is a fantastic read!

By default, and if we don’t specify an encoder, all encoding operations end up using the UTF-8 encoder. With UTF-8, characters can be encoded with 1, 2, 3 or 4 bytes. Since characters below 0x0080 are encoded with a single byte, this type of encoding tends to work well with chars used in the USA. European languages also tend to use chars between 0x0080 and 0x07FF, which require the use of 2 bytes. East Asian characters require 3 bytes and surrogate pairs will always be encoded with 4 bytes.

Even though UTF-8 is a popular encoding, it’s not that efficient when you need to encode lots of characters above 0x07FF. In those cases, using UTF-16 or UTF-32 might be a better option. With UTF-16, each 16 bit code value requires 2 bytes (which means that a surrogate pair takes 4 bytes). In practice, this means that you won’t get any compression at all (like you do when using UTF-8 with chars below 0x0080), but the operation should be fast (after all, this is a “direct copy” of a .NET char because they’re represented with 2 bytes too!). UTF-32 encodes all chars as 4 bytes. Even though it uses more space, it will simplify the algorithm used for traversing the chars because you don’t have to worry about surrogate pairs.

.NET does expose two other predefined encoders: UTF-7 and ASCII. UTF-7 encodes chars by using only 7 bit values and it should only be used if you have legacy systems which require this format. ASCII encodes a char into an ASCII character (no surprise here!) and you need to be careful because you might end up losing chars when you use it (chars greater than 0x007F can’t be converted and, by default, are replaced by a ? during the encoding).
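To make those numbers a little more concrete, here's a small sketch which counts the bytes produced by the predefined encoders. The sample chars are my own picks (one for each of the ranges mentioned above):

```csharp
using System;
using System.Text;

static class EncodingSample {
    static void Main() {
        //a (below 0x0080), c with cedilla (below 0x07FF),
        //a hiragana char (3 bytes in UTF-8) and a surrogate pair (G clef)
        var samples = new[] { "a", "\u00E7", "\u308B", "\U0001D11E" };
        foreach (var s in samples) {
            Console.WriteLine("{0} {1} {2}",
                Encoding.UTF8.GetByteCount(s),
                Encoding.Unicode.GetByteCount(s),//UTF-16
                Encoding.UTF32.GetByteCount(s));
        }
        //prints 1 2 4, then 2 2 4, then 3 2 4 and finally 4 4 4

        //ASCII can't represent the cedilla, so it ends up replaced
        var bytes = Encoding.ASCII.GetBytes("a\u00E7");
        Console.WriteLine(Encoding.ASCII.GetString(bytes));//a?
    }
}
```

Notice that GetByteCount only counts the encoded chars, i.e., it doesn't include any byte order mark.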

Besides these encoders, you should also know that you can encode any char to a specific code page (if you do, then keep in mind that you might end up losing chars if they can’t be represented in that code page). In practice, you should always work with UTF-16 or UTF-8. The only excuse to use one of the other encoders is if you have to work with legacy systems. And I guess this covers the theory. In the next post, we’ll take a look at some code. Stay tuned for more!

May 02

So, you know everything about text, right?–part VII

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right?–part VII

If you’re a reader of this blog, then you probably know that I’m Portuguese. Aside from accented chars and the notorious ç, there really aren’t any issues associated with the fact that .NET stores chars in 16 bit memory spaces. In other words, I’m a lucky bastard :)

Before going on, a disclaimer: generally, I prefer to blog about areas which I’ve used in my daily activities. Unicode surrogates aren’t really one of those things. However, since I’m writing about text, I believe that the series wouldn’t really be complete without mentioning surrogates. If you do have experience in this area and you do detect a nasty error or an erroneous assumption, then please do use the comment section for correcting me ;) Having said that, let’s proceed…

If I had to write text whose characters fall outside the range covered by those 16 bits (ex.: some rarely used CJK ideographs, historic scripts or musical symbols), I wouldn’t be so lucky…in these cases, two 16 bit code values are combined to represent a single Unicode char. In this scenario, it’s usual to say that the Unicode char is represented by a high surrogate (the first 16 bit code value) and a low surrogate (the second 16 bit code value). Something similar happens when a base char is followed by one or more combining chars (ex.: an a followed by a combining macron): several code values end up producing a single visible character. If you need to iterate through these units (typically, you’ll refer to each one as a text element or grapheme), then you’ll probably need to resort to the StringInfo class. The easiest way to use a StringInfo object is to pass it a string during instantiation. The following snippet illustrates the typical use of this class to enumerate its text elements:

var sb = new StringBuilder();
var s = "a\u0304\u308bc";
var si = new StringInfo(s);
for( var i = 0; i < si.LengthInTextElements; i++ ) {
    sb.AppendFormat("element at {0} is {1}\n",
        i,
        si.SubstringByTextElements(i, 1));
}
MessageBox.Show(sb.ToString());//console won't show it correctly!

It’s important to notice that the StringInfo class is defined in the System.Globalization namespace. After importing that namespace, you can use the LengthInTextElements property to check the number of text elements. After knowing the current number of text elements, you can use the SubstringByTextElements method to extract the desired portion of text elements.
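The string used in the previous snippet relies on a combining char, not on a surrogate pair. Here's a small sketch with a real surrogate pair (the G clef musical symbol is just an example I've picked) which shows how the number of 16 bit code values differs from the number of text elements:

```csharp
using System;
using System.Globalization;

static class SurrogateSample {
    static void Main() {
        var clef = "\U0001D11E";//MUSICAL SYMBOL G CLEF, outside the 16 bit range
        Console.WriteLine(clef.Length);//2 (two 16 bit code values)
        Console.WriteLine(Char.IsHighSurrogate(clef[0]));//True
        Console.WriteLine(Char.IsLowSurrogate(clef[1]));//True
        Console.WriteLine(new StringInfo(clef).LengthInTextElements);//1
        Console.WriteLine(Char.ConvertToUtf32(clef, 0).ToString("X"));//1D11E

        //a base char followed by a combining char is also a single text element
        Console.WriteLine(new StringInfo("a\u0304").LengthInTextElements);//1
    }
}
```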

If you want, you can also get a TextElementEnumerator instance from the GetTextElementEnumerator method: after that, it’s really easy to iterate through the text elements. Here’s a snippet which illustrates this strategy:

var sb = new StringBuilder();
var s = "a\u0304\u308bc";
var unicodeEnum = StringInfo.GetTextElementEnumerator(s);
while( unicodeEnum.MoveNext()) {
    sb.AppendFormat("element at {0} is {1}\n",
        unicodeEnum.ElementIndex,
        unicodeEnum.GetTextElement());
}
MessageBox.Show(sb.ToString());//console won't show it correctly!

Finally, you can also use the ParseCombiningCharacters method to obtain an Int32 array. The length of the array specifies the number of text elements and each array’s element identifies the index of the string where the first code for each text element can be found. Here’s a small snippet which shows how to use this method:

var sb = new StringBuilder();
var s = "a\u0304\u308bc";
var textElements = StringInfo.ParseCombiningCharacters(s);
for( var i = 0; i < textElements.Length;i++) {
    sb.AppendFormat("char {0} starts at pos {1}\n",
                    i,
                    textElements[i]);
}
Console.WriteLine(sb.ToString());//text elements start at positions 0, 2 and 3

Not really as interesting as the first two approaches, but still useful…And I guess this wraps it up for today! Stay tuned for more.

Apr 26

So, you know everything about text, right?–part VI

Posted in .NET, Basics, C#       Comments Off on So, you know everything about text, right?–part VI

In this post, we’ll talk a little bit about string interning. As I’ve pointed out before, strings are immutable. So, it’s fair to say that the following snippet would end up wasting memory if it created two different instances that, for all practical purposes, represent the same string:

var str1 = "hi";
var str2 = "hi";

From a memory usage point of view, wouldn’t it be great if we could make both variables reference the same String CLR object? The answer to this lies in string interning. When the CLR is initialized, it will automatically create a private hash table, whose entry keys are strings and which holds String objects created and maintained in the managed heap. This technique might lead to some performance improvements when you know *for sure* that your app is supposed to work with lots of “equivalent” strings.

Currently, the String class introduces two static methods related with string interning:

var internedString = String.Intern("hi");
var str = String.IsInterned("hi");//checks if "hi" is interned

Both methods receive a string. The first (Intern) checks the private CLR table for a match. If an identical string already exists, it returns a reference to that string. When that doesn’t happen, it performs a copy of the passed string, adds it to the private table and returns that instance. The IsInterned method might not work as you’re expecting…like the Intern method, it will also take a string which is used to perform a look up in the private CLR’s hash table. If there’s a matching entry, it will return a reference to that string. If that isn’t the case, then it will simply return null.
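Here's a small sketch which shows both methods in action. The strings are built at runtime from char arrays (an assumption of the demo), so that literal interning doesn't get in the way:

```csharp
using System;

static class InternSample {
    static void Main() {
        //two different instances with the same contents
        var first = new String(new[] { 'h', 'e', 'y' });
        var second = new String(new[] { 'h', 'e', 'y' });
        Console.WriteLine(Object.ReferenceEquals(first, second));//False

        //both calls return the reference kept in the intern table
        var internedFirst = String.Intern(first);
        var internedSecond = String.Intern(second);
        Console.WriteLine(Object.ReferenceEquals(internedFirst, internedSecond));//True

        //IsInterned returns a string reference (or null), not a Boolean
        Console.WriteLine(String.IsInterned(second) != null);//True
    }
}
```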

If you’re an experienced developer, you’re probably seeing a big problem with string interning: there’s no way to free the memory used by the strings maintained in the private hash table…well, to be honest, there is one (but that will probably be too drastic for your app): you need to unload the default AppDomain and that will only happen when you kill your app’s process. See? I told you it would be a bit drastic for most apps :)

By now, you’re probably wondering if the CLR performs string interning by default. And the answer is yes, but *only* for all literal strings defined in the assembly metadata. In theory, you should be able to control this behavior. Even though .NET supports the CompilationRelaxationsAttribute since version 2.0, the truth is that the CLR v4.0 will ignore the use of that attribute. You can test this by building the following code with the [CompilationRelaxations(CompilationRelaxations.NoStringInterning)] applied to it (hint: you don’t really need to add it because the C# compiler automatically adds the attribute for you):

var str1 = "hi";
var str2 = "hi";
var sameString = Object.ReferenceEquals(str1, str2);

To be honest, string interning seems really great in theory, but I’m still not sure about its use in the real world. Even though I’m not the most experienced developer in the world, the truth is that I never had to  use it explicitly in my apps. MS seems to think that it might impact your code in a bad way or it wouldn’t have introduced the CompilationRelaxationsAttribute for allowing you to control it (even though it seems like the CLR won’t respect the use of that attribute in most of the scenarios). I’d say that if you do need to work with lots of strings, then you should consider it…but don’t forget to measure to see if it’s really improving your app. And that’s it for now. Stay tuned for more!

Apr 25

In the previous post I’ve said that we needed a small detour into cultures to conclude the linguistic string comparison section. In .NET, the CultureInfo type holds all the information regarding a specific culture. For example, any instance of CultureInfo will give the name or calendar of a specific culture. In .NET, each culture is identified by a unique name which identifies the language/country pair (if you follow this link, you’ll get all the info about how those pairs are built). In practice, this means that “pt-PT” refers to the Portuguese culture in Portugal (which is quite different from the one used, for ex., in Brazil). Here’s an example of how you might end up creating an instance of this type:

var culture = new CultureInfo("pt-PT");

Every thread has two properties which reference culture objects: CurrentUICulture and CurrentCulture. The first is used to obtain resources which are presented to the user (ex.: this object is used to load the appropriate resource file – that is, if you’re using resource files. If you’re not, shame on you!). By default, any created thread will always reference an object compatible with the language of the installed OS (in MUI, you can change the current culture through the Regional and Language Options in the control panel applet).

The second culture property (CurrentCulture) is used for all the other things which CurrentUICulture isn’t used for (ex.: number formatting, string comparing, etc.). The initial value of this property is influenced by the selected value of the Regional and Language option of the control panel applet. It’s common for both properties to reference the same CultureInfo object. This isn’t obligatory, of course…for instance, nothing prevents us from adapting the user interface to a language while formatting info according to some other culture (a good example of this is a web site which adapts its buttons and labels for different languages – or cultures – and will always format the values according to the “en-US” culture because it needs to bill in dollars). To achieve this, you’d have to set those properties to the adequate CultureInfo objects. But we’re digressing…
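The web site scenario described above can be sketched like this (the cultures are the ones mentioned in the text; the formatted value is an assumption of the demo):

```csharp
using System;
using System.Globalization;
using System.Threading;

static class CultureSample {
    static void Main() {
        var thread = Thread.CurrentThread;
        //resources (labels, buttons, etc.) are loaded for Portuguese...
        thread.CurrentUICulture = new CultureInfo("pt-PT");
        //...but values are formatted according to en-US
        thread.CurrentCulture = new CultureInfo("en-US");

        Console.WriteLine(thread.CurrentUICulture.Name);//pt-PT
        Console.WriteLine(thread.CurrentCulture.Name);//en-US
        //formatting relies on CurrentCulture, not on CurrentUICulture
        Console.WriteLine(1234.5.ToString("C"));//$1,234.50
    }
}
```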

For our string discussion, what matters is understanding that the CurrentCulture refers to a CultureInfo object that influences the comparison operations performed over strings. And to do this, the CultureInfo object uses a CompareInfo object which knows how to sort characters. In my mother language (which, btw, is Portuguese) there really isn’t any interesting gotcha (at least, that I can remember). However, that is not the case with German, where ß has the same value as ss. In practice, this means that (and please pardon my lack of German knowledge, so that might not be the correct way to write football) the following snippet might catch you by surprise:

var str1 = "fussball";
var str2 = "fu\u00DFball";
Console.WriteLine("{0} : {1}", str1, str2);//print it
Console.WriteLine( String.Compare(str1, str2, StringComparison.Ordinal) == 0 );//false
Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE"); //change culture
Console.WriteLine(String.Compare(str1, str2, StringComparison.CurrentCulture ) == 0);//true

 

As you can see, the comparison does return true after I’ve changed the CultureInfo associated with the current thread’s CurrentCulture property. The experienced reader knows that the previous code can be simplified because the Compare method will always perform a character expansion before comparing the strings (in our example, that means that ß will be replaced by ss). So, you don’t really need to change the CurrentCulture or pass the StringComparison.CurrentCulture to the Compare method. Anyway, doing that makes the intent of the code clear and you should always strive to do that.

In one of the previous paragraphs, I’ve mentioned the CompareInfo class: this class is used internally for performing the comparison between the strings. If you need more control, then you’ll be happy to know that nothing prevents you from using that class directly:

var str1 = "fussbal";
var str2 = "fu\u00DFbal";
var culture = new CultureInfo("de-DE");
Console.WriteLine(culture.CompareInfo.Compare(str1, str2) == 0);//true

 

You probably won’t be doing this often, but now you know that it exists. Notice also that there are several overloads of this method which allow you to specify an offset, a length or a CompareOptions value for influencing the returned result. Since we’re talking about CompareInfo, you should also notice that it offers several other interesting methods: IndexOf, LastIndexOf, IsPrefix, IsSuffix, etc. These methods give you more control than you get by default when using the similar methods of the String class (IndexOf, LastIndexOf, StartsWith and EndsWith).
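To make this a little more concrete, here’s a small sketch that uses some of those CompareInfo members directly (IsPrefix and IsSuffix play the role of String’s StartsWith and EndsWith). The “en-US” culture and the sample strings are arbitrary choices:

```csharp
using System;
using System.Globalization;

class CompareInfoDemo {
    static void Main() {
        var compareInfo = new CultureInfo("en-US").CompareInfo;

        // CompareOptions value: ignore case
        Console.WriteLine(compareInfo.Compare("hello", "HELLO", CompareOptions.IgnoreCase) == 0); // True

        // offset + length: compares "hello" (taken from "say hello") with "hello"
        Console.WriteLine(compareInfo.Compare("say hello", 4, 5, "hello", 0, 5, CompareOptions.None) == 0); // True

        // searching with the same level of control
        Console.WriteLine(compareInfo.IndexOf("Hello World", "WORLD", CompareOptions.IgnoreCase)); // 6
        Console.WriteLine(compareInfo.IsPrefix("Hello World", "hello", CompareOptions.IgnoreCase)); // True
        Console.WriteLine(compareInfo.IsSuffix("Hello World", "WORLD", CompareOptions.IgnoreCase)); // True
    }
}
```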

And I guess this sums it up: linguistic string comparisons rely on CultureInfo objects which end up delegating that work to the CompareInfo class. There’s still more to say about strings, so stay tuned for more!

Apr 25

So, you know everything about strings, right?–part IV

Posted in .NET, Basics, C#

String comparison is one of the most common operations you’ll need to perform in your daily tasks. Besides checking for equality, you’ll also end up comparing strings whenever you need to sort them. The String type offers several methods for performing comparison operations:

  • Equals: this instance method checks two strings for equality, according to the StringComparison enum value that it receives. There’s also a static version of this method which lets you perform the operation without having to check for null first.
  • Compare: the type introduces several static overloads of this method which can be used to determine the relative order of two strings (for instance, when sorting).
  • StartsWith: String introduces several overloads of this instance method. You can use it to check if the current string starts with another string that it receives through a parameter.
  • EndsWith: once again, there are several overloads of this method. You can use them to see if the current string ends with another string passed in through a parameter.

Most of these methods can receive a StringComparison value through a parameter. This value lets you influence the comparison operation by specifying if the comparison should be case sensitive or insensitive and if the current culture should be used in the operation.

Some of the overloads of the previous methods can also receive a value from the CompareOptions enum. Since this is a flags enumeration, you can combine several of its values. Most of the comparison operations can also be influenced through a CultureInfo object. That’s why there are several overloads which let you specify the culture that should be used in the comparison operation.
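The following sketch exercises the methods mentioned above with explicit StringComparison and CompareOptions values (the sample strings and the class name are mine):

```csharp
using System;
using System.Globalization;

class ComparisonMethodsDemo {
    static void Main() {
        string first = "Hello";
        string second = null;

        // instance Equals: 'first' can't be null or the call throws
        Console.WriteLine(first.Equals("HELLO", StringComparison.OrdinalIgnoreCase)); // True
        // static Equals: safe even when one (or both) of the arguments is null
        Console.WriteLine(String.Equals(second, first, StringComparison.Ordinal)); // False

        // Compare returns a negative number, zero or a positive number
        Console.WriteLine(String.Compare("apple", "banana", StringComparison.Ordinal) < 0); // True
        // overload which receives a CultureInfo and a CompareOptions value
        Console.WriteLine(String.Compare("hello", "HELLO", CultureInfo.InvariantCulture, CompareOptions.IgnoreCase) == 0); // True

        Console.WriteLine(first.StartsWith("he", StringComparison.OrdinalIgnoreCase)); // True
        Console.WriteLine(first.EndsWith("LO", StringComparison.Ordinal)); // False
    }
}
```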

In practice, you can divide comparison operations into two big groups: programmatic string comparisons and linguistic string comparisons. Programmatic string comparisons include all string comparisons related to…you guessed it: programming! For instance, whenever you compare paths, registry keys, URLs, etc., you’re running a programmatic string comparison. The fastest way (and probably the best) of doing this is to run an ordinal comparison (case sensitive or case insensitive, depending on what you’re comparing). Whenever you do this, you’re running a fast comparison because the culture info isn’t taken into account by the method. In practice, this means passing the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase value to the method. You shouldn’t use the InvariantCulture and InvariantCultureIgnoreCase values for this because they are more costly (from a time/performance point of view) than the previous values.
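As a quick illustration of a programmatic comparison, the next sketch compares two (made up) Windows paths and uses the equivalent StringComparer for dictionary keys:

```csharp
using System;
using System.Collections.Generic;

class ProgrammaticComparisonDemo {
    static void Main() {
        var path1 = @"C:\Temp\Report.txt";
        var path2 = @"c:\temp\REPORT.TXT";

        // no culture involved: the chars are compared by their numeric values
        Console.WriteLine(String.Equals(path1, path2, StringComparison.OrdinalIgnoreCase)); // True
        Console.WriteLine(String.Equals(path1, path2, StringComparison.Ordinal)); // False

        // StringComparer exposes the same rules to collections
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        headers["Content-Type"] = "text/html";
        Console.WriteLine(headers.ContainsKey("content-type")); // True
    }
}
```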

There are, however, several times when you do need to compare strings which were entered by the user or which are shown to them. In these cases, we’re talking about linguistic string comparisons and they should always take the current culture into account. If you’re sorting strings, you should also perform these operations in a case sensitive way to ensure that two strings which only differ by case aren’t considered equal (if they are, then sorting them might return different orderings in several sort operations and that might really confuse the user). You perform a linguistic string comparison by passing the StringComparison.CurrentCulture or StringComparison.CurrentCultureIgnoreCase value.
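Here’s a sketch of a culture-aware, case sensitive sort. I’m building the comparer explicitly for “en-US” just to make the sample deterministic; in a real application you’d probably rely on StringComparer.CurrentCulture. The final order depends on the culture’s rules, so I’m not showing it here:

```csharp
using System;
using System.Globalization;

class LinguisticSortDemo {
    static void Main() {
        var names = new[] { "zebra", "Apple", "apple", "Zebra" };

        // second argument is ignoreCase: false means "Apple" and "apple" are never considered equal
        var comparer = StringComparer.Create(new CultureInfo("en-US"), false);
        Array.Sort(names, comparer);

        Console.WriteLine(String.Join(", ", names));
    }
}
```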

As I’ve mentioned, linguistic string operations depend on the culture. And that’s why we need to take a little detour into cultures before going on. Since this is becoming a rather large post, I’ll leave the details about cultures for a future post. Stay tuned for more!

Apr 24

In the latest post, we’ve seen that strings are immutable. At the time, I’ve mentioned that this brings several advantages, but there are also a couple of gotchas. For instance, concatenating strings can be an expensive operation, especially when you have lots of strings. To solve this kind of problem, we need to resort to the StringBuilder type. The idea is simple: you create a new instance of the type, add several strings through one of its methods and then retrieve the final string through its ToString method (which overrides the one inherited from Object). Let’s start from the beginning…

When you  create a new instance of the StringBuilder, you can use one of the several constructors which let you:

  • specify the maximum number of chars that can be kept by the StringBuilder instance;
  • indicate the initial size (capacity) of the array of chars used by the StringBuilder instance. Notice that this array might grow when you add strings or chars and its available space isn’t enough (in that case, the instance will double the current array’s size).
  • pass a string which is used to initialize the internal array of chars held by the type.

You can mix several of those items during construction because the type offers several overloads which let you specify those values (for instance, you can specify the maximum number of chars and the current capacity of the internal array through the StringBuilder( Int32 capacity, Int32 maxCapacity) constructor). The next snippet presents the simplest code you’ll need to instantiate a StringBuilder instance (which, btw, shows its most common use):
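A short sketch of those constructors in action (the capacity values are arbitrary):

```csharp
using System;
using System.Text;

class StringBuilderConstructorsDemo {
    static void Main() {
        // initial capacity of 32 chars, never more than 64
        var bounded = new StringBuilder(32, 64);
        Console.WriteLine(bounded.Capacity);    // 32
        Console.WriteLine(bounded.MaxCapacity); // 64

        // initial content
        var seeded = new StringBuilder("Hello");
        Console.WriteLine(seeded.Length); // 5

        // initial content plus a suggested capacity
        var both = new StringBuilder("Hello", 100);
        Console.WriteLine(both.Length);          // 5
        Console.WriteLine(both.Capacity >= 100); // True
    }
}
```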

var str = new StringBuilder();
Console.WriteLine(str.Capacity);//16
Console.WriteLine(str.MaxCapacity);//Int32.MaxValue

As you can see, the default constructor starts with a 16 chars array and limits the maximum size of that internal array to Int32.MaxValue. After creating an instance, there are a couple of properties/methods which let you change the internal StringBuilder’s array:

var str = new StringBuilder();
Console.WriteLine(str.Length);//number of chars in the array: 0
str.Append("Hello");
Console.WriteLine(str.Length);//5
Console.WriteLine(str[0]); //get char at position 0

You can check the number of chars in the array through the Length property. You can also get or set a single char through the indexed Chars property. The Append method is probably the one you’ll use most often in the day-to-day operations. As you’ve probably inferred from the previous snippet, you can use it to append an object to the internal array (as you’re probably expecting, there are several overloads of this method). Besides Append, you can also use Insert, AppendFormat and AppendLine to add more chars to the internal array.

You’re probably expecting to be able to remove chars from the internal array. If that is the case, you’re correct: you can remove chars by calling the Clear (clears the internal buffer used by the StringBuilder instance) and Remove (removes a range of chars from the array) methods. Finally, there’s also the Replace method which is responsible for replacing all instances of a char with another char or all instances of a string with another string (yes, once again, there are several overloads of this method) in the internal buffer.
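The next sketch goes through most of the methods mentioned so far. The comments show the content of the buffer after each call:

```csharp
using System;
using System.Text;

class StringBuilderEditingDemo {
    static void Main() {
        var sb = new StringBuilder("Hello");
        sb.Append(", world");             // Hello, world
        sb.Insert(0, ">> ");              // >> Hello, world
        sb.Replace("world", "there");     // >> Hello, there
        sb.Remove(0, 3);                  // Hello, there
        sb[0] = 'J';                      // Jello, there (indexed Chars property)
        sb.AppendFormat(" ({0})", 42);    // Jello, there (42)
        Console.WriteLine(sb.ToString()); // Jello, there (42)

        sb.Clear();                       // Clear requires .NET 4.0 or later
        Console.WriteLine(sb.Length);     // 0
    }
}
```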

One interesting thing regarding these methods is that most of them (if not all) return a reference to the StringBuilder instance they were called on. In practice, this means that you can chain several method calls:

str.Append("Hello")
    .Replace("o", "0!");

After concatenating everything and performing all the changes you need, you can get a string by calling its ToString method:

var finalStr = str.ToString();

By default, most types inherit the Object’s ToString method, which simply returns the full name of the current object’s type. The StringBuilder type overrides the ToString method because, in this case, it makes more sense to return the encapsulated char array as a string than the name of the type. Before ending, there’s a small gotcha which makes some operations more painful than they should be: there isn’t complete parity between the methods exposed by String and StringBuilder. For instance, there’s no PadLeft method.

It’s really a pity that the StringBuilder doesn’t expose all the methods defined by String because that means 1.) doing extra work or 2.) having to go to the String, perform the desired operation and go back to the StringBuilder instance to continue with the string manipulation work. And I guess this is it for now. Stay tuned for more!
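Just to illustrate both options, here’s a sketch for the missing PadLeft. Notice that the PadLeft extension method shown here is a hypothetical helper written for this sample (it isn’t part of the framework):

```csharp
using System;
using System.Text;

static class StringBuilderExtensions {
    // Hypothetical helper: option 1 (doing the extra work ourselves)
    public static StringBuilder PadLeft(this StringBuilder builder, int totalWidth, char paddingChar) {
        var missing = totalWidth - builder.Length;
        if (missing > 0) {
            // Insert(index, value, count) inserts 'count' copies of 'value'
            builder.Insert(0, paddingChar.ToString(), missing);
        }
        return builder; // keeps the call chainable, like the built-in methods
    }
}

class PadLeftDemo {
    static void Main() {
        var viaExtension = new StringBuilder("42").PadLeft(5, '0');
        Console.WriteLine(viaExtension); // 00042

        // option 2: round-trip through String
        var sb = new StringBuilder("42");
        var padded = sb.ToString().PadLeft(5, '0');
        sb.Clear().Append(padded);
        Console.WriteLine(sb); // 00042
    }
}
```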

Mar 30

So, you know everything about text, right?– part II

Posted in Basics, C#, CLR

In the previous post, we’ve started looking at how to work with text in C# and we’ve run a rather superficial analysis over the Char type. In this post, we’ll start looking at the String type which is probably what you’ll be using most of the time when you need to work with text.

What is a string? In .NET, a string can be seen as an immutable sequence of characters. Programmatically, it’s represented through the String type, which is sealed and extends the Object type directly (in other words, it’s a reference type). Interestingly, Strings are also considered a primitive type in C# and this means that you can create new Strings through literals:

var aString = "Hello, there";

This is the preferred way to instantiate a new String. The type offers several constructors which let you create a new String from an unmanaged array of chars (char*) or from an unmanaged array of 8-bit signed integers (sbyte*). And no, there’s no constructor that receives a string as an argument, though there’s one which creates a new String from an array of Char.
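For completeness, here’s a small sketch with some of the safe constructors (the ones that don’t rely on pointers):

```csharp
using System;

class StringConstructorsDemo {
    static void Main() {
        // from an array of Char
        var fromArray = new String(new[] { 'H', 'i', '!' });
        Console.WriteLine(fromArray); // Hi!

        // from a slice of a Char array: start index 1, length 2
        var fromSlice = new String(new[] { 'H', 'i', '!' }, 1, 2);
        Console.WriteLine(fromSlice); // i!

        // a char repeated n times
        var repeated = new String('-', 5);
        Console.WriteLine(repeated); // -----
    }
}
```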

Notice that using the preferred way of creating new strings (ie, through a literal) doesn’t really result in creating a “new“ instance through the newobj IL call. In these cases, the compiler embeds the string in the metadata so that it can load it at runtime.

Strings enjoy special treatment in several languages. For instance, it’s possible to concatenate strings at compile time or at runtime. Here’s an example where the C# compiler is smart enough to concatenate two Strings at compile time:

var aString = "Hello," +" there";

If you’ve had the luck to write some code in C or C++, then you’ll be right at home with the string escape sequences supported in C#:

var aString = "Hello,\tthere";

In the previous snippet, we’ve resorted to \t to introduce a tab in a string. In case you’re wondering, you can escape the \ char used in escape sequences by doubling it:

var path = "C:\\folder";

If you have lots of \ to escape, then you should be using verbatim strings:

var path = @"C:\folder";

Both snippets produce exactly the same result: you end up with a C:\folder string.
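If you want to confirm it, a quick check like the following one should do (the last line also shows how to escape a double quote inside a verbatim string):

```csharp
using System;

class EscapeDemo {
    static void Main() {
        var escaped = "C:\\folder";
        var verbatim = @"C:\folder";

        Console.WriteLine(escaped == verbatim); // True
        Console.WriteLine(escaped.Length);      // 9 (the doubled \ counts as a single char)

        // inside a verbatim string, a double quote is escaped by doubling it
        Console.WriteLine(@"say ""hi""");       // say "hi"
    }
}
```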

Before ending this initial post about strings, there’s one small detail I’ve mentioned at the beginning and which is *really* important. It’s probably the most important thing you should know about strings and I wouldn’t really feel well without writing about it: strings are *immutable*. Once you create a string, there’s no way to modify it. No, you can’t change a char from it without building a new String instance. No, you can’t make it shorter or longer either!
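The following sketch shows what this means in practice: every “modifying” method returns a new String and leaves the original untouched.

```csharp
using System;

class ImmutabilityDemo {
    static void Main() {
        var original = "hello";
        var upper = original.ToUpperInvariant();   // new String instance
        var replaced = original.Replace('h', 'j'); // another new String instance

        Console.WriteLine(original); // hello
        Console.WriteLine(upper);    // HELLO
        Console.WriteLine(replaced); // jello

        // original[0] = 'j'; // compile error: the indexer is read-only
    }
}
```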

This might be a surprise, but it does bring a couple of advantages too. For instance, since they’re immutable, you don’t have to worry about synchronization in multithreaded code (IMO, this is a big big thing!). So, do you need to do a lot of char manipulations? Do you need to concatenate lots of strings at runtime? If that is your case, then you should be using StringBuilder (we’ll be back to this in a future post).

And this is it for now. Stay tuned for more!

Mar 28

So, you know everything about text, right?– part I

Posted in Basics, C#, CLR

In .NET, characters are always represented by 16-bit Unicode values. Programmatically, they’re represented through instances of the System.Char type. Here’s an example of a char represented in C#:

var myChar = 'a';

The Char type offers several helper static methods which do several useful operations. For instance, you can call the IsLetter method to check if a char is a letter:

var isLetter = Char.IsLetter('a');

Besides the IsLetter method, you can also use the IsDigit, IsWhiteSpace, IsUpper, IsLower, IsPunctuation, IsLetterOrDigit, IsControl, IsNumber, IsSeparator, IsSurrogate, IsLowSurrogate, IsHighSurrogate or IsSymbol methods. All these methods have overloads which receive a string and a position that points to the char that needs to be checked. You can get the same results by calling the GetUnicodeCategory method, which returns a value of the UnicodeCategory enumeration. According to the docs, you shouldn’t use this method but CharUnicodeInfo.GetUnicodeCategory instead. A small detour: surrogates are an interesting topic (though out of the scope of this post). Basically, they’re needed to allow the representation of the supplementary characters outside the BMP (Basic Multilingual Plane) with UTF-16. If you’re interested in learning more, I’d recommend getting started here.
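Here’s a quick sketch which shows some of these methods, including the surrogate related ones (I’ve picked U+1D11E, the musical G clef symbol, as an example of a character outside the BMP):

```csharp
using System;
using System.Globalization;

class CharChecksDemo {
    static void Main() {
        Console.WriteLine(Char.IsDigit('7'));            // True
        // overload which receives a string and a position
        Console.WriteLine(Char.IsPunctuation("a,b", 1)); // True
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory('a')); // LowercaseLetter

        // a supplementary character needs two chars (a surrogate pair) in UTF-16
        var clef = "\U0001D11E";
        Console.WriteLine(clef.Length);                   // 2
        Console.WriteLine(Char.IsHighSurrogate(clef, 0)); // True
        Console.WriteLine(Char.IsLowSurrogate(clef, 1));  // True
        Console.WriteLine(Char.ConvertToUtf32(clef, 0) == 0x1D11E); // True
    }
}
```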

Besides checking the “kind” of a character, you can also convert it to its lower or upper equivalent in a culture-agnostic way through the ToLowerInvariant and ToUpperInvariant methods (if you want to take the culture into consideration, then you should call the ToLower and ToUpper methods).

The Char type does offer some instance methods too (btw, the previous ones are all static). You can compare two chars through the Equals method or through the CompareTo method (because Char implements the IComparable<char> interface). Besides comparing, you can also find a couple of methods which let you transform a char (or chars) into strings or get chars from an existing string. There’s still another couple of (static) methods which are capable of converting between chars and integers which represent code points. Finally, the static GetNumericValue method returns the numeric equivalent of a char.
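A small sketch with some of the members mentioned in the previous paragraphs:

```csharp
using System;

class CharMembersDemo {
    static void Main() {
        char a = 'a', b = 'b';

        // instance methods
        Console.WriteLine(a.Equals(b));        // False
        Console.WriteLine(a.CompareTo(b) < 0); // True
        Console.WriteLine(a.ToString() + b);   // ab

        // static helpers
        Console.WriteLine(Char.Parse("x"));             // x (string with a single char into char)
        Console.WriteLine(Char.ConvertFromUtf32(0x41)); // A (code point into string)
        Console.WriteLine(Char.GetNumericValue('7'));   // 7
        Console.WriteLine(Char.ToUpperInvariant(a));    // A
    }
}
```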

Before ending this post, there’s still time to mention that the CLR allows you to convert from char into a numeric type (and vice-versa). Explicit casts are the easiest way to do that. You can also perform this operation through one of the methods of the Convert type or by using the IConvertible interface (which is implemented by Char). And I guess this sums it up quite nicely. Stay tuned for more about text.
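The three conversion options might look like this:

```csharp
using System;

class CharConversionDemo {
    static void Main() {
        // explicit casts
        int code = (int)'A';
        char next = (char)(code + 1);
        Console.WriteLine(code); // 65
        Console.WriteLine(next); // B

        // Convert type
        Console.WriteLine(Convert.ToInt32('A')); // 65
        Console.WriteLine(Convert.ToChar(67));   // C

        // IConvertible (explicitly implemented by Char, so a cast is needed)
        Console.WriteLine(((IConvertible)'A').ToInt32(null)); // 65
    }
}
```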

Mar 26

Last considerations about generics

Posted in Basics, C#, CLR

As we’ve seen, constraints help in writing “type safe generic code” (not sure if this designation really exists; if it doesn’t, then I’ve just invented it). But even with generics, there are still a couple of scenarios which might catch you by surprise. Let’s start with casts and with a simple example:

class Generic<T> {
    public void DoSomethin(T something) {
        var aux = (String) something;
    }
}

The previous class won’t compile because there’s no way for the compiler to know that it’s possible to convert something into a String. At least, not when it’s compiling the generic open class! Notice that you can’t even apply a constraint to solve this problem. There is, however, a workaround (which isn’t really type safe):

class Generic<T> {
    public void DoSomethin(T something) {
        var aux = (String)(Object) something;
    }
}

Since any object can be cast to Object, then the compiler won’t complain about that first cast. The other cast (String) is now possible, though it might throw an exception at runtime. In fact, there’s a better option of doing the previous cast:

class Generic<T> {
    public void DoSomethin(T something) {
        var aux = something as String;
    }
}

Notice, however, that you’ll only be able to do this whenever you’re converting something into a reference type.
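To see the difference between both approaches at runtime, here’s a sketch (the AsString and CastToString method names are mine):

```csharp
using System;

class Generic<T> {
    // returns null when 'something' isn't a String
    public String AsString(T something) {
        return something as String;
    }

    // throws InvalidCastException when 'something' isn't a String
    public String CastToString(T something) {
        return (String)(Object)something;
    }
}

class CastDemo {
    static void Main() {
        Console.WriteLine(new Generic<String>().AsString("hi"));      // hi
        Console.WriteLine(new Generic<Int32>().AsString(10) == null); // True

        try {
            new Generic<Int32>().CastToString(10);
        }
        catch (InvalidCastException) {
            Console.WriteLine("invalid cast"); // invalid cast
        }
    }
}
```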

Since generic type arguments can be replaced by any concrete type, then you might be wondering how you can initialize a “generic” variable to its default value. The problem here is specifying its default value. I mean, if you’re talking about reference types, then it’s ok to set it to null. However, doing it with a value type results in a compile-time error. To solve this problem, Microsoft introduced the default keyword:

class Generic<T> {
    public void DoSomethin(T something) {
        T aux = default(T);
    }
}

Now, the compiler is happy. When T is replaced by a reference type, aux is initialized with null. On the other hand, if T is replaced by a value type, then aux references some memory space (big enough to hold a value of that type) with all its bits initialized to 0. Btw, there’s also an interesting gotcha regarding the use of null: you can’t use it to initialize a variable, but you can use it with the operators == and !=. Here’s an example:
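Before looking at that example, here’s a quick sketch which shows what default(T) produces for a few concrete types (the Defaults helper class is hypothetical):

```csharp
using System;

static class Defaults {
    public static T GetDefault<T>() {
        return default(T);
    }
}

class DefaultDemo {
    static void Main() {
        Console.WriteLine(Defaults.GetDefault<Int32>());          // 0
        Console.WriteLine(Defaults.GetDefault<Boolean>());        // False
        Console.WriteLine(Defaults.GetDefault<String>() == null); // True
        Console.WriteLine(Defaults.GetDefault<DateTime>() == DateTime.MinValue); // True
    }
}
```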

class Generic<T> {
    public void DoSomethin(T something) {
        if(something == null ) {
            //do something
        }
        else {
            //do something else
        }
    }
}

When T is replaced by a value type, the JIT compiler won’t emit the native code for the if part because a value type is never null. Notice that if we had constrained T to a struct, then yes, we’d be getting a compile error on the if clause.

It’s also important to understand that comparing two variables of the same generic type results in an error if that generic type isn’t constrained to a reference type:

class Generic<T> {
    public void DoSomethin(T something, T somethingElse) {
        if( something == somethingElse) {
            //some code
        }
    }
}

If T were constrained to a reference type, the code would have compiled without any errors. However, constraining T to a value type will always result in an error. Since you cannot constrain T to a specific value type (value types are always sealed and, as we’ve seen, sealed types can’t be used as constraints because you could simply write code that uses that type directly), you’re limited to saying that it’s a value type. And this information isn’t enough for the compiler to emit the correct code for the comparison.

A final note: you can’t use generic type variables as operands (ie, you can’t use them with the operators +, -, *, etc.), making it impossible to write expressions which work for any numeric type. This is a tremendous problem, especially for those guys that work in the financial world…

And that’s it for now! Stay tuned for more.

Mar 23

Generic and constructor constraints

Posted in Basics, C#, CLR

In this post, we’ll talk about one last kind of constraint: the constructor constraint. Whenever you apply a constructor constraint to a generic type argument, the compiler ensures that it can only be replaced by a concrete type which exposes a public parameterless constructor. Constructor constraints are specified through the new keyword, as you can see from the following snippet:

public class AnotherGenericConstraint<T> where T:new() {
    public T GetNew() {
        return new T();
    }
}

Without constructor constraints, there really wouldn’t be a way for the compiler to allow you to instantiate a generic type argument to which the struct primary constraint *hasn’t* been applied. Btw, and since we’re talking about value types, it’s an error to specify both a struct and a constructor constraint for the same generic type argument:

public class AnotherGenericConstraint<T>
    where T:struct, new() { //compile error
    public T GetNew() {
        return new T();
    }
}

A final note: currently, there’s no way to specify the number of parameters a constructor may receive. And it seems like that won’t change in the future, meaning that we’re stuck with parameterless constructors. And I guess this is all about generic constraints. Stay tuned for more.
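Here’s a sketch which shows the constructor constraint from the caller’s side (StringBuilder and Int32 were picked just because both can be created without arguments):

```csharp
using System;
using System.Text;

public class AnotherGenericConstraint<T> where T : new() {
    public T GetNew() {
        return new T();
    }
}

class ConstructorConstraintDemo {
    static void Main() {
        // reference type with a public parameterless constructor
        var builder = new AnotherGenericConstraint<StringBuilder>().GetNew();
        Console.WriteLine(builder.Length); // 0

        // value types are accepted too
        var number = new AnotherGenericConstraint<Int32>().GetNew();
        Console.WriteLine(number); // 0

        // String has no public parameterless constructor:
        // new AnotherGenericConstraint<String>(); // compile error
    }
}
```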

Mar 23

Generics and secondary constraints

Posted in Basics, C#, CLR

Besides primary constraints, you can also constrain a generic type parameter through secondary constraints. Unlike primary constraints, you can apply zero or more secondary constraints to a generic type argument. Interface constraints are the most common kind of secondary constraint you’ll apply in your code. They allow the compiler to ensure that a generic type argument can only be replaced by a type which implements the interface specified by the constraint. If you want, you can specify several interface constraints. In these cases, you can only replace the generic type argument by a concrete type which implements all those interfaces. The next snippet illustrates the use of this kind of constraint:

public interface IDoSomething{}
public interface IDoSomethingElse{}
public class DoEverything:IDoSomething,IDoSomethingElse {
}
public class DoOneThing:IDoSomething {
}
public class MyGeneric<T> where T:IDoSomething, IDoSomethingElse {
        
}

And here’s some code that instantiates the class:

var aux1 = new MyGeneric<DoEverything>();//ok!
var aux2 = new MyGeneric<DoOneThing>();//oops

Besides interfaces, there’s another kind of secondary constraint, which is known as naked type constraint. This secondary constraint allows you to specify the existence of a relationship between two generic type arguments. Here’s a quick example:

public class AnotherGeneric<T, TBase> where T:TBase {
        
}

And here’s some code that shows how you can (and can’t) instantiate it:

public class A{}
public class B:A{}
var ok = new AnotherGeneric<B, A>();
var oopss = new AnotherGeneric<A, B>();

ok compiles without an error, but that doesn’t happen with oopss because the compiler knows it can’t convert from A to B (though it knows that it can convert from B to A).
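A more practical use of this kind of constraint might look like the following sketch, where a list of T is copied into a list of TBase. The ListConverter class and its ConvertAll method are hypothetical names used only for this sample:

```csharp
using System;
using System.Collections.Generic;

static class ListConverter {
    // T must be TBase or derive from it, so each T can be stored as a TBase
    public static List<TBase> ConvertAll<T, TBase>(IList<T> items) where T : TBase {
        var result = new List<TBase>(items.Count);
        foreach (var item in items) {
            result.Add(item); // implicit conversion allowed by the constraint
        }
        return result;
    }
}

class NakedConstraintDemo {
    static void Main() {
        IList<String> names = new List<String> { "a", "b" };
        List<Object> objects = ListConverter.ConvertAll<String, Object>(names);
        Console.WriteLine(objects.Count); // 2

        // ListConverter.ConvertAll<Object, String>(...) wouldn't compile:
        // Object doesn't derive from String
    }
}
```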

And I guess that’s it for now. Stay tuned for more.

Mar 21

Generics and primary constraints

Posted in Basics, C#, CLR

In the previous post, we’ve started looking at constraints. Primary constraints are one type of constraints. Currently, you can specify zero or one primary constraint when you introduce a generic type argument. Whenever you restrain a generic type argument to a non-sealed class, you’re using a type constraint. Here’s an example:

public class Test {
}
public class TestingConstraints<T> where T:Test {
}

When the compiler finds a primary constraint of this type, it knows that T can only be replaced by Test or by another type derived from Test. In the previous example, this means that all public members introduced by Test can be used from within the members of TestingConstraints. Btw, you can’t use a sealed type as a primary constraint. And it’s kind of logical, if you think about it. Let’s suppose S is sealed. If S is sealed, then you can’t extend it (ie, you cannot create new derived types from it). That being the case, and if primary constraints could reference sealed types, then you’d be building a generic type whose type argument could only be replaced by *one* specific type. Well, we don’t really need generics to do that, right? I mean, the final result would be the same as creating a non-generic type which would reference directly the sealed type S! In other words, the generic type would end up being non-generic and that just doesn’t make sense!
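Here’s a sketch which shows a member of Test being used from within the generic type. The Describe and Run methods and the DerivedTest class are hypothetical additions made for this sample:

```csharp
using System;

public class Test {
    public string Describe() {
        return "I'm a " + GetType().Name;
    }
}
public class DerivedTest : Test {
}
public class TestingConstraints<T> where T : Test {
    public string Run(T item) {
        // allowed: the compiler knows that T is Test (or derives from it)
        return item.Describe();
    }
}

class TypeConstraintDemo {
    static void Main() {
        var testing = new TestingConstraints<DerivedTest>();
        Console.WriteLine(testing.Run(new DerivedTest())); // I'm a DerivedTest

        // new TestingConstraints<String>(); // compile error: String doesn't derive from Test
    }
}
```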

Primary constraints are optional when introducing a generic type argument: if you don’t specify one, then Object is used by default. Notice that the following types cannot be used as a primary constraint: Object, Array, Delegate, MulticastDelegate, ValueType, Enum and Void.

Besides referencing a non-sealed type, there are also two other special primary constraints: class and struct. When the compiler finds the class constraint, it knows that the generic type argument will only be replaced by a reference type. In practice, this means that the  generic type argument can be replaced by any class, interface, delegate or array type. On the other hand, when the compiler finds the struct constraint, it understands that the generic type argument will be replaced by a value type. In this case, the generic type argument can be replaced by any value or enum type.

There are some important assumptions the compiler can make from these special primary constraints. Here’s a small example:

public class GenericValueType<T> where T:struct {
    public T Create() {
        return new T( );
    }
}
public class GenericReferenceType<T> where T: class {
    public T DoSomething() {
        T aux = null;
        //some code
        return aux;
    }
}

The previous code compiles without an error. Whenever we constrain a generic type argument to a struct, then the compiler knows that newing up T is ok because all value types have a default public constructor (and that might not happen with reference types). If you concentrate on GenericReferenceType, then you’ll notice that aux is initialized with the null value. You can only assign null to reference type variables. Since you’ve constrained T to class, then the compiler knows that this assignment is doable and won’t complain about it (try removing the constraint and see what happens; there is a way to go around it, but I’ll leave it for a future post).

Notice that there’s a small gotcha regarding primary constraints associated with the struct keyword: you cannot replace the generic type argument with the Nullable<T> type. Nullable<T> is a value type (ie, struct), but it gets special treatment from the compiler and the CLR. It’s because of that that you cannot replace T with Nullable<T> on GenericValueType<T>. Nullable<T> is an interesting topic and we’ll return to it in future posts.
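The next sketch shows which type arguments are accepted by the struct constraint (the Color enum is a made up type used only in this sample):

```csharp
using System;

public class GenericValueType<T> where T : struct {
    public T Create() {
        return new T();
    }
}

class StructConstraintDemo {
    enum Color { Red, Green }

    static void Main() {
        Console.WriteLine(new GenericValueType<Int32>().Create()); // 0
        Console.WriteLine(new GenericValueType<Color>().Create()); // Red (the enum member with value 0)

        // new GenericValueType<Int32?>(); // compile error: Nullable<Int32> isn't accepted
        // new GenericValueType<String>(); // compile error: String is a reference type
    }
}
```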

And I guess this sums it up for primary constraints. In the next post, we’ll go over secondary constraints. Stay tuned for more.

Mar 21

Generics: getting started with constraints

Posted in Basics, C#, CLR

Now that we have basic knowledge about generics, we can proceed and talk about constraints. When we write “generic” code, the compiler must ensure that that code will work for any type which might be used in place of the generic type argument. For instance, take a look at the following code:

public class Test {
    void PrintInfo<T>(T genericObject) {
        Console.WriteLine(genericObject.ToString(  ));
    }
}

How can the compiler guarantee that the previous code will work for any type? (btw, let’s ignore the fact that genericObject might be null). In the previous snippet, it’s really easy because we know that Object is the base class of all objects and ToString is one of the methods which is introduced by it. And what if we had the following code:

public class Test {
    void PrintInfo<T>(T first, T second)  {
        //compare both
        var comparison = first.CompareTo( second );
        if( comparison > 0 ) {
            Console.WriteLine("second smaller than first");
        }
        else if( comparison < 0 ){
            Console.WriteLine("first smaller than second");
        }
        else {
            Console.WriteLine("Equals");
        }
    }
}

The idea behind the previous snippet is simple: we’re expecting that T implements the IComparable<T> interface. If that happens, then we can call the CompareTo method over the first instance in order to compare them. As you’ve no doubt noticed from the previous snippet, the compiler isn’t really happy with the CompareTo call. After all, implementing the IComparable<T> interface is really optional and the compiler can’t generate IL code from the previous method that works for any type.

And that’s why we have constraints. With constraints, we can limit the number of types that can be specified for a generic type argument. By adding a constraint, we can make sure that the generic type argument will only be replaced by types which fall under certain conditions. So, let’s fix the previous snippet so that it compiles:

public class Test {
    void PrintInfo<T>(T first, T second) where T:IComparable<T> {
        //compare both
        var comparison = first.CompareTo( second );
        if( comparison > 0 ) {
            Console.WriteLine("second smaller than first");
        }
        else if( comparison < 0 ){
            Console.WriteLine("first smaller than second");
        }
        else {
            Console.WriteLine("Equals");
        }
    }
}

A constraint is always specified through the where keyword and, in the previous example, it tells the compiler that T can only be replaced by a type which implements the IComparable<T> interface (trying to pass a type which doesn’t implement that interface results in a compiler error). With this small change, we can treat T like an object which implements IComparable<T>. Notice that constraints can be applied where you’re allowed to introduce generic types (ie, you can add constraints at class, interface, delegate or method level).
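To see the constraint from the caller’s side, here’s a variation of the previous method which returns the message instead of printing it (this Describe method is a hypothetical adaptation made just to keep the sample short):

```csharp
using System;

public class Test {
    public static string Describe<T>(T first, T second) where T : IComparable<T> {
        var comparison = first.CompareTo(second);
        if (comparison > 0) {
            return "second smaller than first";
        }
        if (comparison < 0) {
            return "first smaller than second";
        }
        return "Equals";
    }
}

class ConstraintUsageDemo {
    static void Main() {
        // Int32, String and Double all implement IComparable<T>
        Console.WriteLine(Test.Describe(1, 2));     // first smaller than second
        Console.WriteLine(Test.Describe("b", "a")); // second smaller than first
        Console.WriteLine(Test.Describe(3.5, 3.5)); // Equals

        // Test.Describe(new Object(), new Object()); // compile error: Object doesn't implement IComparable<Object>
    }
}
```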

There are several types of constraints, which are grouped into several categories. In the next post, we’ll take a look at primary constraints. Stay tuned for more.

Mar 20

If you’ve been reading this blog, you probably know that my latest book on HTML5 has been released. In this last book project, I’ve run a little experiment: I’ve asked for the help of my readers for reviewing the manuscript. There were several guys who answered the call and I liked the way things worked out. Since the experiment worked, I’ve decided to run it again for my next book project. This time, I’m writing about JavaScript (one of my favorite languages!) and the idea is to build a small book (written in Portuguese) which helps people understand the specificities associated with its use. This is not an HTML JS book. This is really about JavaScript and you can use it to improve your knowledge about the language.

Even though this is a small book (under 200 pages), it’s also true that the time for reviewing the manuscript is limited to 2 weeks. Once again, the only thing I can give you for your help is a free copy and public acknowledgment for your work. If you’re interested, then please drop me a line at labreu at gmail.com.

Mar 19

Generic interfaces and delegates

Posted in Basics, C#, CLR

In the previous post, I’ve introduced the concept of generics. In this post, we’ll keep looking at generics and we’ll talk a little bit about how we can create generic interfaces and delegates. Without generic interfaces, manipulating a value type through its interface reference would result in boxing (and less compile-time type safety). The same goes for delegates: with generics, it’s possible to create a delegate which allows a value type instance to be passed back to a callback method in a type-safe way without any boxing.

There’s not much to say about the syntax used for creating generic interfaces and delegates. So, here’s an example of a generic interface:

public interface IDoSomething<T> {
    void DoItNow(T item);
    T Whatgetdate();
}

And here’s an example of a generic delegate (btw, it’s defined by the framework):

public delegate void Action<T>(T item);

Typically, the compiler will transform a delegate into a class which looks like this:

public class Action<T> : MulticastDelegate {
    public Action(Object obj, IntPtr method);
    public virtual void Invoke(T item);
    public virtual IAsyncResult BeginInvoke(T item, AsyncCallback cb, Object obj);
    public virtual void EndInvoke(IAsyncResult result);
}

(And we’ll see why in future posts :))

From .NET 4.0 onward, the type parameters of a generic delegate or interface can be marked as contravariant or covariant. In previous versions, generic type parameters were always invariant, i.e., a type argument couldn’t be swapped for a related type once it had been specified. In C#, contravariant type parameters are indicated through the in keyword, which means that the type argument can change from a class to another class derived from it. Covariant type parameters, on the other hand, are indicated through the out keyword and, in this case, the type argument can change from a class to one of its base classes. An example might help you understand what’s going on here. Suppose we start with this (again, defined by the framework):

public delegate TResult Func<in T, out TResult>(T arg);

In this case, T is contravariant and TResult is covariant. So, suppose you have the following classes:

class Base {}
class Derived : Base {}

Now, if you still haven’t forgotten that Object is the base class of all types, then from .NET 4.0 onward you can simply do this:

Func<Object, Derived> fun1 = null;//not important here
Func<Base, Base> fun2 = fun1;

Notice that there’s no cast there! This is not the easiest example for understanding what’s going on, but I really don’t have time for more (sorry). What matters is understanding that fun1 references a method which expects an Object and returns an instance of Derived. On the other hand, fun2 expects a method that receives a Base and returns another Base instance. Since you can pass a Base to a method which expects an Object, and since you can treat a Derived instance as if it were a Base instance (because Derived extends Base), the previous code is safe.

Interestingly, the compiler will only be happy with contravariance and covariance if it can guarantee that there is a reference conversion between the types being used. That means that variance won’t work for value types because it would require boxing. In practice, this means that this won’t compile:
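
For readers who prefer to see the assignment at work with a real method instead of null, here’s a small sketch. MakeDerived is a hypothetical helper written just for this illustration:

```csharp
using System;

public class Base { }
public class Derived : Base { }

public static class Program {
    // Accepts any Object and always returns a Derived instance.
    public static Derived MakeDerived(Object input) {
        Console.WriteLine("Received: " + input.GetType().Name);
        return new Derived();
    }

    public static void Main() {
        Func<Object, Derived> fun1 = MakeDerived;
        // No cast: T is contravariant (Object -> Base) and
        // TResult is covariant (Derived -> Base).
        Func<Base, Base> fun2 = fun1;

        // The caller hands over a Base and gets something usable as a Base.
        Base result = fun2(new Base());
        Console.WriteLine("Returned: " + result.GetType().Name);
    }
}
```

The method behind both delegates is the same one; only the static view of its input and output types has changed.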

void DoSomething(IEnumerable<Object> items) { }
DoSomething(new List<Int32>());//ooops:  compiler error
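
A minimal sketch of the difference, assuming a hypothetical CountItems method that plays the role of DoSomething: a list of a reference type is accepted as is, while a list of Int32 has to be converted explicitly (here through LINQ’s Cast<Object>, which boxes each element):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program {
    // Hypothetical stand-in for DoSomething; returns the item count.
    public static Int32 CountItems(IEnumerable<Object> items) {
        return items.Count();
    }

    public static void Main() {
        // Reference type: String -> Object is a reference conversion,
        // so covariance applies and no extra code is needed.
        Console.WriteLine(CountItems(new List<String> { "a", "b" }));

        // Value type: each element must be boxed explicitly before
        // the sequence can be seen as IEnumerable<Object>.
        var ints = new List<Int32> { 1, 2, 3 };
        Console.WriteLine(CountItems(ints.Cast<Object>()));
    }
}
```

The boxing hasn’t gone away in the second call; it’s just explicit now instead of being something the compiler would have to sneak in behind your back.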

It’s a pity, but that’s just the way it is. Before ending, there’s still time to say that you should specify the variance of a type parameter whenever possible when creating generic delegates and interfaces. This doesn’t introduce any nasty side effects and enables the use of the delegate or interface in more scenarios. If you want to know more about variance, then I recommend reading Eric Lippert’s excellent series on the topic. And that’s it for now. Stay tuned for more.
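
As a footnote to the advice about declaring variance on your own types, here’s a short sketch. Handler<T>, IProducer<T>, Animal and Dog are all hypothetical names used only for illustration:

```csharp
using System;

public class Animal { }
public class Dog : Animal { }

// T only appears as an input, so it can be marked 'in'.
public delegate void Handler<in T>(T item);

// T only appears as an output, so it can be marked 'out'.
public interface IProducer<out T> {
    T Produce();
}

public class DogProducer : IProducer<Dog> {
    public Dog Produce() { return new Dog(); }
}

public static class Program {
    public static void Main() {
        Handler<Animal> handleAnimal = a => Console.WriteLine(a.GetType().Name);
        Handler<Dog> handleDog = handleAnimal; // contravariance at work
        handleDog(new Dog());

        IProducer<Animal> producer = new DogProducer(); // covariance at work
        Console.WriteLine(producer.Produce().GetType().Name);
    }
}
```

Without the in and out modifiers, both assignments would be rejected by the compiler, even though they’re perfectly safe.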

Mar 14

Generics: open vs closed types

Posted in Basics, C#, CLR       Comments Off on Generics: open vs closed types

Before going on, does anyone know how to improve the performance of Windows Live Writer? It’s been painfully slow since I updated it to the latest release and it’s upsetting me a lot. I’ve tried to disable everything, but the damn app doesn’t show any improvements…damn…ok, enough bitching…

In the previous post, I introduced generics. In this post, we’ll keep looking at generics and I’ll present the concept of open vs. closed types. Before going on, let’s recover the MyList snippet I introduced in the previous post:

class MyList<T> {
    public void Add(T item) {
    }
    public void Remove(T item) {
    }
    public void Sort(IComparer<T> comparer) { }
}

And then there was some code which used MyList with ints:

var intList = new MyList<Int32>();
intList.Add(10);
intList.Remove(20);

And now, we’re ready to proceed :)

As you probably know, the CLR uses internal data structures for every type used by an application. These data structures are generally known as type objects. Generics are also types and that’s why it shouldn’t surprise you to find out that they too have their own type objects. In other words, MyList<T> is a type and you can get its type object. However, there’s an important difference between MyList<T> and, say, String: you cannot instantiate a generic type without specifying a data type for its type parameter. That’s why we had to pass Int32 when we instantiated MyList in the previous snippet.
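
If you want to check this for yourself, the following sketch uses reflection to get the type object of the generic type and then tries to create instances with and without a type argument. The empty MyList<T> declared here is just a stand-in so the sample is self-contained:

```csharp
using System;

public class MyList<T> { }

public static class Program {
    public static void Main() {
        Type generic = typeof(MyList<>);
        // The type object exists even without a type argument...
        Console.WriteLine(generic.ContainsGenericParameters);

        // ...but asking for an instance fails at run time.
        try {
            Activator.CreateInstance(generic);
        } catch (ArgumentException) {
            Console.WriteLine("Cannot instantiate without a type argument");
        }

        // Supplying Int32 yields a type that can be instantiated.
        Type withInt32 = generic.MakeGenericType(typeof(Int32));
        Object instance = Activator.CreateInstance(withInt32);
        Console.WriteLine(instance is MyList<Int32>);
    }
}
```

In regular C# code the compiler stops you much earlier (new MyList<>() simply doesn’t compile); reflection is only used here to show that the run time enforces the same rule.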

Btw, a type with generic type parameters is called an open type. Whenever code references a generic type and supplies a specific type for each of its type parameters, we end up with what is known as a closed type (notice that closed types are only obtained when all generic type arguments are replaced with concrete types; if that doesn’t happen, then you’re just creating a new open type). An interesting way to see the difference between open and closed types is to print the information returned by the typeof operator to the console:

Console.WriteLine(typeof(MyList<Int32>));
Console.WriteLine(typeof(MyList<>));

The previous snippet ends up printing the following:

ConsoleApplication1.MyList`1[System.Int32]
ConsoleApplication1.MyList`1[T]

The astute reader will probably notice the backtick followed by a number…the number represents the type’s arity, which indicates the number of type parameters of that type. In our example, MyList has arity 1 because it only has one type parameter. Another interesting conclusion is that each closed type gets its own static fields, which are shared by all instances of that closed type only (in our case, if MyList<T> introduced a static field, then MyList<Int32> would be a closed type with its own static fields which aren’t shared with, say, MyList<String>).

Since a generic type is a type, it’s also possible to use a generic type (open or closed) as a base type or to build a new generic type which extends an existing type. Before ending, time for one last gotcha: I’ve noticed that some developers prefer to do this:
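
Here’s a quick sketch that shows the static field behavior in practice. Counter<T> is a hypothetical type created just for this sample:

```csharp
using System;

public class Counter<T> {
    // One copy of this field exists per closed type.
    public static Int32 Instances;
    public Counter() { Instances++; }
}

public static class Program {
    public static void Main() {
        new Counter<Int32>();
        new Counter<Int32>();
        new Counter<String>();

        Console.WriteLine(Counter<Int32>.Instances);  // 2
        Console.WriteLine(Counter<String>.Instances); // 1

        // The arity can also be read through reflection.
        Console.WriteLine(typeof(Counter<>).GetGenericArguments().Length); // 1
    }
}
```

Counter<Int32> and Counter<String> keep separate counts because, as far as the CLR is concerned, they’re two different types.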

class IntList : MyList<Int32> { }

instead of instantiating MyList<Int32> directly like I did in the previous snippets. Please, don’t do this. The problem with this approach is that IntList is a different type from MyList<Int32>:

Console.WriteLine(typeof(IntList) == typeof(MyList<Int32>));

The previous snippet prints False because we’re talking about types with different identities! If you prefer the last approach because it saves a few keystrokes, then don’t forget that you can use the var approach or even create an alias through the using keyword. And that’s it for now. Stay tuned for more.
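
As a footnote to the alias suggestion, here’s a minimal sketch of what it could look like. The empty MyList<T> is declared only to keep the sample self-contained, and the alias is visible only in the file where it’s declared:

```csharp
using System;
// The alias is just another name for MyList<Int32>, not a new type.
using IntList = MyList<int>;

public class MyList<T> { }

public static class Program {
    public static void Main() {
        var list = new IntList();
        // Same type identity, unlike the derived-class approach.
        Console.WriteLine(list.GetType() == typeof(MyList<Int32>));
        Console.WriteLine(typeof(IntList) == typeof(MyList<Int32>));
    }
}
```

Since the alias is resolved by the compiler, any method that expects or returns MyList<Int32> keeps working with it, which isn’t the case with a derived IntList class.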