Mar 28

So, you know everything about text, right? – part I


In .NET, characters are always represented by 16-bit Unicode values (UTF-16 code units). Programmatically, they’re represented through instances of the System.Char type. Here’s an example of a char literal in C#:

var myChar = 'a';

The Char type offers several static helper methods for common operations. For instance, you can call the IsLetter method to check whether a char is a letter:

var isLetter = Char.IsLetter('a');

Besides the IsLetter method, you can also use the IsDigit, IsWhiteSpace, IsUpper, IsLower, IsPunctuation, IsLetterOrDigit, IsControl, IsNumber, IsSeparator, IsSurrogate, IsLowSurrogate, IsHighSurrogate or IsSymbol methods. All these methods have overloads which receive a string and the position of the char that needs to be checked. You can get the same information by calling the GetUnicodeCategory method, which returns a value of the UnicodeCategory enumeration. According to the docs, you should prefer CharUnicodeInfo.GetUnicodeCategory over this method.

A small detour: surrogates are an interesting topic (though out of the scope of this post). Basically, they’re needed to represent the supplementary characters outside the BMP (Basic Multilingual Plane) in UTF-16: each of those characters takes a pair of chars, a high surrogate followed by a low surrogate. If you’re interested in learning more, I’d recommend getting started here.
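To make the classification methods a bit more concrete, here’s a minimal sketch. The CharClassificationSketch class and its method names are made up for this post (they’re not part of the framework); only the Char and CharUnicodeInfo calls are real:

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, used only for illustration.
public static class CharClassificationSketch
{
    // Counts letters through the (string, int) overload of IsLetter.
    public static int CountLetters(string text)
    {
        var count = 0;
        for (var i = 0; i < text.Length; i++)
        {
            if (Char.IsLetter(text, i))
            {
                count++;
            }
        }
        return count;
    }

    // Returns the Unicode category through the method the docs recommend.
    public static UnicodeCategory CategoryOf(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c);
    }

    // True when the string is exactly one high surrogate followed by one low surrogate.
    public static bool IsSingleSurrogatePair(string text)
    {
        return text.Length == 2
            && Char.IsHighSurrogate(text[0])
            && Char.IsLowSurrogate(text[1]);
    }
}
```

For instance, CountLetters("a1 b2") should find two letters, and a string holding a single supplementary character such as "\U0001D11E" (a musical symbol outside the BMP) has a Length of 2 and passes the IsSingleSurrogatePair check.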

Besides checking the “kind” of a character, you can also convert it to its lower or upper equivalent in a culture-agnostic way through the ToLowerInvariant and ToUpperInvariant methods (if you want to take the culture into consideration, then you should call the ToLower and ToUpper methods).
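Here’s a small sketch of the difference between the two families. The CharCasingSketch class is hypothetical; it just wraps the real Char.ToUpperInvariant, Char.ToLowerInvariant and Char.ToUpper(char, CultureInfo) methods:

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, used only for illustration.
public static class CharCasingSketch
{
    // Culture-agnostic conversions: same result on every machine.
    public static char UpperInvariant(char c)
    {
        return Char.ToUpperInvariant(c);
    }

    public static char LowerInvariant(char c)
    {
        return Char.ToLowerInvariant(c);
    }

    // Culture-sensitive conversion: the result depends on the culture passed in.
    public static char UpperFor(char c, CultureInfo culture)
    {
        return Char.ToUpper(c, culture);
    }
}
```

The classic case where the two differ is the Turkish dotted i: assuming Turkish culture data is available on the machine, passing a tr-TR CultureInfo to UpperFor for 'i' is expected to give the dotted capital İ, while UpperInvariant('i') gives a plain 'I'.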

The char type does offer some instance methods too (btw, the previous ones are all static). You can compare two chars through the Equals method or through the CompareTo method (because Char implements the IComparable<char> interface). Besides comparing, you can also find a couple of methods which let you transform a char into a string (ToString) or get a char from an existing one-char string (the static Parse and TryParse methods). There’s still another couple of static methods which work with code points represented as integers: ConvertFromUtf32 builds a string (with one or two chars) from a code point and ConvertToUtf32 does the reverse. Finally, the static GetNumericValue method returns the numeric equivalent of a char as a double (or -1 if the char isn’t a number).
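A quick sketch of these members in action. Once again, the CharHelpersSketch class and its method names are invented for the example; the calls inside are the real Char members:

```csharp
using System;

// Hypothetical helper class, used only for illustration.
public static class CharHelpersSketch
{
    // CompareTo returns a negative number when left sorts before right.
    public static bool SortsBefore(char left, char right)
    {
        return left.CompareTo(right) < 0;
    }

    // char -> string -> char through ToString and Parse.
    public static char RoundTripThroughString(char c)
    {
        return Char.Parse(c.ToString());
    }

    // Code point -> string (one or two chars) -> code point.
    public static int RoundTripCodePoint(int codePoint)
    {
        string text = Char.ConvertFromUtf32(codePoint);
        return Char.ConvertToUtf32(text, 0);
    }

    // Numeric value of a char as a double; -1 when the char isn't a number.
    public static double NumericValueOf(char c)
    {
        return Char.GetNumericValue(c);
    }
}
```

So SortsBefore('a', 'b') is true, NumericValueOf('7') gives 7 and NumericValueOf('a') gives -1. Notice that ConvertFromUtf32 returns a string and not a char: a code point above 0xFFFF needs a surrogate pair, and that doesn’t fit in a single char.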

Before ending this post, it’s worth mentioning that the CLR allows you to convert from char into a numeric type (and vice-versa). Casts are the easiest way to do that (in C#, going from a numeric type to a char always requires an explicit cast). You can also perform this operation through one of the methods of the Convert type or by using the IConvertible interface (which is implemented explicitly by Char, so you need to cast the char to IConvertible first). And I guess this sums it up quite nicely. Stay tuned for more about text.
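For reference, here’s a minimal sketch of the three conversion options side by side. The CharConversionSketch class is hypothetical; the casts, Convert.ToInt32, Convert.ToChar and IConvertible.ToInt32 are the real thing:

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, used only for illustration.
public static class CharConversionSketch
{
    // Option 1: casts.
    public static int ToCodeViaCast(char c)
    {
        return (int)c;
    }

    public static char FromCodeViaCast(int code)
    {
        return (char)code;
    }

    // Option 2: the Convert type (ToChar throws an OverflowException
    // when the value doesn't fit in a char).
    public static int ToCodeViaConvert(char c)
    {
        return Convert.ToInt32(c);
    }

    public static char FromCodeViaConvert(int code)
    {
        return Convert.ToChar(code);
    }

    // Option 3: IConvertible, which Char implements explicitly,
    // so the char must be cast to the interface first.
    public static int ToCodeViaIConvertible(char c)
    {
        return ((IConvertible)c).ToInt32(CultureInfo.InvariantCulture);
    }
}
```

All three routes give 97 for 'a'. The main practical difference is on the way back: an unchecked cast silently truncates a value that doesn’t fit in 16 bits, while Convert.ToChar throws.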