Apr 26

So, you know everything about text, right?–part VI

Posted in .NET Basics C#

In this post, we’ll talk a little bit about string interning. As I’ve pointed out before, strings are immutable. So, at first sight, the following snippet looks like it wastes memory: you’d expect to end up with two different instances that, for all practical purposes, represent the same string:

var str1 = "hi";
var str2 = "hi";

From a memory usage point of view, wouldn’t it be great if we could make both variables reference the same String object? The answer lies in string interning. When the CLR is initialized, it automatically creates a private hash table whose keys are strings and whose values are references to String objects in the managed heap. Interning might lead to some performance improvements when you know *for sure* that your app works with lots of “equivalent” strings.
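To make the idea of that table more concrete, here is a toy model of an intern pool. It is only a sketch: ToyInternPool and GetOrAdd are hypothetical names, and the real CLR table is internal and almost certainly not built on a Dictionary.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: the CLR's own intern table is internal and works differently.
public static class ToyInternPool
{
    // Keys are compared by value; each value is the canonical instance for its key.
    private static readonly Dictionary<string, string> Pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the canonical instance for the given text, registering it if it is new.
    public static string GetOrAdd(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        if (Pool.TryGetValue(value, out var existing))
        {
            return existing;
        }

        Pool.Add(value, value);
        return value;
    }
}

public static class ToyInternPoolDemo
{
    public static void Main()
    {
        // Built at run time so that we get two distinct objects with the same contents.
        var a = new string(new[] { 'h', 'i' });
        var b = new string(new[] { 'h', 'i' });

        Console.WriteLine(ReferenceEquals(a, b)); // False
        Console.WriteLine(ReferenceEquals(
            ToyInternPool.GetOrAdd(a),
            ToyInternPool.GetOrAdd(b))); // True
    }
}
```

Two equal strings go in, one shared instance comes out. Notice that nothing ever removes entries from the pool, and that matters for the lifetime of interned strings.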

Currently, the String class exposes two static methods related to string interning:

var internedString = String.Intern("hi");
var str = String.IsInterned("hi"); // returns the interned instance, or null if "hi" isn't interned

Both methods receive a string. The first (Intern) checks the private CLR table for a match. If an identical string already exists, it returns a reference to that string. Otherwise, it adds a reference to the passed string to the private table and returns that reference. The IsInterned method might not work as you’re expecting: despite its name, it doesn’t return a Boolean. Like the Intern method, it takes a string which is used to perform a lookup in the CLR’s private hash table. If there’s a matching entry, it returns a reference to that string. If that isn’t the case, it simply returns null.
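Here’s a small sketch of both methods in action. The InternDemo class name is just for illustration, and the snippet assumes that a freshly generated GUID string isn’t already in the pool (which is true for all practical purposes):

```csharp
using System;

public static class InternDemo
{
    public static void Main()
    {
        // A freshly generated GUID is, for all practical purposes, not in the pool yet.
        var unique = Guid.NewGuid().ToString();
        Console.WriteLine(String.IsInterned(unique) == null); // True

        // Register the text in the pool and keep the canonical reference.
        var interned = String.Intern(unique);

        // A distinct object holding the same characters.
        var copy = new string(unique.ToCharArray());
        Console.WriteLine(ReferenceEquals(copy, interned)); // False

        // Both methods now hand back the pooled instance for an equal string.
        Console.WriteLine(ReferenceEquals(String.Intern(copy), interned)); // True
        Console.WriteLine(ReferenceEquals(String.IsInterned(copy), interned)); // True
    }
}
```

In other words, IsInterned is better read as “get the interned instance, if there is one” than as a yes/no check.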

If you’re an experienced developer, you’re probably seeing a big problem with string interning: there’s no way to free the memory used by the strings maintained in the private hash table. Well, to be honest, there is one (but it will probably be too drastic for your app): you need to unload the default AppDomain, and that will only happen when you kill your app’s process. See? I told you it would be a bit drastic for most apps :)

By now, you’re probably wondering if the CLR performs string interning by default. And the answer is yes, but *only* for literal strings defined in the assembly metadata. In theory, you should be able to control this behavior. Even though .NET has supported the CompilationRelaxationsAttribute since version 2.0, the truth is that the CLR v4.0 ignores that attribute. You can test this by building the following code with [CompilationRelaxations(CompilationRelaxations.NoStringInterning)] applied to the assembly (hint: you don’t really need to add it because the C# compiler automatically adds the attribute for you) and checking that sameString still ends up being true:

var str1 = "hi";
var str2 = "hi";
var sameString = Object.ReferenceEquals(str1, str2);
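To see that only literals get this treatment, it helps to compare them with a string that is built at run time. The following is a minimal sketch (LiteralInterningDemo is just an illustrative name), where everything lives in a single assembly:

```csharp
using System;

public static class LiteralInterningDemo
{
    public static void Main()
    {
        var str1 = "hi";
        var str2 = "hi";

        // Two identical literals in the same assembly share one instance.
        Console.WriteLine(ReferenceEquals(str1, str2)); // True

        // The C# compiler folds constant concatenations into a single literal.
        var folded = "h" + "i";
        Console.WriteLine(ReferenceEquals(str1, folded)); // True

        // A string built at run time is a separate object, even though it has the same value.
        var built = new string(new[] { 'h', 'i' });
        Console.WriteLine(str1 == built); // True
        Console.WriteLine(ReferenceEquals(str1, built)); // False
    }
}
```

The == operator compares values, so it says nothing about interning; ReferenceEquals is what tells you whether two variables point to the same object.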

To be honest, string interning seems really great in theory, but I’m still not sure about its use in the real world. Even though I’m not the most experienced developer in the world, the truth is that I’ve never had to use it explicitly in my apps. Microsoft seems to think that it might impact your code in a bad way, or it wouldn’t have introduced the CompilationRelaxationsAttribute to let you control it (even though it seems like the CLR won’t respect that attribute in most scenarios). I’d say that if you do need to work with lots of strings, then you should consider it…but don’t forget to measure to see if it’s really improving your app. And that’s it for now. Stay tuned for more!