Category Archives: Uncategorized

Want to Help Design C# 6.0 This Weekend?

The team is asking for your feedback. If you have a couple of minutes, answer these questions or respond to this thread.

Declaration Expressions

C# 6 is introducing a new feature called declaration expressions. It’s cool because it lets you write this code:

if (int.TryParse(s, out int i)) { … i … }

No more separate variable appearing above the conditional. Less ceremony, less ugliness, yippee!!!!!

The initial implementation expanded the scope of the variable to the next larger scope. This allowed you to do stuff like:

GetCoordinates(out var x, out var y);
… // use x and y;

While that might look useful, I hate it because it obscures the declaration of variables that extend across the entire enclosing scope – such as the whole method. This behavior is called “spill out,” and the team plans to remove it.

For clarity: in the if statement above, the integer i is in scope within the statement or block that is effectively the “then” clause of the conditional. Most of the enclosing blocks you use – do, for, using – have a single place to declare a variable.
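So, under the planned scoping (a sketch using the proposed syntax, which was still in flux at the time):

if (int.TryParse(s, out int i))
{
    Console.WriteLine(i); // i is in scope in the "then" statement
}
// once "spill out" is removed, i is no longer in scope here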

Yippeee!

The Issue to Resolve…


The question the team is asking for help resolving involves code like this:


if (int.TryParse(s, out int i)) Console.WriteLine("Got integer {0}", i);
else Console.WriteLine("Got no integer {0}", i);
// Do you expect "i" to be in scope in the "else" clause?


 

The way C# itself is designed (as reflected in the .NET Compiler Platform, Roslyn, syntax trees), an else clause is contained inside the if statement. Under that design, the above use of the variable i is legal (Q1 below).

The scenario most impacted by this decision is a series of else if statements (Q2 below):


if (int.TryParse(s, out var v)) Console.WriteLine("Got integer {0}", v);
else if (double.TryParse(s, out var v)) Console.WriteLine("Got double {0}", v);
else Console.WriteLine("Ain't got nuffink");
// Do you expect you can re-use the name "v" in both clauses?
if ((var v = o as string) != null) Console.WriteLine(v.Length);
else if ((var v = o as Exception) != null) Console.WriteLine(v.Message);
// Do you expect you can re-use the name "v" in both clauses?


 

And the way you manually refactor might be affected, because if (b) S1 else S2 will no longer mean precisely what if (!b) S2 else S1 means – you might have to do some variable switcharoos (Q3):


if (int.TryParse(s, out int i)) Console.WriteLine("Got integer {0}", i);
else Console.WriteLine("no integer");
// Do you expect you can negate the condition and switch around "then" and "else" clauses?
if (!int.TryParse(s, out int i)) Console.WriteLine("no integer");
else Console.WriteLine("Got integer {0}", i);


 


The Poll


I’ll hand the results to the team as a non-scientific (because you aren’t randomly chosen) poll, along with any comments you want to make – although if you want to be part of the discussion, this is the place to be. Suggested answers include: yes, no, maybe, don’t care, don’t understand.

Q1: Would you expect a variable declared in the “if” condition to be in scope and available for use in the code of the “else” clause?

Q2: Would you expect to be able to reuse the same variable in multiple “if” clauses that appear as a series of “else if” statements?

Q3: Do you expect to be able to rearrange if and else clauses with no risk of compiler errors regarding out of scope variables?

Q4: Do you think it matters whether C# and Visual Basic.NET do the same thing?

Q5: Are there other scenarios you’re worried or curious about regarding the new declaration expression features?

RoslynDom Quick Start

This document is about an early version of RoslynDom, focusing mostly on working features, with notes on where missing features have an impact. You can also see notes on missing features in GitHub issues.

You can find the code for these quickstarts in the RoslynDomExampleTests NuGet package.

For more information see these documents in the “Documents” folder on GitHub (Creation of these documents is currently in progress):

  • See the RoslynDom Project Layout if you are curious about why there are five projects and the dependencies these projects have on the .NET Compiler Platform (Microsoft.CodeAnalysis), the C# compiler (Microsoft.CodeAnalysis.CSharp) and Unity (Microsoft.Practices.Unity.*)
  • See the RoslynDom Design Overview for a discussion of how RoslynDom is built
  • See the RoslynDom Extensibility if you’re interested in doing more with RoslynDom
  • See the RoslynDom Roadmap.ppt for a vision of RoslynDom

What is RoslynDom

RoslynDom is an alternative view of your code.

The most efficient, best way to express your code in ASCII is your code in your favorite language.

The most efficient, best way to organize your code for your compiler is within the compiler, and exposed as part of the .NET Compiler Platform, Roslyn.

Another, ephemeral expression of your code is the one in your head. This is the one that comes out in words in your meetings, and you have entire meetings without phrases like “angle bracket.”

RoslynDom models this third expression of code in memory. The model has several features:

  • You can load existing code into the RoslynDom model and easily explore, navigate and analyze it. The RoslynDom model is language agnostic.

This feature is currently limited by the lack of multi-file support.

  • RoslynDom is mutable. You can alter your code in a load, alter, alter, alter, build output model. Since you can easily navigate your code, finding the location for change is easy
  • RoslynDom entirely isolates the language dependent load process from the model itself. At a simplistic level, when the VB factory is in place, you can load from C# and output to VB and vice versa.
  • RoslynDom models can be created in any manner you desire. RoslynDom views can be created without loading code, and then brand new code created.

 

The basic model

Code exists in the following hierarchy

  • Root groups which are groups of files (not yet implemented)
  • Roots, which are files or a single conceptual load unit
  • Stem members – Namespaces and types that can be contained in roots
  • Type members – nested types, methods, properties, etc. that can be contained in types
  • Statements – code statements that are held primarily in methods and property accessors
  • Expressions – sub parts of statements that return values

Most major features, including most statements, are complete; see GitHub issues.

Expressions are currently handled via strings by design.

Walkthrough 1: Load and check code

Step 1: Load your code

Add a using statement for RoslynDom.CSharp.

Retrieve the singleton factory instance from the RDomCSharp.Factory property and call the GetRootFromFile method to open a specific file:

var factory = RDomCSharp.Factory;
var root = factory.GetRootFromFile(fileName);


NOTE: Other overloads support loading source code from strings or trees.
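For example, loading from a string (a minimal sketch; GetRootFromString appears again later in this post):

var rootFromString = RDomCSharp.Factory.GetRootFromString("public class MyClass { }");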

NOTE: You can iteratively work through the files in your project or solution. This approach will be hampered because specifying references and multiple syntax trees for the underlying model isn’t yet supported.

Of course you can assign the factory property to a local variable or class field if you prefer.

RDomCSharp is the code that creates the language agnostic RoslynDom tree from C# code, and that can recreate C# code from the RoslynDom tree. You can create a RoslynDom tree from scratch as well. You will later be able to load from other languages, in particular VB.NET.


Step 2: Check your code


Output your code to a string to test the output. You can do this by outputting to a new file and comparing the files:


var output = factory.BuildSyntax(root).ToString();
File.WriteAllText(outputFileName, output);


Conclusion


You now know how to load and output code from RoslynDom.

Walkthrough 2: Navigate and interrogate code


One of the major user scenarios intended for RoslynDom is to allow you to answer questions about your code. This is just a small sampling of the kinds of things you can do.

At present, RoslynDom supports structural features (classes, methods, etc) and statements. It does not support expressions because user stories with value aren’t yet clear.

Step 1: Load and check code


Load and check your code as shown in Walkthrough 1.

Step 2: Ask general questions about code


LINQ is your friend.

You’ll often find it convenient to make an array for easier sequential requests in testing.


var root = RDomCSharp.Factory.GetRootFromFile(fileName);
Assert.AreEqual(1, root.Usings.Count());
Assert.AreEqual("System", root.Usings.First().Name);
Assert.AreEqual(1, root.Namespaces.Count());
Assert.AreEqual(1, root.RootClasses.Count());


Assigning intermediate values to variables in tests can help clarity:


var methods = root.RootClasses.First().Methods.ToArray();
Assert.AreEqual(0, methods[0].Parameters.Count());
Assert.AreEqual(1, methods[1].Parameters.Count());
Assert.AreEqual("x", methods[1].Parameters.First().Name);


The difference between Classes and RootClasses is that RootClasses includes all classes under the root, regardless of namespace, while Classes includes only those directly under the root. The same distinction applies to Interfaces, Enums and Structures.
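A quick illustration of the distinction, using the properties named above:

// Classes: only classes directly under the root (outside any namespace).
var directClasses = root.Classes;
// RootClasses: all classes under the root, regardless of namespace.
var allClasses = root.RootClasses;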

Step 3: Place a break point and query code


Place a breakpoint, run the test in debug mode and ask questions in the immediate window about the code. Sometimes you’ll have to use the Watch window because of the .NET Compiler Platform CTP behavior. Have fun!

Step 4: Ask harder questions


That might have been fun, but the real value from RoslynDom comes from asking complex questions. I’ll introduce LINQ in this walkthrough, and then show something I really wanted to accomplish in the next.

Ensure RoslynDom.Common and System.Linq are included in the using statements.

Let’s say you’re concerned about unsigned ints variables in your code and want to examine their names. I don’t know why, I just had to make something up.

You can retrieve the RoslynDom entry with


var uintVars = root
.Descendants.OfType<IVariable>()
.Where(x => x.Type.Name.StartsWith("UInt"))
.Select(x => x.Name);


NOTE: Aliases are language specific and RoslynDom entries are language agnostic, so use the .NET name of the type. The C# factory is responsible for straightening this out on output.
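So, for example, a variable declared with the C# alias uint is found by its .NET name:

// uint x = 42; surfaces with Type.Name == "UInt32", not "uint".
var unsignedNames = root
    .Descendants.OfType<IVariable>()
    .Where(x => x.Type.Name == "UInt32")
    .Select(x => x.Name);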


As another example, say you want all the methods and variables where unsigned ints are used:


var uintCode = (from c in root.Descendants.OfType<IStatementContainer>()
                from v in c.Descendants.OfType<IVariable>()
                where v.Type.Name.StartsWith("UInt")
                select new
                {
                    containerName = c.Name,
                    variableName = v.Name
                })
               .ToArray();


Walkthrough 3: Finding questionable implicit variable typing


I have a sin when I code. I really like ignoring types. When I write code I use var everywhere. This saves me time. But, I realize it can result in code that’s less readable.

I can accept a rule that implicit variable typing should only be used on object instantiation, strings, Int32 (int), and DateTime (the DateTime case matters for VB, which isn’t yet supported).

This combination of selecting types based on the implemented interfaces, and examining additional properties like types and names, is very powerful for finding particular locations in code. I want to find all the implicitly typed variables that are not object instantiations, assignments to literal strings, or assignments to integer literals.

Since this is a complicated question, I’ll ask in steps, although you can certainly refactor this into a single statement if you prefer. LINQ doesn’t evaluate until requested, so the piecewise creation is not a performance issue:

Find all implicitly typed local variables:


var implicitlyTyped = root
.Descendants.OfType<IDeclarationStatement>()
.Where(x => x.IsImplicitlyTyped);


Find all instantiations, because they’re OK:


var instantiations = implicitlyTyped
.Where(x => x.Initializer.ExpressionType == ExpressionType.ObjectCreation);


Find all string, integer (32 bit) and DateTime literals, because they’re OK:


var literals = implicitlyTyped
.Where(x => x.Initializer.ExpressionType == ExpressionType.Literal
&& (x.Type.Name == "String"
|| x.Type.Name == "Int32"
|| x.Type.Name == "DateTime")); // DateTime literals matter for VB


Find all the implicitly typed variables that aren’t instantiations or string/int/DateTime literals:


var candidates = implicitlyTyped
.Except(instantiations)
.Except(literals);


Reporting


The code discussed here is in the ReportCodeLines method.

Once you get the information you’re interested in, you’ll probably want to output it. In reporting, you’d obviously like file, line and column positions. RoslynDom is an abstract tree that does not directly understand text. But it maintains, and can report, information about the code it was created from by holding references to key aspects of the underlying .NET Compiler Platform (Roslyn) objects. As long as you haven’t changed the tree, these aspects remain correct.

If you change the tree, the only safe way to report positions of RoslynDom elements is to recreate the underlying syntax tree, and then reload that tree into RoslynDom – generally also searching again for the elements of interest.

Because we haven’t changed the tree since loading it, this isn’t a problem.
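For the mutated case, a minimal sketch of the recreate-and-reload cycle (using only factory methods already shown):

// After mutating the RoslynDom tree, rebuild the code and reload it so
// positions from the underlying syntax tree are correct again.
var newCode = RDomCSharp.Factory.BuildSyntax(root).ToFullString();
var freshRoot = RDomCSharp.Factory.GetRootFromString(newCode);
// ...then search freshRoot again for the elements of interest.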

Create new code from part of the RoslynDom item (by building its syntax):


private string GetNewCode(IDom item)
{
    return RDomCSharp.Factory.BuildSyntax(item).ToString();
}


Retrieve the original code that was used to create the RoslynDom element:


private string GetOldCode(IDom item)
{
    var node = item.RawItem as SyntaxNode;
    if (node == null)
    { return "<no syntax node>"; }
    else
    { return node.ToFullString(); }
}


Retrieve the original code position:


private LinePosition GetPosition(IDom item)
{
    var node = item.RawItem as SyntaxNode;
    if (node == null)
    { return default(LinePosition); }
    else
    {
        var location = node.GetLocation();
        var linePos = location.GetLineSpan().StartLinePosition;
        return linePos;
    }
}


Retrieve the original code filename:


private string GetFileName(IDom item)
{
    var root = item.Ancestors.OfType<IRoot>().FirstOrDefault();
    if (root != null)
    { return root.FilePath; }
    else
    {
        var top = item.Ancestors.Last();
        var node = top as SyntaxNode;
        if (node == null)
        { return "<no file name>"; }
        else
        { return node.SyntaxTree.FilePath; }
    }
}


You can use these helper methods in LINQ to create an IEnumerable of an anonymous type:


var lineItems = from x in items
                select new
                {
                    item = x,
                    fileName = GetFileName(x),
                    position = GetPosition(x),
                    code = GetNewCode(x)
                };


I’ll use a string formatting trick to make pretty columnar output. I’ll first determine the length of each segment of the string output – such as the maximum file path length. I’ll replace dummy values in a format string, such as fMax, to create a custom format string for the sizes in this data:


var filePathMax = lineItems.Max(x => x.fileName.Length);
var itemMax = lineItems.Max(
x => x.item.ToString().Trim().Length);
var lineMax = lineItems.Max(
x => x.position.Line.ToString().Trim().Length);
var format = "{0, -fMax}({1,lineMax},{2,3}) {3, -itemMax} {4}"
.Replace("fMax", filePathMax.ToString())
.Replace("itemMax", itemMax.ToString())
.Replace("lineMax", lineMax.ToString());


I can then iterate across the IEnumerable of anonymous type:


var sb = new StringBuilder(); // declared earlier in the full method
foreach (var line in lineItems)
{
    sb.AppendFormat(format, line.fileName,
        line.position.Line, line.position.Character,
        line.item.ToString().Trim(), line.code);
    sb.AppendLine();
}
return sb.ToString();



This results in nice output like the following (which would be nicer if it weren’t wrapping):

Walkthrough_1_code.cs(13, 16) RoslynDom.RDomDeclarationStatement : ret {String} var ret = lastName;

Walkthrough_1_code.cs(51, 16) RoslynDom.RDomDeclarationStatement : x3 {Int32} var x3 = x2;

Walkthrough 4: Fixing questionable implicit variable typing


What good would it be to find issues if you couldn’t fix them? But I’m tired, so I’m going to mostly let you figure out how this code works based on what you’ve already learned.


[TestMethod]
public void Walkthrough_4_Fix_implicit_variables_of_concern()
{
    // Assumes Walkthrough_3 passes
    var root = RDomCSharp.Factory.GetRootFromFile(fileName);
    var candidates = FindImplicitVariablesOfConcern(root);
    foreach (var candidate in candidates)
    {
        candidate.IsImplicitlyTyped = false; // All you need
    }
    var output = RDomCSharp.Factory.BuildSyntax(root.RootClasses.First());
    // For testing, force changes through secondary mechanism
    var initialCode = File.ReadAllText(fileName);
    var newCode = initialCode
        .Replace("var ret = lastName;", "System.String ret = lastName;")
        .Replace("var x3 = x2;", "System.Int32 x3 = x2;")
        .SubstringAfter("Walkthrough_1_code\r\n{\r\n")
        .SubstringBeforeLast("}");
    Assert.AreEqual(newCode, output.ToFullString());
}


The only thing that’s required is to state that these declarations should not be implicitly typed by setting IsImplicitlyTyped to false for each candidate. The rest of the code is there to create a test.

But this results in the rather ugly System.String declaration. That’s jarring in a file that uses the C# aliases. That fix is in the next test:


[TestMethod]
public void Walkthrough_4_Fix_non_aliased()
{
    // Assumes Walkthrough_3 passes
    var root = RDomCSharp.Factory.GetRootFromFile(fileName);
    var candidates = FindImplicitVariablesOfConcern(root);
    foreach (var candidate in candidates)
    {
        candidate.IsImplicitlyTyped = false; // All you need
        candidate.Type.DisplayAlias = true;  // All you need
    }
    var output = RDomCSharp.Factory.BuildSyntax(root.RootClasses.First());
    // For testing, force changes through secondary mechanism
    var initialCode = File.ReadAllText(fileName);
    var newCode = initialCode
        .Replace("var ret = lastName;", "string ret = lastName;")
        .Replace("var x3 = x2;", "int x3 = x2;")
        .SubstringAfter("Walkthrough_1_code\r\n{\r\n")
        .SubstringBeforeLast("}");
    Assert.AreEqual(newCode, output.ToFullString());
}


Here, in addition to setting IsImplicitlyTyped to false, I set DisplayAlias to true. Normally, this is set when the code is loaded based on whether the alias is used. Since you’re changing how the type is managed, you also have to request that you want to use the alias.

RoslynDom Update 1.0.10-alpha

 

After reviewing the changes in 1.0.10-alpha I think trying to expand on the features in the context of the update document is not realistic. The document has some minor updates in the documents folder, and I’m archiving the update document for Updates 1.0.1-alpha to 1.0.10-alpha and starting a new document for 1.0.11-alpha. I’ll highlight the changes here in the context of goals accomplished. This has been an enormous leap and I now know where the last five or six weeks of my life have gone.

Language independence

There are three vertical slices to the overall RoslynDom.

  • The interfaces, which I’ll discuss separately, with the unrealistically broad goal of being somewhat platform independent and supplying feature-based access to the RoslynDom silo. To the extent you can, ignore the interfaces until I discuss them further, as they might give you a headache.
  • RoslynDom itself, which is a language independent representation of your source code that is designed from the perspective of .NET and the .NET Compiler Platform, Roslyn. There is significant divergence from the .NET Compiler Platform where that aided these goals:
    • Mutability
    • Layout independent format
    • Language independent format
    • Simple access
    • SameIntent support
    • Support for comments and XML documentation as first class citizens
    • Support for compiler directives as first class citizens
  • RoslynDomCSharpFactories, C# factories to load RoslynDom and recreate a .NET Compiler Platform, Roslyn, SyntaxTree. Loading and unloading is via the SyntaxTree to allow parsing and consistent structures.

Each language element, such as a method, is represented by a composed interface (IMethod), a language independent RoslynDom class (RDomMethod) and a C# specific factory (RDomMethodTypeMemberFactory). Each factory has CreateFrom… and BuildSyntax… methods. CreateFrom… methods create RoslynDom entities and BuildSyntax… methods recreate syntax elements for the SyntaxTree.
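As a rough sketch of that shape (the exact signatures here are my assumptions; only the CreateFrom…/BuildSyntax… naming comes from the text above):

// Hypothetical outline of a factory; real RoslynDom signatures may differ.
public interface IFactorySketch
{
    IDom CreateFrom(SyntaxNode node);   // syntax -> RoslynDom entity
    SyntaxNode BuildSyntax(IDom item);  // RoslynDom entity -> syntax
}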

Dependency Injection

The groundwork for extensibility is in the dependency injection approach to retrieving factories. Since factories instantiate RoslynDom entities and recreate syntax, they are the key player in modifications and extensions. Since they interact to build RoslynDom and SyntaxTree entities, their retrieval is crucial to an extensibility story.

One known extensibility story is a Visual Basic factory. It seems likely that user stories for building RoslynDom trees from scratch, as an easier way to build SyntaxTrees from scratch, will want to tie into extensibility, so a raw helper factory might make sense. I am hoping that more ambitious factories, like VB6, can also be created.

There has not yet been any testing of extensibility and more work is required in refactoring the BuildSyntaxHelper methods.

Comments, Vertical Whitespace and XML Documentation Comments

The .NET Compiler Platform, Roslyn, places all whitespace and all comments into language trivia attached to the first token following where the trivia should appear.

Of necessity, I use comments for “public annotations” (introduced in earlier versions), which provide information for RoslynDom clients. This is entirely separate from the private annotations the .NET Compiler Platform, Roslyn, provides. Also, whatever your use case, XML documentation (also called structured documentation because the use of XML is compiler dependent) should be available on the language element (such as the class or method) it belongs to. Similar arguments raise directives to first class citizenship.

There are four levels where vertical whitespace and comments can logically occur: file, stem, type and code (method or property). Each of these has a MembersAll property that includes comments and vertical whitespace as well as the appropriate code elements. The Members property includes all code elements except comments and vertical whitespace.
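For example (a sketch assuming the properties described above):

var ns = root.Namespaces.First();
var codeOnly = ns.Members;       // code elements only
var everything = ns.MembersAll;  // also comments and vertical whitespace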

Structured documentation is extracted and placed on the corresponding element. At present you access the raw XML; breaking it into a true structure is a lower priority because I don’t have user stories for it.

Horizontal Whitespace

Earlier versions of RoslynDom were very heavy handed in formatting. This version manages horizontal whitespace. About 25-30% of the code in the factories is now dedicated to managing horizontal whitespace. Three weeks, three redesigns, and a few tears went into this, but the current approach appears solid.

Report Hierarchy and ToString()

A ReportHierarchy method provides better information about the RoslynDom tree, particularly in the immediate window. More work will go into this, so do not take a dependency on the current structure unless you’re prepared for it to break.

Added Statement support

There are six main levels of elements in your code base: file, stem, type, code container, statement and expression. Previous versions of RoslynDom supported only files, stems, types and code containers. This version supports a variety of statements.

Statements are logically nested in code blocks, particularly the code blocks of conditional (if) and looping statements.

Expressions are minimally supported – RoslynDom uses conditions and assignment expressions without breaking them down or understanding them. I’m not sure whether it ever will. I have compelling user stories for understanding statements and statement parts (see the walkthroughs for one example). I do not yet have compelling user stories for breaking down expressions. If the only user story is intelligent VB.NET/C# conversions, support may be minimal.

Added Ancestors and Descendants

You can now query the ancestors and descendants of RoslynDom trees. See the walkthroughs for an example of why you might find this interesting.

Interfaces made non-immutable (mutable)

The RoslynDom tree is mutable. This is because of my intended usage, and because I think one of the things an alternative to the .NET Compiler Platform, Roslyn, can offer is mutability. I absolutely agree that the .NET Compiler Platform, Roslyn, should be immutable – it’s a compiler structure. However, immutability pretty much forces a rewriter for any non-trivial changes to the SyntaxTree. I believe there will be scenarios where it’s much easier to load into RoslynDom, do interrogation and mutation in that structure, then output to a new .NET Compiler Platform, Roslyn, SyntaxTree.

In my initial vision, the interfaces were immutable (IMethod) and the RoslynDom implementation was mutable (RDomMethod). This proved impractical because of excess casting for mutations. My new vision is that if there’s a need for an immutable set of interfaces, the current set will inherit from the immutable set.

One implication (allowing for errors on my part): if something is not mutable through the interfaces, such as the RDomLists, it isn’t supposed to be changed.

There’s still a lot of work to do

GitHub lists known issues. The next version or two will be clean up and documentation improvements.

Following that I’ll plug the holes of the most important language features I’m not yet supporting. These include regions, lambdas and async because they are hard, and side cases like destructors and explicit operators.

I want to solidify a single file before I work across multiple files. Multiple file usage will make the underlying model much more useful and allow more interesting interrogation of non-mutated RoslynDom structures.

The biggest help I need right now is user stories, even vague ones, and failing unit tests – particularly if something crashes RoslynDom. Of course, if you’d like to help further, please be in touch. If you want to fork the code, it would be lovely to see what you’re doing.

This is still a very early release. Everything is up for change.

I will try to keep the NuGet release from getting as out of date as it has been for the last month.

Refactoring Unit Tests

Llewellyn Falco and I paired on an introduction to his ApprovalTests tool. I really like that tool for evaluating objects during testing in an easy, flexible and evolutionary way. It’s a great tool, but that’s not what this post is about.

Rob Hughes (@rhughesjr) heard via Twitter that Llewellyn and I also refactored a bunch of RoslynDom tests to remove redundant code, and asked that I do a blog post about this aspect of our pairing. That’s what this post is about.

I wrote this about some later refactoring that Llewellyn inspired – so he should get all of the credit and none of the blame. I don’t think there is anything groundbreaking here. Just a detailed walkthrough of refactoring a set of unit tests, along with the logic behind the changes.

Removing redundant code from unit tests

When I write normal, non-unit test code, I think about refactoring common code from the very beginning. A lack of redundancy and flexibility/extensibility are primary forces I think about in software.

But not in unit tests. I believe that unit test creation should be a bit more like an amoeba eating up everything it can touch. A rigid shape caused by code reuse can hide a reduction in logical coverage and in LOC coverage. So, when Llewellyn and I began there were almost no helper methods in the RoslynDom tests.

RoslynDom is very simple in goal – load C# code into a language agnostic structure, allow interrogation and changes to the structure (yes, it’s mutable), and output C# code which looks like the original code. Oh, and you can ask whether two code structures have the same intent.

Because it does a few things across a mid-size number of different elements, there are a lot of very similar tests.

I believe it is best to discover where your tests are redundant by refactoring them at a later date, after you have a big pile of tests.

I do not recommend strategizing ahead of time about how to maximize code reuse in unit tests (been there, done that). It’s the only place in your code where I think the copy-paste-fix-differences cycle is OK. I’m referring only to strategizing and designing around code reuse too early.

Strategizing isolation of unit tests very early in the process is extremely helpful. I would say it is necessary, but if you have no tests, I don’t really care if you isolate your first ten tests.

So, why ever remove the redundant code?

Too often unit tests are a static pile that rots during our application development process. If we’re deeply committed, we fix all the broken tests. If we aren’t, tests are disabled or removed as the schedule requires. Regardless, unit tests generally become rotten and stinky.

If tests aren’t isolated, rotting tests may become impossible to run. In the days before mocking, I saw a team toss >1,000 tests because of a database change. But RoslynDom tests are isolated because of the nature of the problem.

Beyond maintaining the ability to run your tests, the universal problem is that rotting tests become impossible to read and understand. Your unit tests are the best view into your system. It’s why we have test names that explain why we wrote the test (RoslynDom test naming is mediocre and adequate, not brilliant).

As you project the technical changes to tests in the rest of this post onto your own projects, think about how to increase clarity in what each test is accomplishing.

OK, already! What are the changes?

Here’s a simple test before refactoring:

[TestMethod, TestCategory(SimpleAttributeCategory)]
public void Can_get_attributes_on_class()
{
    var csharpCode = @"
[Serializable]
public class MyClass
{ }
";
    var root = RDomFactory.GetRootFromString(csharpCode);
    var class1 = root.Classes.First();
    var attributes = class1.Attributes;
    Assert.AreEqual(1, attributes.Count());
    Assert.AreEqual("Serializable", attributes.First().Name);
}


RoslynDom has almost 600 unit tests. I like test categories and use constants to avoid mistyping them.

There are eight tests nearly identical to this that change what the attribute is placed on (class, structure, method, parameter, etc), and thus different code strings. There are also variations with different numbers of attributes. So, clearly there is a lot of redundant code (about 32 tests).

The two lines that retrieve the attributes are problematic. They will be different for every test. That’s a job for our mild-mannered super-hero: the delegate!

Refactoring the test part of the code with a delegate results in this method:


private static void VerifyAttributes(string csharpCode,
    Func<IRoot, IEnumerable<IAttribute>> makeAttributes,
    int count, params string[] names)
{
    var root = RDomCSharp.Factory.GetRootFromString(csharpCode);
    var attributes = makeAttributes(root).ToArray();
    Assert.AreEqual(count, attributes.Count());
    for (int i = 0; i < attributes.Count(); i++)
    {
        Assert.AreEqual(names[i], attributes[i].Name);
    }
}


The things that change between tests are the input code (csharpCode), how the attributes are retrieved (makeAttributes), the count of attributes expected (count) and the expected parameter names (names).

The test calls this method with:


[TestMethod, TestCategory(SimpleAttributeCategory)]
public void Can_get_attributes_on_class()
{
    var csharpCode = @"
[Serializable]
public class MyClass
{ }
";
    VerifyAttributes(csharpCode, root => root.Classes.First().Attributes,
        1, "Serializable");
}


 

The value of this call isn’t removing five lines of code – it’s making it more clear what those five lines of code did.

This change simplified 32 tests and made them more readable.

All tests aren’t that simple


The next set of tests looked at attribute values. The initial test was:


[TestMethod, TestCategory(AttributeValuesCategory)]
public void Can_get_attribute_values_on_class()
{
    var csharpCode = @"
[LocalizationResources(""Fred"", ""Joe"", Cats=42)]
[Name(""KadGen-Test-Temp"")]
[SemanticLog]
public class MyClass
{ }
";
    var root = RDomFactory.GetRootFromString(csharpCode);
    var attributes = root.Classes.First().Attributes;
    Assert.AreEqual(3, attributes.Count());
    var first = attributes.First();
    Assert.AreEqual("LocalizationResources", first.Name);
    Assert.AreEqual(3, first.AttributeValues.Count());
    var current = first.AttributeValues.First();
    Assert.AreEqual("LocalizationResources", current.Name);
    Assert.AreEqual("Fred", current.Value);
    Assert.AreEqual(LiteralKind.String, current.ValueType);
    current = first.AttributeValues.Skip(1).First();
    Assert.AreEqual("LocalizationResources", current.Name);
    Assert.AreEqual("Joe", current.Value);
    Assert.AreEqual(LiteralKind.String, current.ValueType);
    current = first.AttributeValues.Last();
    Assert.AreEqual("Cats", current.Name);
    Assert.AreEqual(42, current.Value);
    Assert.AreEqual(LiteralKind.Numeric, current.ValueType);
    Assert.AreEqual("Name", attributes.Skip(1).First().Name);
    Assert.AreEqual("SemanticLog", attributes.Last().Name);
}


 

I doubt you can glance at that and understand what it does.

One approach would be to pass a complex data structure to the previous Verify method. I could probably have created something slightly readable with JSON, or XML literals if I was in Visual Basic. But unit tests demand a KISS (Keep it Simple Silly) approach.

If the VerifyAttributes method returns the IEnumerable of IAttribute it’s already creating, the first five lines (and a couple of others) can be replaced with:


var attributes = VerifyAttributes(csharpCode,
    root => root.Classes.First().Attributes,
    3, "LocalizationResources", "Name", "SemanticLog")
    .ToArray();



Making it an array simplifies accessing individual elements.
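For reference, here’s a sketch of VerifyAttributes with that return added (otherwise it’s the method shown earlier):

private static IEnumerable<IAttribute> VerifyAttributes(string csharpCode,
    Func<IRoot, IEnumerable<IAttribute>> makeAttributes,
    int count, params string[] names)
{
    var root = RDomCSharp.Factory.GetRootFromString(csharpCode);
    var attributes = makeAttributes(root).ToArray();
    Assert.AreEqual(count, attributes.Count());
    for (int i = 0; i < attributes.Count(); i++)
    {
        Assert.AreEqual(names[i], attributes[i].Name);
    }
    return attributes; // new: return the attributes for further testing
}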

For the rest of the test, it makes sense to apply the same refactoring approach that worked on attributes. But here, there’s a name, a value, and a literal kind. Again, one approach is a complex structure, but a simpler approach is to test the count and return the IEnumerable of IAttributeValue for more testing:


private IEnumerable<IAttributeValue> VerifyAttributeValues(
    IAttribute attribute, int count)
{
    var attributeValues = attribute.AttributeValues;
    Assert.AreEqual(count, attributeValues.Count());
    return attributeValues;
}



 

An additional method simplifies the testing of individual attribute values:


private void VerifyAttributeValue(IAttributeValue attributeValue,
    string name, object value, LiteralKind kind)
{
    Assert.AreEqual(name, attributeValue.Name);
    Assert.AreEqual(value, attributeValue.Value);
    Assert.AreEqual(kind, attributeValue.ValueType);
}



 

Calling these methods is a great opportunity for named parameters. Take a minute to compare the readability of this code to the same test at the start of this section (and yep, I wish I’d also used named parameters for the VerifyAttributes calls):


[TestMethod, TestCategory(AttributeValuesCategory)]
public void Can_get_simple_attribute_values_on_property()
{
    var csharpCode = @"
public class MyClass
{
[Version(2)]
[Something(3, true)]
public string foo {get; set; }
}
";
    var attributes = VerifyAttributes(csharpCode,
        root => root.Classes.First().Properties.First().Attributes,
        2, "Version", "Something")
        .ToArray();
    var attributeValues = VerifyAttributeValues(attributes[0], count: 1)
        .ToArray();
    VerifyAttributeValue(attributeValues[0], name: "", value: 2, kind: LiteralKind.Numeric);
    attributeValues = VerifyAttributeValues(attributes[1], count: 2)
        .ToArray();
    VerifyAttributeValue(attributeValues[0], name: "", value: 3, kind: LiteralKind.Numeric);
    VerifyAttributeValue(attributeValues[1], name: "", value: true, kind: LiteralKind.Boolean);
}


Does that really fit every circumstance of the area you’re testing?


Rarely will there be such a large number of tests doing such trivial comparisons. In this same test file/test topic, there are also tests of passing types, instead of literals, to attributes. This appears in only three places in the file:


[TestMethod, TestCategory(AttributeValuesCategory)]
public void Can_get_attribute_value_of_typeof_identifier_only_on_class()
{
    var csharpCode = @"
[Test(TypeTest = typeof(Foo))]
public class MyClass
{ }
";
    var attributes = VerifyAttributes(csharpCode,
        root => root.Classes.First().Attributes,
        1, "Test")
        .ToArray();
    var current = VerifyAttributeValues(attributes[0], count: 1)
        .First();
    Assert.AreEqual(LiteralKind.Type, current.ValueType);
    var refType = current.Value as RDomReferencedType;
    Assert.IsNotNull(refType);
    Assert.AreEqual("Foo", refType.Name);
}


 

Honestly, I might not bother refactoring this if it was in a test file that was full of variations and refactoring opportunities with more payback. But in this nice clean test file, it’s jarring.

Using a different name, rather than an overload, clarifies that something different is being checked:


private static void VerifyTypeOfAttributeValue(IAttributeValue current, string name)
{
    Assert.AreEqual(LiteralKind.Type, current.ValueType);
    var refType = current.Value as RDomReferencedType;
    Assert.IsNotNull(refType);
    Assert.AreEqual(name, refType.Name);
}



 

Making the call:


[TestMethod, TestCategory(AttributeValuesCategory)]
public void Can_get_attribute_value_of_typeof_referenced_on_class()
{
    var csharpCode = @"
[Test(TypeTest = typeof(DateTime))]
public class MyClass
{ }
";
    var attributes = VerifyAttributes(csharpCode,
        root => root.Classes.First().Attributes,
        1, "Test")
        .ToArray();
    var current = VerifyAttributeValues(attributes[0], count: 1)
        .First();
    VerifyTypeOfAttributeValue(current, name: "DateTime");
}



 


Yes, you could make it smaller


All these refactorings removed about 130 lines of code, taking this test class from about 860 to 730 vertical lines. Because the same set of tests is repeated multiple times, and the C# code I’m testing is so similar for different contexts, I could have reduced the code much further, maybe even to half the size.

But reducing the code size in unit tests beyond the point of maximum clarity is not helpful. The main driving forces for tests are that they be stand-alone and readable. Each test in the resulting file is stand-alone and more readable than without the refactoring. Each should be less than a screen in size, but once you reach this point, clarity trumps size.

Write clear verify tests and allow the reader to correctly assume that each verify method tests the parameters passed, and nothing else.

And then there’s change…


RoslynDom does a handful of things. One of the tricky things that is not tested by these unit tests is round-tripping of attributes, which has some tests in another part of the test suite.

While I’m curious how well the code in this test file runs, I know there are presently some low-priority issues roundtripping attributes. Before creating the common code, it would have been a lot of bother to experiment with round-tripping to see how serious these issues might be. After the changes, I just need to add a couple lines of code to the VerifyAttributes method.

When I actually did this, I got a lot of the expected messages. I know I read any kind of attribute layout, but am opinionated (for now) on outputting as separate attributes:

Result Message:
Assert.AreEqual failed. 
Expected:<
[Serializable, TestClass]
public interface MyInterface
{ }>. 
Actual:<
[Serializable]
[TestClass]
public interface MyInterface
{ }>.



What was unexpected was that three tests – those typeof tests – crashed on outputting the code. I get excited anytime I find and fix a problem, because it’s quite challenging to test the myriad of code possibilities that RoslynDom is intended to support.

I liked this test enhancement so much I left it in. I have a rule that all tests that crash RoslynDom should result in new tests – so I had to work it into the test suite one way or another. I added a Boolean value to the VerifyAttributes method to skip the BuildSyntax test where I know it will fail just because of attribute layout.

Here I used a refactoring trick.

I added the new Boolean value as the first parameter – even though that’s a sucky place for it.

I did a replace of “VerifyAttributes(csharpCode,” with “VerifyAttributes(false, csharpCode,” with a Replace In Files for just the current document so I could check the changes. That was good because I initially had a space at the end, which missed occurrences where I wrapped the delegate to the next line.

Once everything built with the new parameter, I refactored with a signature change to put the Boolean where I wanted it, and then changed the Boolean value to true on the tests where I wanted to skip the assertion that the input and output match. I always call BuildSyntax to ensure it doesn’t crash, but I don’t expect to roundtrip the code perfectly when attributes are combined (at present).

This will also make it dirt simple to find these tests if/when I decide to support round-tripping multiple layouts of attributes. I’ll just ignore and then remove the parameter.
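A sketch of the final shape (the parameter name and its position are my guesses; the post doesn’t show the final signature):

private static IEnumerable<IAttribute> VerifyAttributes(string csharpCode,
    Func<IRoot, IEnumerable<IAttribute>> makeAttributes,
    int count, bool skipOutputCheck, params string[] names)
{
    var root = RDomCSharp.Factory.GetRootFromString(csharpCode);
    // Always build, so a crash in BuildSyntax still fails the test...
    var output = RDomCSharp.Factory.BuildSyntax(root).ToFullString();
    // ...but only compare input and output when round-tripping is expected.
    if (!skipOutputCheck)
    { Assert.AreEqual(csharpCode, output); }
    var attributes = makeAttributes(root).ToArray();
    Assert.AreEqual(count, attributes.Count());
    for (int i = 0; i < count; i++)
    { Assert.AreEqual(names[i], attributes[i].Name); }
    return attributes;
}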

Take a look at your tests and see whether you can make some of them easier to understand with some tactically applied helper methods.

What I learned about coding against the .NET Compiler Framework this week…July 24, 2014

I don’t know if I’ll do this every week, but this week I hit two spots of the .NET Compiler Platform API quicksand. I did not get out of either alone, so wanted to share what I learned.

ToFullString()

I struggled fantastically with creating code for XML documentation. Run the Roslyn quoter against a simple comment and you’ll get the gist of it.

For my work with RoslynDom I need to go both ways after modifying the documentation:

- Code -> Compiler API (syntax tree) -> RoslynDom items

- RoslynDom items -> Compiler API (syntax tree) -> code

The first works great. Grab the symbol and you can grab the documentation:

Symbol.GetDocumentationCommentXml()

This gives you the XML as a string. Just load it as an XDocument and run as much LINQ to XML as you like. All is good.
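For example (a minimal sketch; symbol stands in for whatever ISymbol you retrieved):

using System.Linq;
using System.Xml.Linq;

// Assumes the symbol actually has documentation; Parse throws on an empty string.
var xml = XDocument.Parse(symbol.GetDocumentationCommentXml());
var summary = xml.Descendants("summary")
    .Select(x => x.Value.Trim())
    .FirstOrDefault();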

But then… I needed to recreate the syntax tree. I really, really felt I should be able to build it up from primitives. After a few hours banging my head against that wall, I had to accept the core rule of …

The .NET Compiler Platform is a compiler, what it does really, really well is parse code.

So, even though it made me feel dirty, I wrote out the XML to a string, split the string into lines, iterated over the lines inserting the three slashes, and asked the SyntaxFactory to parse it. If you’re struggling to build something, see if you can parse into what you need.
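Roughly like this (the variable names are mine):

// Write the XML out, prefix each line with ///, and let the compiler parse it.
var docLines = xml.ToString()
    .Split(new[] { "\r\n" }, StringSplitOptions.None)
    .Select(line => "/// " + line);
var docText = string.Join("\r\n", docLines) + "\r\n";
var trivia = SyntaxFactory.ParseLeadingTrivia(docText);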

In this particular case, it failed. I mean I had the output and it looked good, but the first three slashes were missing and the end of line at the end was missing. Specifically, I mean when I wrote it out in the immediate window these were missing. Crap.

Happily I have friends. Anthony D Green (ADG) on the team pointed out that I wasn’t using ToFullString(). At various points in working with the API, ToString() may do surprising things – working too hard or just getting nuts on your behalf. Perhaps someone somewhere needs the stripped version.

If you’re looking at a string output from the API, check it also with ToFullString().

The Formatter is picky, and EndOfLineTrivia requires \r\n

The .NET Compiler Platform is designed, and massively tested, with code that can happen in the real world from its own parsing. When you build trees, there is a large number of ways you can mess up that could never happen through parsing. I’d say infinite, but my son is an astrophysicist and doesn’t let me say things are infinite.

In my case, I naively thought that EndOfLineTrivia would understand that it was supposed to, well, you know, output an end of line. I did not anticipate that I would also need to pass a \r\n. I also did not anticipate that it would silently create an object that would later cause an exception – deep in the heart of the Formatter API. This time Balaji Soundrarajan did a little telepathic debugging and guessed that I’d failed to include \r\n. Thanks to him and all the folks that took a look at that one!
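In code, the difference looks like this (in my case the empty version blew up later in the Formatter):

// Formats correctly later:
var eol = SyntaxFactory.EndOfLine("\r\n");
// Compiles and creates trivia, but caused an exception deep in the Formatter:
var badEol = SyntaxFactory.EndOfLine("");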

Updates to RoslynDom 1.0.9 Alpha

New Parent property on all items

IDom now contains a Parent property of IDom type.

In any tree, things can become, well, interesting if nodes appear in more than one location. This is particularly damaging in a tree that takes characteristics from context – which happens with naming (namespaces and nested classes) in the .NET class model. Thus, by intent, no item may appear in more than one location in the tree.

When a member is cloned, its parent is not copied with it. Also, Parent properties are not used in determining same intent.

Real-time Namespace property

Previously, Namespace was stored from the symbol when the instance was created. Because Namespace is contextual, this was incorrect. Namespace is now calculated from the parent hierarchy when it is requested, for all classes except RDomReferencedType. This resulted in some changes in Namespace results, including the result for

namespace testing.Foo

which previously returned Foo and now returns testing.Foo.

The Namespace in RDomReferencedType is the namespace of the type being referenced, so it is still retrieved from the symbol on load.

AddOrMoveMember and RemoveMember methods

Methods to add members to containers have been added to new IRDomStemContainer, IRDomTypeContainer and IRDomCodeContainer interfaces.

As discussed under the heading “New Parent property on all items,” IDom items may not appear in more than one location in the tree. The AddOrMove semantics reflect this. I actually think moving will be a rare task, but if you accidentally add an item to a new location in the tree, RoslynDom will remove it from the prior location, and I wanted the naming to make this clear.

I may add an “AddCloneOfMember” to simplify the process of cloning a member and adding it to a new location after changes. This is the anticipated use case.

ICodeContainer and ICodeMember interfaces

There are new ICodeContainer and ICodeMember interfaces. Support for intra-member features (code) remains almost non-existent in this version.

RawItem and OriginalRawItem semantic changes

RawItem and the new OriginalRawItem on the IDom interface represent the underlying data in an agnostic way. IDom is agnostic on mutability so there may be future implementations where RawItem and OriginalRawItem are always the same. I want the semantics to be clear that RawItem is the best current capturing of the tree, and OriginalRawItem is the original unchanged item. This intentionally implies that the original must be maintained.

TypedSyntax and OriginalTypedSyntax are the RDom implementations of these generalized ideas.

AddMember method added to RDomStemContainer and RDomBaseType

To support mutability, AddMember methods were added to these two base classes. This makes the ability to add types and type members available to appropriate types, namespaces, and the root.

Changed return of PublicAnnotationList.GetValue(string key)

Previously this returned the default value, which blocked access to other values. It now returns the PublicAnnotation. The default value remains accessible by GetValue(name, name).

Changed PublicAnnotation to a Class

PublicAnnotation was a struct. This was the only struct in the system and I felt the value/reference semantic difference would be detrimental to maintenance. As part of this, I removed the equality testing and added a SameIntent method.

Added IHasSameIntentMethod interface

Another characteristic interface was added for the SameIntent methods. This is for consistency with other characteristic interface usage.

Moved SameIntent to a subsystem in RoslynDom.Common

This code may eventually run with a DI, but for now, if the interface data matches, they match.

Changed SameIntent method type parameter

Previously the SameIntent method appeared on the strongly typed IDom<T> interface and could only be called on items of the same type. This was overly restrictive, so the method was changed to take a locally typed parameter, constrained only to be a class. Comparing different IDom types in the current implementations will always return false, although a derived class could be created that had different behavior, but the same intent, as one of the existing implementation classes, and could therefore return true. This was also done to support scenarios where the type is not known, such as public annotations that might be IDom types.

Changed inheritance semantics of SameIntent() method

The previous inheritance semantics of the SameIntent method were to directly override the public SameIntent method. This method is no longer virtual. Instead override the CheckSameIntent protected method. Be sure to call the base CheckSameIntent method for correct behavior.
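A sketch of the new pattern (the exact signature is my assumption):

// Derived classes override the protected hook, not the public method.
protected override bool CheckSameIntent(IDom other)
{
    if (!base.CheckSameIntent(other)) { return false; } // base call required
    // ...compare this class's own characteristics here...
    return true;
}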

SameIntent and names

Type members (fields, properties, methods and enums) do not include the outer name when considering same intent.

Stem members (types, namespaces) do not include namespace/qualified name in same intent.

Added IHasLookupValues interface

Added this interface to reduce dependencies in an upcoming project.

Virtual Matches method added to IDom

Immediately this allows CheckSameIntentChild to better find the other child to compare to. It also provides a generalized way to find items in a list.

Changed name of RDomTypeParameter.HasReferenceTypeConstraint

It was previously HasReferenceConstraint; changed for consistency. ITypeParameter changed as well.

Changed name of MemberKind, StemMemberKind and LiteralKind

The suffix “type” is confusing, so I switched these enums and property names to “kind.”

BuildSyntax

Implementation of syntax recreation from changed nodes has begun but is not complete.

Internal cleanup

- Separated RDomBase classes into separate files

- Created SameIntentHelper

- Changes to how IHasNamespace properties are stored and used

- IHasNamespace moved from IStemMember to IType and INamespace

- IUsing now includes IStemMember

- StemMembers property of Namespaces and Root now include usings

- Fixed some bugs in RDomField attributes

Performance in Tracing

 

I talk about tracing in other posts, including here and here. I talk about semantic tracing here. I also have a Pluralsight video on tracing with ETW and EventSource here.

Bill Chiles has a great article here. His article is based on the Roslyn team’s experience. The Roslyn compilers needed massive tuning so that the managed compilers would have similar performance to the older unmanaged compilers. His article is great on overall application performance, and I use it here in relation to tracing.

Bill lists four issues, and then a few miscellaneous items:

  • Don’t prematurely optimize – be productive and tune when you spot problems.
  • Profiles don’t lie – you’re guessing if you’re not measuring.
  • Good tools make all the difference – download PerfView and look at the tutorials.
  • Allocations are king for app responsiveness – this is where the new compilers’ perf team spent most of their time.

When tracing adversely impacts performance, it’s often due to I/O, or to being I/O bound. I/O issues occur when you are writing to trace storage on the main thread – on Windows, you can avoid this by using ETW. In .NET, you use ETW through EventSource. The rest of this article covers adverse performance impacts of tracing unrelated to I/O issues.

Premature optimization and profiling

Outside tracing, I almost always agree with the recommendation that premature optimization wastes programmer time and results in unnecessarily complex applications.

But tracing is different. Optimizing tracing is not premature and considering performance throughout your discussions of trace strategies makes sense. If your application is well traced, you have a lot of tracing calls. You never want a programmer to consider whether adding a trace call will hurt performance.

One of the goals of tracing is to discover problems as quickly as possible – so you don’t want to implement a strategy you’re afraid to leave on in production.

Tracing is part of your profiling strategy and tracing that perceptibly slows your application in production may lead you astray in evaluating profile results.

Profiles don’t lie. Knowing the performance metrics of your application with tracing turned off and with various sets of traces turned on via level (error, warning, information), keywords, etc. is important. Unless I/O is slowing down your traces, you’ll find that tracing is a very, very small percentage of your application’s effort. If it’s a significant portion, your tracing strategy is flawed and you need to fix it.

Because the impact is so small, it’s unlikely that you can use profiling to improve performance of your tracing.

Assuming a fast trace infrastructure (ETW with EventSource or out-of-proc SLAB in .NET) the goal of improving trace performance is not to improve overall application performance. The goal is to provide confidence that you can turn tracing on whenever you want, or ideally leave a set of traces on at all times in a “flight recorder” mode. “Flight recorder” mode means traces are recorded in a rolling fashion that’s made permanent when an interesting event happens.

You always want tracing to be as fast as possible, as long as that doesn’t cause undue complexity or extra effort.

Happily, with ETW and SLAB using ETW you can have high performance tracing. And happier still, performance considerations will improve your tracing design by pushing you more firmly into the semantic tracing camp.

Performance considerations for tracing

Performance considerations make for better tracing. Here are a few examples:

  • Isolate all trace calls behind semi-permanent signatures with no artifacts of the underlying technology
    • Allows evolution in response to technology improvements (such as implementing an out-of-proc strategy)
    • Also helps discoverability
  • Make these methods very granular and very specific to (and descriptive of) the action you are tracing
    • Allows more granular enabling of trace events
    • Less code to run within individual trace methods
    • Also documents application actions
    • Also provides consistency in details like level, keywords and channel
    • Also simplifies usage (IntelliSense)
  • Use strongly typed parameters for these methods
    • Avoids boxing (heap allocations and resulting GC load)
    • Also simplifies usage (IntelliSense) and documents actions
  • Avoid large structs as parameters
    • Avoids copying on the stack
    • Also simplifies usage (IntelliSense) and documents actions
  • Avoid concatenating strings and string.Format() as well as Concat(), Split(), Join() and Substring()
    • Avoids allocations
    • Also results in a trace that can be interpreted without parsing
  • Avoid retrieving data or any expensive operations
    • Obviously, avoids spending that time
    • Also ensures trace is just recording current conditions
  • Get to know the information provided by the trace system for free
    • Avoids tracing things that are already available
    • Also allows for fewer trace calls and a simpler system
  • Consider having tracing on during testing and not using DI (Dependency Injection) for tracing
    • Avoids running DI code, and possibly allocations
    • Also simplifies your application
    • Also allows programmers to use tracing during initial debugging
    • Also gets programmers in the habit of using traces

Semantic tracing is the style of tracing that most easily fulfills these goals. Semantic tracing is a style; it can be used on any platform, in any language.

Semantic tracing fulfills many of these performance-focused goals

Semantic tracing is by its nature strongly typed. This has enormous benefits in clarity of purpose and IntelliSense support. It can also have a significant impact on performance because value types are not boxed. And, by isolating your tracing code, additional optimization can be done later, and only if needed. For example, you may need to do extra work to avoid the object overload of EventSource.WriteEvent(). But this code is a pain and adding it in all cases would be a premature optimization.

If you are using .NET and you are not yet using EventSource and ETW, no other performance improvement will be as great as moving to ETW. You can use ETW when you use EventSource directly without alternate listeners, and when you use out-of-proc SLAB (Semantic Logging Application Block from Microsoft Patterns and Practices which is an enhancement of EventSource). Isolating your calls allows you to make the change to ETW at your convenience.

I talked about semantic tracing here. The rest of this article explores details in Bill Chiles’s article from a semantic perspective, and assumes you are tracing outside the current process (EventSource or out-of-proc SLAB in .NET).

Avoiding boxing allocations with semantic tracing

Heap allocations themselves have a tiny impact. The bigger problem is that each allocation must eventually be garbage collected. Garbage collection is also relatively fast, but it occurs in spurts. In most normal application code, allocations occur at a rate that can be smoothly garbage collected and you don’t need to worry about this issue, unless profiling shows that you have excessive allocations or garbage collection.

Since a well-traced application has lots and lots of trace calls, any allocations you have will occur a large, possibly massive, number of times. Avoiding unnecessary allocations in tracing is a very good thing.

Bill shows a great example of the allocation problems tracing can cause with unnecessary boxing:

public class BoxingExample
{
    public void Log(int id, int size)
    {
        // Five allocations: id and size are boxed, Format() creates a
        // string for each value, and the result string s is allocated.
        var s = string.Format("{0}:{1}", id, size);
        Logger.WriteLine(s);
    }
}



I see five heap allocations in this code. The string s is allocated. The two integers are boxed to object to pass to the Format() method. A string is then created for each boxed integer to perform the concatenation.



These allocations occur whether or not tracing is turned on.



Replace this code with the following:



public sealed class SemanticLoggingExample : EventSource
{
    [Event(1)]
    public void ProcessStarted(int procId, int sampleSize)
    {
        // The first argument is the event ID; the (int, int, int)
        // overload is strongly typed, so nothing is boxed.
        WriteEvent(1, procId, sampleSize);
    }
}



This code has zero allocations.



It is also more readable, discoverable, and IntelliSense friendly.



Boxing and logging with strings



Semantic tracing discourages the use of strings in tracing. If your trace technology requires them, you can create the strings within your semantic trace method. The advantage is that you avoid the resulting allocations when tracing is turned off, and you can replace the technology with one that does not require string creation when one becomes available.



EventSource in .NET does not require any string creation. You might want a message reported to consumers of your trace, but you can do this with the Message property of the Event attribute. This is similar to a format string and is included in the manifest for the events. The ETW consumer application builds the string for the human user; the common consumers already use this string to display the message. Your application never builds this string.
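
For example, here is a sketch that adds a message to the ProcessStarted event shown earlier (the message text is mine, not from the original):

[Event(1, Message = "Process {0} started with sample size {1}")]
public void ProcessStarted(int procId, int sampleSize)
{
    // No string is built here. The format string lives in the manifest,
    // and the ETW consumer combines it with the payload for display.
    WriteEvent(1, procId, sampleSize);
}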



This saves I/O load, CPU cycles, boxing, and GC load.



As a side bonus, the Message property of the Event attribute can be localized. See my course or the ETW specification.



Bill’s advice on using ToString() prior to calling String.Format() is good when you are using the String.Format() method in .NET. But if you are building strings as arguments to calls into your trace system, you are almost certainly doing something wrong. Instead, keep value types as value types through as much of the tracing pipeline as possible, at least to the point where you can check whether tracing is turned on. Then, use the ToString() trick. It’s a cheap improvement, since extra ceremony at that point is not distracting – the point of the method is to create the trace.
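
Here is a sketch of that pattern (the BatchCompleted event and its ID are hypothetical): the DateTime stays strongly typed until the IsEnabled() check, and ToString() runs only when tracing is on.

[Event(3)]
public void BatchCompleted(string finishedAt)
{
    WriteEvent(3, finishedAt);
}

[NonEvent]
public void BatchCompleted(DateTime finishedAt)
{
    // Keep the value type until we know tracing is on; only then
    // pay for the ToString() conversion.
    if (IsEnabled())
    {
        BatchCompleted(finishedAt.ToString("o"));
    }
}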



Avoiding other allocations



Bill’s article is a great source of other ways to avoid allocations and other performance tips. These issues are rare in tracing, but worth considering when they occur:



  • GetHashCode: cast enums to the underlying type
    • You might use a hash code as part of a strategy to hide sensitive information; hashing an enum might rarely be part of this, and it’s easy enough to do the cast in that scenario
  • HasFlag: boxes – use bitwise comparison in frequent calls (see the sketch after this list)
    • HasFlag with tracing is almost certainly in a section of a semantic method that doesn’t run unless tracing is turned on, and at that location the extra clarity of HasFlag is probably not that important
  • String operations – watch Format(), Concat(), Split(), Join() and Substring() in frequent calls
    • Avoid strings in tracing as much as possible, and at least ensure it’s in a section of a semantic method that doesn’t run unless tracing is turned on
  • Unnecessary literal allocations – see Bill’s WriteFormattedDocComment sample
    • If you have literals – like a table name – create it once
  • StringBuilder – still requires an allocation
    • Try to avoid creating any string that’s complex enough to make string builder appropriate
    • If you think you need a string builder in tracing, at least ensure it’s in a section of a semantic method that doesn’t run unless tracing is turned on.
  • Clever caching – Bill’s example is caching a StringBuilder
    • Avoid creating new object instances in your tracing
  • Closures in lambdas – a class is generated on compile and an instance allocated
    • Avoid lambda closures in tracing
  • LINQ – in addition to common lambda closures, there are extra allocations for delegates
    • Avoid LINQ in tracing
  • Inefficient async caching – cache the task, not just the result
    • Use tracing that is fast and let the technology (ETW) get off the thread – avoid async in tracing. You can’t build an async strategy that is as fast as the one Microsoft built with ETW (assuming .NET)
  • Dictionaries – often mis-used when simpler structures would work
    • You might use a lookup to sensitive information in a few cases (keep sensitive information out of traces, regardless of technology) and use a dictionary only if there will be many lookups and the list isn’t small
  • Class vs struct – classic space/time tradeoff
    • Consider this in tracing calls; generally pass specific data items rather than large structs, although this can undermine the ideal of semantic tracing
  • Caching without a disposal plan (like a capacity limit) – avoid this, because it’s also called a memory leak
    • This can happen if you create a lookup for sensitive information; if you do, create a cache disposal plan
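
To make the HasFlag bullet concrete, here is a minimal sketch (the enum and helper are hypothetical); the bitwise test performs the same check without the boxing HasFlag incurs:

[Flags]
public enum TraceFlags
{
    None = 0,
    IncludeUser = 1,
    IncludeMachine = 2
}

public static class TraceFlagsHelper
{
    public static bool IncludesUser(TraceFlags options)
    {
        // options.HasFlag(TraceFlags.IncludeUser) boxes both operands;
        // the bitwise comparison allocates nothing.
        return (options & TraceFlags.IncludeUser) != 0;
    }
}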


You can read Bill’s article for more information on each of these issues.



Boxing and the EventSource.WriteEvent() object array overload



Boxing and an allocation occur whenever you pass a value type as an object parameter, so avoiding a call to an object overload is always a good thing. But like all methods, EventSource.WriteEvent() has a finite number of type-specific overloads. The problem is that creating additional overloads for WriteEvent() is a rather ugly operation requiring an unsafe code block.
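
For reference, here is a sketch of such an overload, using the WriteEventCore pattern that EventSource supports (the int/long combination is just an example); it belongs inside your EventSource-derived class and requires compiling with /unsafe:

[NonEvent]
private unsafe void WriteEvent(int eventId, int arg1, long arg2)
{
    // Describe the payload in stack-allocated slots: no boxing,
    // no object[] allocation.
    EventData* data = stackalloc EventData[2];
    data[0].DataPointer = (IntPtr)(&arg1);
    data[0].Size = sizeof(int);
    data[1].DataPointer = (IntPtr)(&arg2);
    data[1].Size = sizeof(long);
    WriteEventCore(eventId, 2, data);
}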



You can determine if you’re using the object parameter overload through IntelliSense or Go To Definition.



I think you can avoid creating these extra overloads unless you either know you’re hammering a particular overload pattern with a very large number of calls, or you know from profiling that you have an allocation or GC problem. In those cases, you should absolutely create an alternate overload that avoids the object parameter array version, so the value types in the call to a semantic trace are not boxed.



In my Pluralsight video, I show that creating your own overload has about a two-fold performance impact. That was a casual test that measured just the boxing in the worst case (a very simple call) and probably didn’t run into generation 2 GC blocking. I’m not confident that I captured the full impact, but it is small for traces that happen less than hundreds (possibly thousands) of times per second.



See the ETW specification for information on building extra overloads.



Tracing on, tracing off



Relying on checks of whether tracing is turned on can foster a sense that you don’t care about trace performance when tracing is turned on. I’ve even heard developers state that position. But there is no value in tracing until you turn it on!



If you’re using a slow trace technology that touches a resource on your main application thread (in .NET, that’s every trace strategy I know of other than ETW and out-of-proc SLAB using ETW), every optimization in this article is trivial compared to the cost of touching the resource when tracing is turned on. Use strongly typed parameters to semantic trace methods, guard your semantic trace methods with a check of whether tracing is turned on, and then ignore all the other optimizations until you can switch to a more efficient form of tracing.
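
Here is a sketch of that guard with a slower, in-proc technology (TraceSource in this case; the names are hypothetical):

using System.Diagnostics;

public static class AppTrace
{
    private static readonly TraceSource Source = new TraceSource("MyApp");

    public static void OrderShipped(int orderId, int itemCount)
    {
        // When tracing is off, skip the formatting and the resource touch.
        if (Source.Switch.ShouldTrace(TraceEventType.Information))
        {
            Source.TraceEvent(TraceEventType.Information, 1,
                "Order {0} shipped with {1} items", orderId, itemCount);
        }
    }
}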



If you’re using EventSource, the first thing the WriteEvent() method does is check IsEnabled(). So calling IsEnabled() yourself offers only a trivial advantage if you are just calling WriteEvent() from your semantic trace method: if tracing is turned off, you avoid an extra method call; if tracing is turned on, there are two method calls instead of one. Method calls are very fast, so the difference is trivial.



Use IsEnabled() when:



  • You’re doing extra work before calling WriteEvent()
  • You’re passing a non-trivial structure to WriteEvent()
  • You’re using the object overload of WriteEvent() (see the sketch below)
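
As a sketch of the third bullet (the event and its types are hypothetical): there is no WriteEvent overload matching (int, bool, long), so the call falls back to the object[] overload and boxes all three arguments, and the guard at least limits that cost to when tracing is on.

[Event(6)]
public void ConfigChanged(int settingId, bool enabled, long timestamp)
{
    if (IsEnabled())
    {
        // Resolves to WriteEvent(int, params object[]), which boxes.
        WriteEvent(6, settingId, enabled, timestamp);
    }
}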


Summary



If you’re not using semantic tracing with strongly typed parameters, move to it to improve your overall trace strategy and performance.



If you’re not using a high performance out-of-proc tracing infrastructure, improve trace performance by moving to one.



If you’re using a high performance semantic trace strategy, several further tweaks are simple to do. These improvements are especially important if they increase your confidence in using tracing in production.

“The active Test Run was aborted because the execution process exited unexpectedly”

I just got this error when running a medium-sized unit test suite (many hundreds of tests).

“The active Test Run was aborted because the execution process exited unexpectedly. To investigate further, enable local crash dumps either at the machine level or for process vstest.executionengine.x86.exe. Go to more details: http://go.microsoft.com/fwlink/?linkid=232477”

Anyone want to guess what the problem is?

It appeared in the output window, quite suddenly, on an otherwise fairly happy and fairly TDD day.

Hmmm.

  • Delete existing TestResults and .SUO (my traditional voodoo).
  • Have a glass of wine (my traditional, calm down and try again).
  • Restart VS.
  • Reboot my machine.

No joy. Hmmm.

Can I tell you how much I do not want to investigate further through local crash dumps?

Let me see if I can run any tests at all. It really looks like MSTest is broken.

Hmmm.

  • Another test ran fine.
  • Let me run a block: yes.
  • Another block: no.
  • Another block: yes.
  • Another block: yes.
  • Another block: yes.
  • And so on… until I found just one test.

OK, I’ll debug it.

When debugging the test, rather than running it, the app crashed with a stupid bug that caused a recursive call and a stack overflow.
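
If you’re wondering what that kind of bug looks like, here is a hypothetical example (not my actual code): a getter that returns the property instead of the backing field calls itself until the stack runs out.

public class Customer
{
    private string name;

    public string Name
    {
        // Bug: should return the field "name"; returning the property
        // calls this getter recursively until StackOverflowException.
        get { return Name; }
        set { name = value; }
    }
}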

Stack overflows can be really funny to diagnose. There’s no more air and the last gasp of the process is to shout something, anything, a final nonsense farewell…

Except that if that error can ever physically get to a user, the phrasing could be better. I already submitted a request for a better message.

I’m posting here because, if you encounter this error, you can do some combination of the following:

  • Run tests in debug mode
  • Narrow in on one or more offending tests, then run them in debug mode
  • There may be other ways this occurs – but it’s been reported at least twice with a stack overflow