Creating Strong-typed Metadata Classes

This post is about an aspect of the CodeFirstMetadata library. You can find out more about this library and where to get it here and here.

You can find out more about strong-typed metadata classes in this post.

You can find out about code-first (generalized, not Entity Framework) here.

This post talks about the two existing examples to explain how strong typing works in real code and to show how instances of these examples are created.

At present, in order to create a set of strong-typed classes to solve a new problem, you need to create a fairly messy set of classes. Feel free to ping me if you think you have a good problem or you want to extend the existing problems and I’ll help guide you. In the long run I want to automate that process, so I probably won’t document it until then.

Because part of this code will be automated/generated, it comes in two parts. I’m currently combining them with inheritance rather than partial classes, to make this code approachable for non-.NET programmers, and because virtual/override are simpler concepts.

These classes all derive from a common base class – CodeFirstMetadata<T> – to provide common features like naming. Below this are code element specific classes like CodeFirstMetadataClass<T> that help with the conversion. I may later replace this with a shallow hierarchy and interfaces, so don’t get dependent on this implementation.
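To make that shape concrete, here’s a minimal sketch of the hierarchy – illustrative only, not the actual library code (the Name, ClassName and Namespace members are assumptions based on what the examples below use):

public abstract class CodeFirstMetadata<T>
{
    // Common features like naming live at this level
    public string Name { get; set; }
}

public abstract class CodeFirstMetadataClass<T> : CodeFirstMetadata<T>
{
    // Class-element specifics that help with the conversion
    public string ClassName { get; set; }
    public string Namespace { get; set; }
}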

For a semantic log, the predictable part of the class – the part I’ll later generate – looks like:

using System.Collections.Generic;
using CodeFirst.Common;

namespace CodeFirstMetadataTest.SemanticLog
{
    // TODO: Generate this base class based on expected attributes
    public abstract class CodeFirstSemanticLogBase : CodeFirstMetadataClass<CodeFirstSemanticLog>
    {
        public CodeFirstSemanticLogBase()
        {
            this.Events = new List<CodeFirstLogEvent>();
        }

        public virtual string UniqueName { get; set; }
        public virtual string LocalizationResources { get; set; }

        public IEnumerable<CodeFirstLogEvent> Events { get; private set; }
    }
}






The manual changes I’ve made, which are by far the most complex I’ve needed so far, are:



using System.Linq;
// TODO: Attempt to remove this line after generating base class

namespace CodeFirstMetadataTest.SemanticLog
{
    public class CodeFirstSemanticLog : CodeFirstSemanticLogBase
    {
        private string _uniqueName;
        public override string UniqueName
        {
            get
            {
                if (string.IsNullOrWhiteSpace(_uniqueName))
                { return Namespace.Replace(".", "-") + "-" + ClassName; }
                return _uniqueName;
            }
            set
            { _uniqueName = value; }
        }

        public bool IncludesInterface
        { get { return this.ImplementedInterfaces.Count() > 0; } }

        public bool IsLocalized
        { get { return !string.IsNullOrWhiteSpace(this.LocalizationResources); } }

        public override bool ValidateAndUpdateCore()
        {
            var isOk = base.ValidateAndUpdateCore();
            if (isOk)
            { return CheckAndUpdateEventIds(); }
            return false;
        }

        /// <summary>
        /// This is a weird algorithm because it numbers implicit events from
        /// the top, regardless of whether other events have event IDs. But
        /// while I wouldn't have chosen this, I think it's important to match
        /// EventSource implicit behavior exactly.
        /// </summary>
        private bool CheckAndUpdateEventIds()
        {
            var i = 0;
            foreach (var evt in this.Events)
            {
                i++;
                if (evt.EventId == 0) evt.EventId = i;
            }
            // PERF: The following is an O(n^2) check; probably a better way
            var dupes = this.Events
                .Where(x => this.Events
                    .Any(y => (y != x) && x.EventId == y.EventId));
            return (dupes.Count() == 0);
        }
    }
}
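As the PERF comment admits, the duplicate check is O(n^2). If it ever matters, a single pass with a HashSet does the numbering and the duplicate detection in O(n) – a sketch, not library code:

private bool CheckAndUpdateEventIds()
{
    var i = 0;
    var seen = new HashSet<int>();   // System.Collections.Generic
    var isOk = true;
    foreach (var evt in this.Events)
    {
        i++;
        if (evt.EventId == 0) evt.EventId = i;
        if (!seen.Add(evt.EventId)) isOk = false;   // Add returns false for a duplicate
    }
    return isOk;
}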






EventSource, and presumably any other log system, requires a unique name, and I want to help you create that. Also, whether there is an interface and whether the class is localized have a significant impact on the template, so I simplify access to this information.



Loading strong-typed metadata is an opportunity for validation of the model. I use this to provide unique numeric ids to each of the log events, which are needed by EventSource and potentially other log mechanisms.



Mapping Between Code-first and Strong-typed Metadata



A bunch of ugly Roslyn and reflection code maps between code-first and strong-typed metadata. This is the code that drove creation of the RoslynDom library – directly hitting the .NET Compiler Platform/Roslyn API within this code was monstrous.



var root = RDomFactory.GetRootFromFile(fileName);
var cfNamespace = root.Namespaces.First();
var returnType = typeof(T);
var mapping = TargetMapping.DeriveMapping("root", "root", returnType.GetTypeInfo()) as TargetNamespaceMapping;
var mapper = new CodeFirstMapper();
var newObj = mapper.Map(mapping, cfNamespace);



  • cfNamespace is the first namespace in the RoslynDom root
  • T is the type to return – the strong-typed metadata
  • mapping holds derived data about the mapping of the target – just create it as shown
  • mapper is the class that does the hard work
  • newObj is the new strong-typed metadata object


In the end, you have an object that is the strong-typed metadata for the initial code.



OK, but how does that work?



For metaprogramming:



  • I create a minimal description in a file with a .cfcs extension
  • Because I lie to Visual Studio and tell it that this is a C# file (Tools/Options/Text Editor/File Extensions), I get nice IntelliSense for most features (more work to be done later)
  • Because MSBuild doesn’t see .cfcs as a C# extension, the .cfcs files are ignored as source in compilation
  • Generation creates .g.cs files that are included in compilation


The intent is to have this automated as part of your normal development pipeline, through one or more mechanisms – build, custom tools, VS extension/PowerShell. The pipeline part is not done yet, but you can grab the necessary pieces from the console application in the example.



Getting CodeFirstMetadata



You can get this project on GitHub. I’ll add this to NuGet when the samples are more accessible from your Visual Studio project.

RoslynDom and Friends – Just the Facts

See this post for the Roadmap of these projects

RoslynDom

A wrapper for the .NET Compiler Platform – the roadmap has further plans

Project on GitHub

See the RoslynDomExampleTests project in the solution for the 20 things you’re most likely to do

Download via Visual Studio NuGet Package Manager if you want to play with that

RoslynDom-Provider

By Jim Christopher

A PowerShell provider for RoslynDom

Project on GitHub

CodeFirstMetadata

Strong-typed metadata from code-first (general sense, not Entity Framework sense)

Project on GitHub

See the ConsoleRunT4Example project in the solution along with strong-typed files and T4 usage

Roadmap for RoslynDom, CodeFirst Strong-typed Metadata and ExpansionFirst Templates

I’ve been working on three interleaved projects: RoslynDom, CodeFirst Strong-typed Metadata and ExpansionFirst Templates. Also, Jim Christopher (aka beefarino) built a PowerShell provider. This post is an overview of these projects and a roadmap of how they relate to each other.


You can find the short version here.


[Roadmap diagram: RoslynDom, CodeFirst Strong-typed Metadata and ExpansionFirst Templates]


In the roadmap, blue indicates full (almost) test coverage and that the library has had more than one user, orange indicates preliminary released code, and grey indicates code that’s really not ready to go and not yet available.


I’m working left to right, waiting to complete some features of the RoslynDom library until I have the full set of projects available in preliminary form.


RoslynDom Library


The .NET Compiler Platform, or Roslyn, does exactly what it was intended to do, which is exactly what we want it to do. It’s a very good compiler, now released as open source and exposing all of its internals. It’s great that we get access to the internal trees, but it’s not happy code for you and me to use – it’s compiler internals.


At the same time, these trees hold a wealth of information we want – it’s more complete information than reflection, holds design information like comments and XML documentation, and it’s available even when the source code doesn’t compile.


When you and I ask questions about our code, we ask simple things – what are the classes in this file? We don’t care about whitespace, or precisely how we defined namespaces. In fact, most of the time, we don’t even care about namespaces at all. And we certainly don’t care whether a piece of information is available in the syntactic or semantic tree or whether attributes were defined with this style or that style.


RoslynDom wraps the Roslyn compiler trees and exposes the information in a programmer friendly way. Goals include


  • Easy access to the tree in the way(s) programmers think about code as a hierarchy
  • Easy access to common information about the code as parameters
  • Access to the applicable SyntaxNode when you need it
  • Access to the applicable Symbol when you need it
  • Planned: Access to the full logical model – solution to smallest code detail
    (Currently, file down to member)
  • Planned: A kludged public annotation/design time attribute system until we get a real one
    (Currently, attribute support only)
  • Planned: Ability to morph and output changes
    (Currently, readonly)
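To give a taste of the “programmer friendly” goal, here’s a hedged sketch of asking “what are the classes in this file?”. RDomFactory.GetRootFromFile and Namespaces appear in the mapping code elsewhere on this blog; the Classes collection here is my assumption about the API, so check the RoslynDomExampleTests project for the real calls:

using System;
using System.Linq;

class Example
{
    static void ListClasses(string fileName)
    {
        var root = RDomFactory.GetRootFromFile(fileName);
        var classNames = root.Namespaces
            .SelectMany(ns => ns.Classes)   // Classes is an assumption
            .Select(c => c.Name);
        foreach (var name in classNames)
        { Console.WriteLine(name); }
    }
}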

Getting RoslynDom


You can get the source code on GitHub, and there’s a RoslynDomExampleTests project which shows how to do about 20 common things.


The project is also available via NuGet. It’s preliminary, use cautiously. Download with the Visual Studio NuGet package manager.


RoslynDom-Provider


Jim Christopher created a PowerShell provider for RoslynDom. PowerShell providers allow you to access the underlying tree of information in the same way you access the file system. In other words, you can mount your source code as though it were a drive.


I’m really happy about the RoslynDom-Provider. It shows one way to use a .NET Compiler Platform/library to access the information that’s otherwise locked into the compiler trees. It’s also another way for you to find out about the amazing power of PowerShell providers. If you’re new to PowerShell, and you’re a Pluralsight subscriber, check out “Discovering PowerShell with Mark Minasi”. It uses Active Directory as the underlying problem and a few parts may be slow for a developer, but it will give you the gist of it. Follow up with Jim Christopher’s “Everyday PowerShell for Developers” and “PowerShell Gotchas.” If you’d rather read, there are a boatload of awesome books including PowerShell Deep Dives and Windows PowerShell for Developers, and too many Internet sites for me to keep straight.


Getting RoslynDomProvider


This project is available on GitHub.


Code-first Strong-typed Metadata


You can find out more about strong-typed metadata here and code-first strong-typed metadata here.


As a first step, I have samples in runtime T4. These run from the command line at present. These templates inherit from a generic base class that has a property named Meta. This property is typed to the underlying strong-typed metadata item – in the samples, either CodeFirstSemanticLog or CodeFirstClass. The EventSource template and problem are significantly more complex, but they avoid the extra mind-twisting of a strong-typed metadata class that wraps a class. These templates are preliminary and do not handle all scenarios.
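The shape that implies is roughly this – an illustrative sketch, not the actual base class (TextTransformation is the standard T4 runtime base; the rest is assumed from the description above):

using Microsoft.VisualStudio.TextTemplating;

public abstract class CodeFirstT4CSharpBase<T> : TextTransformation
{
    // Typed to the strong-typed metadata item, e.g. CodeFirstSemanticLog
    public T Meta { get; set; }
}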



Metaprogramming


While there are a couple of ways to solve a metaprogramming expansion or code first problem, I’ve settled on an alternate file extension. The code-first minimal description is in a file with a .cfcs extension. Because I lie to Visual Studio and tell it that this is a C# file (Tools/Options/Text Editor/File Extensions) I get nice IntelliSense for most features (more work to be done later). But because MSBuild doesn’t see it as a C# file, the .cfcs file is ignored as a source file in compilation.


Generation produces an actual source code file in a file with a .g.cs extension. This file becomes part of your project. This is the “real” code and you debug in this “real” code because it’s all the compiler and debugger know about. As a result


  • You write only the minimal code that only you can write
  • You understand your application through either the minimal or expanded code
  • You easily recognize expanded code via a .g.cs extension
  • You can place the minimal and expanded code side by side to understand the expansion
  • You debug in real code
  • You protect the generated code by allowing only the build server to check in these files


Again, this happens because there are two clearly differentiated files in your project – the .cfcs file and the .g.cs file.


The intent is to have this automated as part of your normal development pipeline, through one or more mechanisms – build, custom tools, VS extension/PowerShell. The pipeline part is not done yet, but you can grab the necessary pieces from the console application in the example.


You can also find more here.



Getting CodeFirstMetadata


You can get this project on GitHub.


I’ll add this to NuGet when the samples are more accessible from your Visual Studio project.


ExpansionFirst Templates


T4 has brought us a very long way. It and CodeSmith have had the lion’s share of code generation templating in the .NET world for about a decade. I have enormous respect for people like Gareth Jones who wrote it and kept it alive and Oleg Sych who taught so many people to use it. But I think it’s time to move on. Look for more upcoming on this – my current bits are so preliminary that I’ll wait to post.


Summary


I look forward to sharing the unfinished pieces of this roadmap in the coming weeks and months.


I’d like to offer a special thanks to the folks in my April DevIntersection workshop. The challenges of explaining the .NET Compiler Platform/Roslyn pieces to you led me to take a step back and isolate those pieces from the rest of the work. While this put me way behind schedule, in the end I think it’s valuable both in simplifying the metaprogramming steps and in offering a wrapper for the .NET Compiler Platform/Roslyn.

Code-first Metadata

This is “code first” in the general sense, not the specific sense of Entity Framework. This has nothing to do with Entity Framework at all, except that the Entity Framework team showed us how valuable simple access to code-like metadata is.


Code first is a powerful mechanism for expressing your metadata because code is the most concise way to express many things. There are 60 years of evolution behind today’s computer languages being efficient at expressing explicit concepts based on a natural contextualization. You can’t get this in JSON, XML or other richer and less-opinionated formats.


Code first is just one approach to getting strong-typed metadata. The keys to the kingdom, the keys to your code, lie in expressing the underlying problems of your code in a strong-typed manner, which you can read about here.


The problem is that the description of the problem is wrapped up with an enormous amount of ceremony about how to do what we’re trying to do. Let’s look at this in relation to metaprogramming where the goal is generally to reduce ceremony and


Only write the code that only you can write



In other words, don’t write any code that isn’t part of the minimum definition of the problem, divorced of all technology artifacts.


For example, you can create a SemanticLog definition that you can later output as an EventSource class, or any other kind of log output – even in a different language or on a different platform.


To do this, describe the SemanticLog in the simplest way possible, devoid of technology artifacts.


namespace ConsoleRunT4Example
{
    [SemanticLog()]
    public class Normal
    {
        public void Message(string Message) { }

        [Event(2)]
        public void AccessByPrimaryKey(int PrimaryKey) { }
    }
}

Instead of the EventSource version:


using System;
using System.Diagnostics.Tracing;

namespace ConsoleRunT4Example
{
    [EventSource(Name = "ConsoleRunT4Example-Normal")]
    public sealed partial class Normal : EventSource
    {
        #region Standard class stuff
        // Private constructor blocks direct instantiation of class
        private Normal() { }

        // Readonly access to cached, lazily created singleton instance
        private static readonly Lazy<Normal> _lazyLog =
            new Lazy<Normal>(() => new Normal());
        public static Normal Log
        {
            get { return _lazyLog.Value; }
        }

        // Readonly access to private cached, lazily created singleton inner class instance
        private static readonly Lazy<Normal> _lazyInnerlog =
            new Lazy<Normal>(() => new Normal());
        private static Normal innerLog
        {
            get { return _lazyInnerlog.Value; }
        }
        #endregion

        #region Your trace event methods
        [Event(1)]
        public void Message(System.String Message)
        {
            if (IsEnabled()) WriteEvent(1, Message);
        }

        [Event(2)]
        public void AccessByPrimaryKey(System.Int32 PrimaryKey)
        {
            if (IsEnabled()) WriteEvent(2, PrimaryKey);
        }
        #endregion
    }
}

Writing less code (10 lines instead of 47) because we are lazy is a noble goal. But the broader benefit here is that the first requires very little effort to understand and very little trust about whether the pattern is followed. The second requires much more effort to read the code and ensure that everything in the class is doing what’s expected. The meaning of the code requires that you know what an EventSource is.


Code-first allows you to just write the code that only you can write, and leave it to the system to create the rest of the code based on your minimal definition.

Strong-typed Metadata

Your code is code and your code is data.

Metaprogramming opens up worlds where you care very much that your code is data. Editor enhancements open up worlds where you care very much that your code is data. Visualizations open up worlds where you care very much that your code is data. And I think that’s only the beginning.

There’s nothing really new about thinking of code as data. Your compiler does it, metaprogramming techniques do it, and delegates and functional programming do it.

So, let’s make your code data. Living breathing strongly-typed data. Strong typing means describing the code in terms of the underlying problem and providing this view as a first class citizen rather than a passing convenience.

Describing the Underlying Problem

I’ll use logging as an example, because the simpler problem of PropertyChanged just happens to have an underlying problem of classes and properties, making it nearly impossible to think about with appropriate abstractions. Class/property/method is only interesting if the underlying problem is about classes, properties and methods.

The logging problem is not class/method – it’s log/log event. When you strongly type the metadata to classes that describe the problem being solved you can reason about code in a much more effective manner. Alternate examples would be classes that express a service, a UI, a stream or an input device like a machine.

I use EventSource for logging, but my metadata describes the problem in a more generalized way – it describes it as a SemanticLog. A SemanticLog looks like a class, and once you create metadata from it, you can create any logging system you want.

Your application has a handful of conceptual groups like this. Each conceptual group has a finite set of appropriate customizations. Your application problem also has a small number of truly unique classes.

Treating Metadata as a First Class Citizen

In the past, metadata has been a messy affair. The actual metadata description of the underlying patterns of your application has been sufficiently difficult to extract that you’ve had no reason to care. Thus, tools like the compiler that treated your code as data simply created the data view they needed and tossed it out as rubbish when they were done.

The .NET Compiler Platform, Roslyn, stops throwing away its data view. It exposes it for us to play with.

Usage Examples

I’m interested in strongly typed metadata to write templates for metaprogramming. I want these templates to be independent of how you are running them – whether they are part of code generation, metaprogramming, a code refactoring or whatever. I also want these templates to be independent of how the metadata is loaded.

Strongly typed metadata works today in T4 templates. My CodeFirstMetadata project has examples.

I’m starting work on expansion first templates and there are many other ways to use strong-typed metadata – both for other metaprogramming techniques and completely different uses. One of the reasons I’m so excited about this project is to see what interesting things people do, once their code is in a strong-typed form. At the very least, I think it will be an approach to visualizations and ensuring your code follows expected patterns. It will be better at ensuring large scale patterns than code analysis rules. Whew! So much fun work to do!!!

Strong-typed Metadata in a T4 Template

Here’s a sample of strong typing in a T4 template

[Screenshot: strong typing in a T4 template]



There’s some gunk at the top to add some assemblies and some using statements for the template itself. The important piece at the top is that the class created by this template is a generic type with a type argument – CodeFirstSemanticLog – that is a strong-typed metadata class. Thus the Meta property of the CodeFirstT4CSharpBase class is a SemanticLog class and understands concepts specific to the SemanticLog, like IncludesInterface. I’ve removed a few variable declarations that are specific to the included T4 files.
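To give a flavor, the directive gunk at the top of such a template looks roughly like this (the assembly and namespace names are illustrative, not the actual sample):

<#@ template language="C#" inherits="CodeFirstT4CSharpBase<CodeFirstSemanticLog>" #>
<#@ assembly name="CodeFirstMetadata.dll" #>
<#@ import namespace="CodeFirstMetadataTest.SemanticLog" #>
<# // Meta is a CodeFirstSemanticLog in this template #>
<# if (Meta.IncludesInterface) { #>
// ... emit interface-specific output here ...
<# } #>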

Performance in Tracing


I talk about tracing in other posts, including here and here. I talk about semantic tracing here. I also have a Pluralsight video on tracing with ETW and EventSource here.

Bill Chiles has a great article here. His article is based on the Roslyn team’s experience. The Roslyn compilers needed massive tuning so that the managed compilers would have similar performance to the older unmanaged compilers. His article is great on overall application performance, and I use it here in relation to tracing.

Bill lists four issues, and then a few miscellaneous items:

  • Don’t prematurely optimize – be productive and tune when you spot problems.
  • Profiles don’t lie – you’re guessing if you’re not measuring.
  • Good tools make all the difference – download PerfView and look at the tutorials.
  • Allocations are king for app responsiveness – this is where the new compilers’ perf team spent most of their time.

When tracing adversely impacts performance, it’s often due to I/O or being I/O bound. I/O issues occur when you are writing to trace storage on the main thread – on Windows, you can avoid this by using ETW. In .NET, you use ETW by using EventSource. The rest of this article covers adverse performance impacts of tracing unrelated to I/O issues.
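As a minimal refresher, an EventSource looks something like this (the source name and event are mine, for illustration):

using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-MyApp")]
public sealed class AppEventSource : EventSource
{
    public static readonly AppEventSource Log = new AppEventSource();

    [Event(1)]
    public void RequestStarted(string Url)
    {
        if (IsEnabled()) WriteEvent(1, Url);
    }
}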

Premature optimization and profiling

Outside tracing, I almost always agree with the recommendation that premature optimization wastes programmer time and results in unnecessarily complex applications.

But tracing is different. Optimizing tracing is not premature and considering performance throughout your discussions of trace strategies makes sense. If your application is well traced, you have a lot of tracing calls. You never want a programmer to consider whether adding a trace call will hurt performance.

One of the goals of tracing is to discover problems as quickly as possible – so you don’t want to implement a strategy you’re afraid to leave on in production.

Tracing is part of your profiling strategy and tracing that perceptibly slows your application in production may lead you astray in evaluating profile results.

Profiles don’t lie. Knowing the performance metrics of your application with tracing turned off and with various sets of traces turned on via level (error, warning, information), keywords, etc. is important. Unless I/O is slowing down your traces, you’ll find that tracing is a very, very small percentage of your application’s effort. If it’s a significant portion, your tracing strategy is flawed and you need to fix it.

Because the impact is so small, it’s unlikely that you can use profiling to improve performance of your tracing.

Assuming a fast trace infrastructure (ETW with EventSource or out-of-proc SLAB in .NET) the goal of improving trace performance is not to improve overall application performance. The goal is to provide confidence that you can turn tracing on whenever you want, or ideally leave a set of traces on at all times in a “flight recorder” mode. “Flight recorder” mode means traces are recorded in a rolling fashion that’s made permanent when an interesting event happens.

You always want tracing to be as fast as possible, as long as it’s not causing undue complexity or extra effort.

Happily, with ETW and SLAB using ETW you can have high performance tracing. And happier still, performance considerations will improve your tracing design by pushing you more firmly into the semantic tracing camp.

Performance considerations for tracing

Performance considerations make for better tracing. Here are a few examples:

  • Isolate all trace calls behind semi-permanent signatures with no artifacts of the underlying technology
    • Allows evolution in response to technology improvements (such as implementing an out-of-proc strategy)
    • Also helps discoverability
  • Make these methods very granular and very specific to (and descriptive of) the action you are tracing
    • Allows more granular enabling of trace events
    • Less code to run within individual trace methods
    • Also documents application actions
    • Also provides consistency in details like level, keywords and channel
    • Also simplifies usage (IntelliSense)
  • Use strongly typed parameters for these methods
    • Avoids boxing (heap allocations and resulting GC load)
    • Also simplifies usage (IntelliSense) and documents actions
  • Avoid large structs as parameters
    • Avoids copying on the stack
    • Also simplifies usage (IntelliSense) and documents actions
  • Avoid concatenating strings and string.Format() as well as Concat(), Split(), Join() and Substring()
    • Avoids allocations
    • Also results in a trace that can be interpreted without parsing
  • Avoid retrieving data or any expensive operations
    • Obviously, avoids spending that time
    • Also ensures trace is just recording current conditions
  • Get to know the information provided by the trace system for free
    • Avoids tracing things that are already available
    • Also allows for fewer trace calls and a simpler system
  • Consider having tracing on during testing and not using DI (Dependency Injection) for tracing
    • Avoids running DI code, and possibly allocations
    • Also simplifies your application
    • Also allows programmers to use tracing during initial debugging
    • Also gets programmers in the habit of using traces

Semantic tracing is the style of tracing that easily fulfills these goals. Semantic tracing is a style; it can be used on any platform in any language.

Semantic tracing fulfills many of these performance-focused goals

Semantic tracing is by its nature strongly typed. This has enormous benefits in clarity of purpose and IntelliSense support. It can also have a significant impact on performance because value types are not boxed. And, by isolating your tracing code, additional optimization can be done later, and only if needed. For example, you may need to do extra work to avoid the object overload of EventSource.WriteEvent(). But this code is a pain and adding it in all cases would be a premature optimization.

If you are using .NET and you are not yet using EventSource and ETW, no other performance improvement will be as great as moving to ETW. You can use ETW when you use EventSource directly without alternate listeners, and when you use out-of-proc SLAB (Semantic Logging Application Block from Microsoft Patterns and Practices which is an enhancement of EventSource). Isolating your calls allows you to make the change to ETW at your convenience.

I talked about semantic tracing here. The rest of this article explores details in Bill Chiles’ article from a semantic perspective and assumes you are tracing outside the current process (EventSource or out-of-proc SLAB in .NET).

Avoiding boxing allocations with semantic tracing

Heap allocations themselves have a tiny impact. The bigger problem is that each allocation must eventually be garbage collected. Garbage collection is also relatively fast, but it occurs in spurts. In most normal application code, allocations occur at a rate that can be smoothly garbage collected and you don’t need to worry about this issue, unless profiling shows that you have excessive allocations or garbage collection.

Since a well-traced application has lots and lots of trace calls, any allocations you have will occur a large, possibly massive, number of times. Avoiding unnecessary allocations in tracing is a very good thing.

Bill shows a great example of the allocation problems tracing can cause with unnecessary boxing:

public class BoxingExample
{
    public void Log(int id, int size)
    {
        var s = string.Format("{0}:{1}", id, size);
        Logger.WriteLine(s);
    }
}



I see five heap allocations in this code. The string s is allocated. The two integers are boxed to object to pass to the Format() method. A string is then created for each to perform the concatenation.



These allocations occur whether or not tracing is turned on.



Replace this code with the following:



public sealed class SemanticLoggingExample : EventSource
{
    [Event(1)]
    public void ProcessStarted(int ProcId, int SampleSize)
    {
        // The first argument is the event id; the typed overload avoids boxing
        WriteEvent(1, ProcId, SampleSize);
    }
}



This code has zero allocations.



It is also more readable, more discoverable, and more IntelliSense-friendly.



Boxing and logging with strings



Semantic tracing discourages the use of strings in tracing. If your trace technology requires them, you can create the strings within your semantic trace method. The advantage is that you can avoid the resulting allocations when tracing is turned off, and you can replace your technology with one that does not require string creation when it’s available.



EventSource in .NET does not require any string creation. You might want a message reported to consumers of your trace, but you can do this with the Message property of the Event attribute (the message parameter to the constructor). This is similar to a format string and is included in the manifest for the events. The ETW consumer application can build the string for the human user; the common consumers already use this string to display the message to the user. Your application does not ever build this string.
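For example (an illustrative event, not from a real sample), the consumer builds the human-readable message from the manifest; your code passes only the typed values:

[Event(2, Message = "Process {0} started with sample size {1}")]
public void ProcessStarted(int ProcId, int SampleSize)
{
    if (IsEnabled()) WriteEvent(2, ProcId, SampleSize);
}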



This saves I/O load, CPU cycles, boxing, and GC load.



As a side bonus, the Message property of the Event attribute can be localized. See my course or the ETW specification.



Bill’s advice on using ToString() prior to calling String.Format() is good when you are using the String.Format() method in .NET. But if you are building strings as arguments in calls to your trace system, you are almost certainly doing something wrong. Instead, maintain value types as value types through as much of the tracing pipeline as possible, and always to the point where you can check whether tracing is turned on. And then use the ToString() trick. It’s a cheap improvement since extra ceremony at that point is not distracting – the point of the method is to create the trace.



Avoiding other allocations



Bill’s article is a great source of other ways to avoid allocations and other performance tips. These issues are rare in tracing, but worth considering when they occur:



  • GetHashCode: cast enums to the underlying type
    • You might use a hash code as part of a strategy to hide sensitive information; hashing an enum might rarely be part of this, and it’s easy enough to do the cast in that scenario
  • HasFlag: boxes – use bitwise comparison in frequent calls
    • HasFlag with tracing is almost certainly in a section of a semantic method that doesn’t run unless tracing is turned on, and at that location, the extra clarity of HasFlag is probably not that important
  • String operations – watch Format(), Concat(), Split(), Join() and Substring() in frequent calls
    • Avoid strings in tracing as much as possible, and at least ensure it’s in a section of a semantic method that doesn’t run unless tracing is turned on
  • Unnecessary literal allocations – see Bill’s WriteFormattedDocComment sample
    • If you have literals – like a table name – create it once
  • StringBuilder – still requires an allocation
    • Try to avoid creating any string that’s complex enough to make string builder appropriate
    • If you think you need a string builder in tracing, at least ensure it’s in a section of a semantic method that doesn’t run unless tracing is turned on.
  • Clever caching – Bill’s example is caching a StringBuilder
    • Avoid creating new object instances in your tracing
  • Closures in lambdas – a class is generated on compile and an instance allocated
    • Avoid lambda closures in tracing
  • LINQ – in addition to common lambda closures, there are extra allocations for delegates
    • Avoid LINQ in tracing
  • Inefficient async caching – cache the task, not just the result
    • Use tracing that is fast and let the technology (ETW) get off the thread – avoid async in tracing. You can’t build an async strategy that is as fast as the one Microsoft built with ETW (assuming .NET)
  • Dictionaries – often mis-used when simpler structures would work
    • You might use a lookup to sensitive information in a few cases (keep sensitive information out of traces, regardless of technology) and use a dictionary only if there will be many lookups and the list isn’t small
  • Class vs struct – classic space/time tradeoff
    • Consider this in tracing calls, generally pass specific data items rather than large structs, although this can undermine the ideal of semantic tracing
  • Caching without a disposal plan (like a capacity limit) – avoid this because it’s also called a memory leak
    • This can happen if you create a lookup for sensitive information; create a cache plan


You can read Bill’s article for more information on each of these issues.



Boxing and the EventSource.WriteEvent() object array overload



Boxing and an allocation occur whenever you pass a value type as an object parameter, so any time you avoid calling an object overload it’s a good thing. But like all other methods, the EventSource.WriteEvent() method has a finite number of type-specific overloads. The problem is that creating additional overloads for WriteEvent() is a rather ugly operation requiring an unsafe code block.



You can determine if you’re using the object parameter overload through IntelliSense or Go To Definition.



I think you can avoid creating these extra overloads unless you either know you’re hammering a particular overload pattern with a very large number of calls, or you know from profiling that you have an allocation or GC problem. In that case, you should absolutely create the alternate overload to avoid the object parameter array version, to avoid boxing the value types in the call to a semantic trace.



In my Pluralsight video, I discuss that creating your own overload has about a twofold performance impact. That was a casual test that just measures the boxing in the worst case (a very simple call) and probably didn’t run into generation 2 GC blocking. I’m not confident that I captured the full impact, but it is small for traces that happen less than hundreds (possibly thousands) of times per second.



See the ETW specification for information on building extra overloads.
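For the shape of such an overload, here’s a sketch of the standard WriteEventCore pattern inside an EventSource-derived class (the int/long payload is arbitrary):

[NonEvent]
private unsafe void WriteEvent(int eventId, int arg1, long arg2)
{
    EventData* data = stackalloc EventData[2];
    data[0].DataPointer = (IntPtr)(&arg1);
    data[0].Size = sizeof(int);
    data[1].DataPointer = (IntPtr)(&arg2);
    data[1].Size = sizeof(long);
    WriteEventCore(eventId, 2, data);
}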



Tracing on, tracing off



Relying on checks of whether tracing is turned on develops a sense that you don’t care about trace performance when tracing is turned on. I’ve even heard developers state that position. But there is no value to tracing until you turn it on!



If you’re using a slow trace technology that touches a resource on your main application thread (in .NET that’s every trace strategy I know of other than ETW and out-of-proc SLAB using ETW) every optimization in this article is trivial compared to the cost of touching the resource when tracing is turned on. Use strongly typed parameters to semantic trace methods and guard your semantic trace methods with a check of whether tracing is turned on, then ignore all the other optimizations until you can switch to a more efficient form of tracing.



If you’re using EventSource, the first thing the WriteEvent() method does is check IsEnabled(). So using IsEnabled() has a trivial advantage if you are just calling WriteEvent() from your semantic trace method. If you use the IsEnabled() method and tracing is turned off, you avoid an extra method call. If you use the IsEnabled() method and tracing is turned on, there are two method calls. Method calls are very fast; the difference is trivial.



Use IsEnabled() when:



  • You’re doing extra work before calling WriteEvent()
  • You’re passing a non-trivial structure to WriteEvent()
  • You’re using the object overload of WriteEvent()
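For example, a sketch of the first case – doing extra work only when a session is listening (the event and names are mine, inside an EventSource-derived class):

[NonEvent]
public void RequestFailed(Exception ex)
{
    // Only pay for ToString() when tracing is turned on
    if (IsEnabled()) RequestFailed(ex.ToString());
}

[Event(5, Level = EventLevel.Error)]
private void RequestFailed(string ExceptionDetail)
{
    WriteEvent(5, ExceptionDetail);
}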


Summary



If you’re not using semantic tracing with strongly typed parameters, move to it to improve your overall trace strategy and performance.



If you’re not using a high performance out-of-proc tracing infrastructure, improve trace performance by moving to one.



If you’re using a high performance semantic trace strategy, several further tweaks are simple to do. These improvements are especially important if they increase confidence in your ability to use tracing in production.

Semantic Tracing


I talk about semantic tracing in my Pluralsight course Event Tracing for Windows (ETW) in .NET. Here’s a summary.

The Semantic Logging Application Block (SLAB) documentation discusses semantic tracing here. This is great documentation, but obscures the fact that semantic tracing can be done on any platform in any language. It’s just a style of tracing.

You create a class, or a very small number of classes, dedicated to tracing (pseudo-code)

public class Logger
{
    public static Logger Log = new Logger();
    public void AccessByPrimaryKey(int PrimaryKey, string TableName) { }
    public void CalculateGrowthRateStart(int InitialPopulation) { }
    public void CalculateGrowthRateEnd() { }
    // Numerous methods
}





Now tracing just involves knowing the logger class name and typing:



Logger.Log.CalculateGrowthRateStart(42);



With IntelliSense at every step this is about a dozen keystrokes. The only specific knowledge you need is the name of the Logger class – and with Visual Studio IntelliSense, you’ll get that easily if you include “Logger” anywhere in the name. Just imagine how simple that is!



Benefits



Discoverability and easy access are just two of the benefits of this style of logging.



The call to the trace method states what is happening and helps to document the calling source code. This is especially true if you use named parameters.



The resulting trace output is strongly typed and easily filtered and sorted on specific named parameter values.



Strongly typed parameters improve performance by avoiding string creation and avoiding boxing on any platform that has cheap stacks and expensive heap memory usage (such as .NET).



All access to the underlying tracing technology is isolated. Stuck with TraceSource now, but intimidated by the potential of future performance issues? Move to semantic tracing, and the future change to use EventSource and ETW will involve a single file. Any change that doesn’t involve the signature, even a technology change, is isolated to a single class.



Another implication of isolated tracing is that details can be added and removed as they are available via the underlying technology – things like thread and user.



Details about the trace are also consistent. The calling code is unconcerned about issues like level (error, warning, information), channel, and keywords. These details can be changed at any point in the application life cycle.



The result is a trace strategy that is clear and consistent.



ETW, EventSource and SLAB



Both EventSource and SLAB encourage semantic tracing, assuming you avoid creating generic trace methods like “Log” and “Warning.” Semantic logging involves giving a nice name to the method, along with strongly typed parameters.



You might want vague methods to keep you going one afternoon when you encounter something new you want to trace, but use them as a temporary measure.



Whether a particular event is a warning, information or an error is in the nature of the action, not the nature of this particular occurrence of the action – or it’s not very well defined, and it’s not semantic.



Summary



One of the most important tools in improving your trace performance is semantic tracing. Simply put, it means that all trace calls are to a common class, or small set of classes, that have methods indicating each specific thing you will trace along with strongly typed parameters. The trace method and each trace parameter has a very clear name allowing IntelliSense to guide correct calls.

The Case of the Terrible, Awful Keyword

In the next version of C#, there will be a feature with a name/keyword you will probably hate.

The thread on CodePlex is even named “private protected is an abomination.”

This is the story of that name and what you can do to help get the best possible name.

The feature and why we don’t already have it

C# has a feature called protected internal. Protected internal means that the member is available to any code in the same assembly (internal) and is also available to code in any derived class (protected). In the MSIL (Intermediate Language), this is displayed as famorassem (family or assembly).

MSIL also supports famandassem (family and assembly) which allows access only from code that is in a derived class that is also in the current assembly.

Previously, every time the team has considered adding this feature in C#, they’ve decided against it because no one could think of a good name.

For the next version of C#, the team decided to implement this feature, regardless of whether they could come up with a suitable name. The initial proposal by the team was “private protected.” Everyone seems to hate that name.
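A sketch of the difference, using the initially proposed spelling:

public class BaseClass
{
    // protected internal: derived classes anywhere, OR any code in this assembly
    protected internal void FamilyOrAssembly() { }

    // proposed addition: only derived classes that are also in this assembly
    private protected void FamilyAndAssembly() { }
}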

The process

One of the great things about this point in language design is that the process is open. It continues to be open to insiders like MVPs a bit earlier – which reduces chaos in the public – but the conversation is public while there’s still room for change.

In this case, the team decided on a name (private protected) and the outcry caused the issue to be reopened. That was great, because it allowed a lot of discussion. It seems clear that there is no obvious choice.

So the team took all the suggestions and made a survey. Lucian was conservative with the possible joke keywords – if it was possible that someone intended it seriously, it’s in the survey.

How you can help

Go take the survey! You get five votes, so it’s OK to be a bit uncertain.

If you hate them all, which one annoys you least?

Do you think we need a variation of the IL name familyorassembly?

Do you think we need to include the names internal and/or protected?

Will people confuse the English usage and the bitwise operation?

Will people confuse whether the specified scope is the access or restriction (inclusion or exclusion)?

Should the tradition of all lower case in C# be broken?

Do we need a new keyword?

Is there value in paralleling VB?

Note: In the VB language design meeting on this topic (VM LDM 2014-03-12), we chose to add two new keywords "ProtectedAndFriend" and "ProtectedOrFriend", as exact transliterations of the CLI keywords. This is easier in VB than in C# because of VB’s tradition of compound keywords and capitalizations, e.g. MustInherit, MustOverride. Lucian Wischik [[ If C# parallels, obviously Friend -> internal ]]

I don’t think there’s a promise that the elected name will be the one chosen, but the top chosen names will weigh heavily in the decision.

Go vote, and along the way, some of the suggestions are likely to bring a smile to your face.

Should the feature even be included

There are two arguments against doing the feature. On this, I’ll give my opinion.

If you can’t name a thing, you don’t understand it. Understand it before including it.

This was a good argument the first time the feature came up. Maybe even the second or third or fourth or fifth. But it’s been nearly fifteen years. It’s a good feature and we aren’t going to find a name that no one hates. Just include it with whatever name.

Restricting the use of protected to the current assembly breaks basic OOP principles

OK, my first response is “huh?”

One of the core tenets of OOP is encapsulation. This generally means making a specific class a black box. There’s always been a balance between encapsulation and inheritance – inheritance breaks through the encapsulation on one boundary (API) while public use breaks through it on another.

Inheritance is a tool for reusing code. This requires refactoring code into different classes in the hierarchy and these classes must communicate internal details to each other. Within the assembly boundary, inheritance is a tool for reuse – to be altered whenever it’s convenient for the programmer.

The set of protected methods that are visible outside the assembly is a public API for the hierarchy. This exposed API cannot be changed.

The new scope – allowing something to be seen only by derived members within the same assembly – allows better use of this first style of sharing. To do this without the new scope requires making members internal; internal is more restrictive than protected. But marking members internal gives the false impression that it’s OK for other classes in the assembly to use them.

Far from breaking OOP, the new scope allows encapsulation of the public inheritance API away from the internal mechanics of code reuse convenience. It can be both clear to programmers and enforced that one set of members is present for programming convenience and another set for extension of class behavior.
