Incorporating TraceSource and Other Techniques with EventSource

I got a great question on ETW/EventSource/Semantic Tracing from someone watching my Pluralsight course on ETW and EventSource:

I am going through your ETW training on Pluralsight. I like the content. Want to thank you for it.

I want to embrace semantic logging and go all out for it. And I have a question:

  • If I am developing on .NET 4.5/4.5.1, should I use just the EventSource class, or do you see any need for using TraceSource as well? This question emanates from the fact that there are two kinds of logging – one is exception logging, which I want to do no matter what, irrespective of whether or not the ETW provider has been enabled. The second kind is tracing log, which I may want to do only when the provider is enabled.

So in the exception logging case above, should I use EventSource with SeverityLevel of “LogAlways” or should I use TraceSource for this (as we did earlier) and use ETW for trace logging only.

It will be great to get your thoughts on this.

I want to answer in two parts. First, the great, fantasmicalogical thing about Semantic Tracing is that you make this decision in an isolated location. Since a semantic trace method is just a method that wraps whatever tracing you’re doing, it can be as flexible as you want it to be. Call this single, simple, IntelliSense supported method from the body of your application, pass the interesting information as parameters, and then do whatever you want in the body of that method. If you like, you can call EventSource and then also log in a different way, even TraceSource.

In some cases, the signature of the Semantic Trace method and the signature of the WriteEvent method of the EventSource class are different. In that case, wrap a private WriteEvent method with a public method and put the NonEvent attribute on the public method.
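Here's a minimal sketch of that pattern – the names and payload are mine, not a prescribed API. The public method takes whatever the application finds natural (an Exception in this case), and the private event method takes only ETW-friendly primitives:

```csharp
using System;
using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-MyApp")]
public sealed class AppEventSource : EventSource
{
    public static readonly AppEventSource Log = new AppEventSource();

    // The semantic method the application calls. Its signature doesn't match
    // anything WriteEvent accepts directly, so it's marked NonEvent and does
    // the translation itself.
    [NonEvent]
    public void OrderProcessingFailed(int orderId, Exception ex)
    {
        if (IsEnabled())
        {
            OrderFailed(orderId, ex.GetType().Name, ex.Message);
        }
    }

    // The private method that actually defines the ETW event.
    [Event(1, Level = EventLevel.Error)]
    private void OrderFailed(int orderId, string exceptionType, string message)
    {
        WriteEvent(1, orderId, exceptionType, message);
    }
}
```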

Semantic Tracing is more than just friendly to parallel use of other trace technologies. The more complicated your needs, and the more legacy technology you're using, the more important it is to hide it behind the Semantic Tracing method façade so you can adjust later.

So, the next question – what does TraceSource offer that you don’t otherwise have with EventSource and ETW? You’ve identified a couple of aspects. With TraceSource you can specify, via configuration at application startup, that certain trace events will always be recorded. With ETW and EventSource, if no one is listening, no one is listening, and even trace events marked “LogAlways” will be dropped on the floor.

Also, exceptions don’t happen very often. Since they’re rare and terribly important, a performance hit to record an exception is acceptable. I think the deeper issue is how you control the pipeline – and who controls the pipeline – what happens after the trace is recorded.

One approach you can take is to use the NuGet version of EventSource, write to the Admin channel, ensure that the provider is properly registered on the target machine, and use the reporting features of Event Viewer to route the exception to the right people. I think it will be very common for trace events written to the Admin channel to also be written to the Debug channel from the same semantic trace method. Just use a public NonEvent wrapper method that calls two private event methods, with variations in the data appropriate to each channel. While performance won’t be hurt if you send a lot of messages to the Admin channel, it will teach your IT staff to ignore what you put in Event Viewer, so save this for rare, high importance messages.

Another approach is to use TraceSource, and if other applications in your organization are reporting through a particular listener, this can be a great idea. Similarly, Log4Net can still play a role. These tools work in process, though, so certain types of application crashes aren’t captured because the streams aren’t flushed correctly. Be careful about doing this for anything other than rare events of very high importance. Excess calls will create performance problems.

With both the Admin channel and TraceSource/Log4Net approaches, I suggest that you create a semantic tracing method, and from that single method call you place the trace event information in multiple locations. The semantic trace method hard-codes the calls you’re making.
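As a concrete, hedged sketch of what that might look like for the exception case – the names are invented, and the TraceSource is assumed to be configured with an always-on listener in app.config:

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-MyApp-Critical")]
public sealed class CriticalEventSource : EventSource
{
    public static readonly CriticalEventSource Log = new CriticalEventSource();

    // A TraceSource whose listeners are set up in configuration, so the
    // exception is recorded even when no ETW session is listening.
    private static readonly TraceSource fallbackTrace =
        new TraceSource("MyCompany.MyApp.Critical");

    // The single semantic method the application calls.
    [NonEvent]
    public void UnhandledException(Exception ex)
    {
        // ETW path – essentially free when no session is listening.
        UnhandledExceptionOccurred(ex.GetType().Name, ex.ToString());

        // Legacy path – always recorded; reserve it for rare, important events.
        fallbackTrace.TraceEvent(TraceEventType.Critical, 1, ex.ToString());
        fallbackTrace.Flush();
    }

    [Event(1, Level = EventLevel.Critical)]
    private void UnhandledExceptionOccurred(string exceptionType, string details)
    {
        WriteEvent(1, exceptionType, details);
    }
}
```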

There are two other approaches that allow simple “normal” EventSource methods to serve as your semantic trace methods. With these two approaches, your EventSource class doesn’t have to change to accommodate the requirement to always log.

The Semantic Logging Application Block (SLAB) from Patterns & Practices works in two modes, in process and out of process. In-process mode allows you to start SLAB from your configuration, as you do with TraceSource. However, it also retains the TraceSource problem of losing critical final events in certain types of crashes.

Out of process, you can start a service that SLAB provides to act as a controller. You can filter to the specific events you consider critical and route those events to traditional locations, like a log file. The real value here is routing to those traditional locations (console, log, XML, SQL). That’s particularly useful during development, but it can also be used across the life cycle. Out-of-process SLAB will record final events, as long as the service and its output locations remain functional. You will need to start the service at application startup. You can check out more in the SLAB documents (which are awesome).

A more complex possibility, possibly more appropriate on a server, is to ensure ETW listeners are started when your application is started. One way to do this is by starting your application with a script. I realize that PerfView is hard to use, but if you get the command line correct, you can do “flight recorder” tracing. You can set PerfView to maintain a rotating buffer of informational events that it writes out permanently only when something bad like an exception happens. How cool would it be to have the previous five minutes (arbitrary) of informational trace events when your application crashes?

So, there are four ways to provide special handling for exceptions to ensure they are always recorded, while staying within your commitment to do Semantic Tracing:

  • Write semantic trace methods that write to the Debug and Admin channels and use Event Viewer
  • Write semantic trace methods that write to ETW and a legacy tool like TraceSource or Log4Net
  • Use the Semantic Logging Application Block to write certain EventSource events to an alternate stream, in process or out of process
  • Start an ETW session automatically when your application starts up

At some future point, I hope .NET makes it extremely easy to automatically start an adjacent ETW session.

Thanks for asking such a great question!

Another Journalist Said Something Stupid

In this article Paul Krill quotes the following statement:

The language’s placement was surprising because many software engineers have refused to migrate to VB.Net, the successor to the "well-beloved, classic Visual Basic 6.0 version," Tiobe said. It also was unexpected because Microsoft seems to have slowed down development of VB.Net. "For example, the latest Visual Studio version 2013 doesn’t contain any new VB.net language features," Tiobe said.

Yes, this is a quote. The Tiobe article that Paul quotes makes this stupid statement, but InfoWorld could have done the smallest amount of fact checking to find out why this statement is stupid.

This statement is stupid because Visual Studio 2013 was not a languages release and there were also no changes to C#. There were changes to C++, F# and JavaScript, but anyone watching Microsoft knows those three languages are on a different track than C# and VB.NET, which are on a combined track.

Any journalist writing about Microsoft languages should have this much of the basics.

Microsoft has demonstrated in numerous ways that the speed of VB.NET development now parallels the speed of C# development. They’ve also stated many times that they are working on new compilers for both C# and VB.NET. I’ve talked here about my thoughts on the future of the languages, including implications of Mads Torgersen’s NDC talk.

And I’ll just skip my opinion on the Tiobe Index itself, and whether the change in VB.NET is real or just noise. Since this corresponded with a significant (by percent) drop in Visual Basic, which is tracked separately, well, the rabbit hole really goes nowhere…

What Does Blazingly Fast Tracing Mean?

I wrote a summary of ETW here and some basics here. I’ll be posting more to coincide with my new Pluralsight course on ETW tracing.

ETW tracing is blazingly fast. The operating system tosses the slow parts of trace work, like writing to files, onto another thread. It does this at a very low level, allowing very, very fast tracing when tracing is turned on. If tracing is turned off, the performance hit is even smaller. But why worry about tracing turned off!?!?! The goal is creating a trace system you can leave on in production!

OK, that’s a good sound bite, but while I think you should have some tracing that you leave on all the time, you might also want some high frequency events that you turn on only some of the time, such as when you’re doing a particular kind of evaluation.

Here are some rough performance metrics I did a while ago – this is on my laptop with a fair amount of open applications.

 

                  int/string                  Object Overload
Event Off         9 ns    (100 million/sec)   20 ns   (50 million/sec)
Event Watched     410 ns  (2 million/sec)     720 ns  (1 million/sec)

Don’t get carried away with the details of these numbers, and do your own metrics if you feel the details are important. A nanosecond is one billionth of a second. Dedicating the CPU to tracing (a loop) allows about 2 million trace calls/sec with tracing turned on, 100 million with tracing turned off. That’s really, really fast.

But, Ok, you aren’t going to dedicate 100% of your CPU to tracing.

More or less, if you allow 1% of your system resources for tracing, with it turned on, you can do about twenty thousand trace events per second (20,000!). OK, there’s latency and throughput, but I’ll argue at 1% you just don’t care. Doing a super high throughput system with 10,000 transactions/second? You can probably still afford a few trace calls per request – yes, in production (benchmark that in your specific scenario).

By the way, the specification suggests that the hit of using the fallback overload that takes a parameter array of objects might be 10–20 fold. I didn’t see that. The spec explains how to add more overloads – but it’s ugly, requires unsafe code, and isn’t something I think you initially need to worry about. It’s about double, and you can add the required overloads when you need tracing at half a millionth of a second instead of a millionth of a second.
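If you do eventually need it, this is roughly what the spec’s pattern looks like – a sketch, assuming the protected WriteEventCore helper the spec describes, with illustrative names, and it requires compiling with unsafe code enabled:

```csharp
using System;
using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-MyApp")]
public sealed class FastEventSource : EventSource
{
    public static readonly FastEventSource Log = new FastEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void RequestCompleted(int requestId, int statusCode, string url)
    {
        // With the private overload below, this call no longer falls back to
        // the params object[] overload (no boxing, no array allocation).
        WriteEvent(1, requestId, statusCode, url);
    }

    // The ugly part the spec describes: marshal the arguments yourself and
    // hand them to WriteEventCore.
    [NonEvent]
    private unsafe void WriteEvent(int eventId, int arg1, int arg2, string arg3)
    {
        if (arg3 == null) arg3 = string.Empty;
        fixed (char* arg3Ptr = arg3)
        {
            EventData* data = stackalloc EventData[3];
            data[0].DataPointer = (IntPtr)(&arg1);
            data[0].Size = sizeof(int);
            data[1].DataPointer = (IntPtr)(&arg2);
            data[1].Size = sizeof(int);
            data[2].DataPointer = (IntPtr)arg3Ptr;
            data[2].Size = (arg3.Length + 1) * sizeof(char);   // include the null terminator
            WriteEventCore(eventId, 3, data);
        }
    }
}
```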

The spec also suggests that you might see a performance increase from using a locally cached copy of the logger, rather than repeated hits to a static property on the log class. I have not been able to measure any performance improvement, so this just unnecessarily complicates your code (unless you are using DI, which I recommend for many, many things, but not tracing).

When you turn tracing off, the impact drops by nearly two orders of magnitude.

You can’t get anywhere close to this kind of performance out of any other trace technique I know of. Certainly not from Trace.Write, TraceSource, or Log4Net.

Did No One Count?

This is embarrassing, although I can explain, really officer. I wasn’t drinking, it just looked that way.

I put up a post, Five Levels of Code Generation, and it contained these bullet points:

  • Altering code for the purpose of machines
    The path from human readable source code to machine language
  • DSL (Domain Specific Language)
    Changing human instructions of intent, generally into human readable source code
  • File Code Generation
    Creating human readable source code files from small metadata, or sometimes, altering those files
  • Application code generation or architecture expression
    Creating entire systems from a significant body of metadata

See the problem?

Now, if you read the post, you might see what I intended. You might realize that I was in the left turn lane, realized I needed something at the drugstore on the right, didn’t realize the rental car needed to have its lights turned on (mine doesn’t) on a completely empty road at midnight in a not-great neighborhood in Dallas. Really, officer, I haven’t been drinking, I’m just an idiot.

There are five because I split the first item in two. That split was partially because the post was inspired by confusion regarding what RyuJIT means to the future of .NET. (It actually means, and only means, that your programs will run better/faster in some scenarios.)

The code you write becomes something, and then it becomes native code. That “something” for us has been IL, but might be a different representation. One reason for the distinction is that there are entirely separate teams that think about different kinds of problems working on compilers and native code generation. IL is created by a compiler that specializes in parsing, symbol resolution and widely applicable optimization. Native code is created in a manner specific to the machine where it will be run. In the big picture, this has been true since .NET was released and it’s a great design.

I think language compilation and native code creation are two distinct steps. One is all about capturing the expressive code you write, and the other is all about making a device work based on its individual operating system APIs.

But I might be wrong. I might be wrong because the increasing diversity in our environments means native code APIs have implications for the libraries you use (PCL). I might be wrong because languages like JavaScript don’t use IL (although minification is not entirely different). I might be wrong because it’s only the perspective of the coder that matters, and the coder rarely cares. I might be wrong because I’m too enamored with amazing things like multi-core background JIT and MPGO (you can see more in the Changes to the .NET Framework module of my What’s New in .NET 4.5 Pluralsight course).

The taxonomy of code generation will shape the upcoming discussions Roslyn will inspire about metaprogramming. Metaprogramming encompasses only the DSL/expansion, file, and architecture levels.

You might be rolling your eyes like the officer handing me back my license in Dallas. Yes, officer. You’re absolutely right. If I titled the post “Five Levels of Code Generation” I should have had FIVE bullet points.

Some ETW Basics

I wrote a summary of ETW here. I’ll be posting more to coincide with my new Pluralsight course on ETW tracing.

This is just a quick post so I can link back to these basics.

Tracing is what you include in your code so you know what’s happening during development and more importantly in production. Supporting players like .NET and the operating system also have trace output.

ETW stands for Event Tracing for Windows and is the tracing mechanism built into the Windows operating system.

ETW has controllers that turn tracing on and off, providers that create trace event entries, and consumers that let you view traces. Managing ETW tracing requires tools.

.NET provides good ETW support starting with .NET 4.5 with the EventSource class. This creates an ETW provider. You need external tools (not yet available in Visual Studio) to create and view traces.

ETW tracing with the EventSource class is fundamentally different from Trace and TraceSource. It is a different pipeline, configured with ETW tools, and can be turned on and off without restarting your app.

ETW does the slow part of tracing on a separate thread. Trace, TraceSource, Log4Net and most other trace solutions do not do this, or not with the massive efficiency and blazingly fast performance of ETW.

Semantic tracing logically decouples from trace technology, isolates trace decisions like severity, is strongly typed (at the app), and states what is happening in the app, not details of what the trace should say.

Semantic tracing can be done with any trace technology, allowing you to switch later to ETW if you can’t do it now.
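A tiny, hedged illustration of the difference – the names here are invented:

```csharp
using System.Diagnostics;
using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-MyApp")]
public sealed class AppEventSource : EventSource
{
    public static readonly AppEventSource Log = new AppEventSource();

    // The app states *what happened*; severity, wording, and technology live here.
    [Event(1, Level = EventLevel.Informational)]
    public void OrderStarted(int orderId) { WriteEvent(1, orderId); }
}

public static class OrderProcessor
{
    private static readonly TraceSource traceSource = new TraceSource("MyCompany.MyApp");

    public static void ProcessOrder(int orderId)
    {
        // Traditional tracing: the call site decides the wording and severity.
        traceSource.TraceEvent(TraceEventType.Information, 0, "Starting order " + orderId);

        // Semantic tracing: the call site just states what is happening.
        AppEventSource.Log.OrderStarted(orderId);
    }
}
```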

The Semantic Logging Application Block (SLAB from P&P) uses EventSource, avoids ETW tools during development, eases the transition to ETW, and has docs on using a semantic approach to tracing.

MEF and Startup Time

I got this question via email:


I am trying to determine why an application (I didn’t initially design it) that uses MEF and PRISM is slow (example: it takes more than 1 minute to login).  When I looked at the code I noticed the following:

1. Many of the classes that are decorated with the [Export] attribute and have the [ImportingConstructor] attribute on their constructors do significant work in their constructors (large queries that take several seconds to finish, calls to services, etc.).

2. MEF instantiates the above classes when the application starts.

When MEF instantiates these classes at start time, shouldn’t the constructors of these classes (i.e., the ones with the [ImportingConstructor] attribute) be as simple as possible? That is, avoid doing any lengthy operations so as to minimize the time it takes for MEF to finish instantiating all the classes that participate in composition?


Of course you don’t want your application doing all this work at startup and making your user wait! It’s great that you tracked the performance problem to something that is fixable.


There are a couple of parts to the answer.


Simplifying importing constructors


Importing constructors should be as simple as possible. That’s not just for performance. When things go bad in the creation of classes in MEF (or pretty much any DI tool), sorting out the problem can be a nightmare. Put more simply: do what you can to ease debugging object construction, and that means simplifying constructors.


In most cases, the easiest way is to create properties for the result of each query and have the query performed the first time it’s needed – a simple check against null, followed by the query only when needed. Depending on the complexity of your application and the number of times this occurs, this is either an easy or a rather tedious refactoring.
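Here’s a minimal sketch of that refactoring – the repository and query names are placeholders for whatever the class actually imports:

```csharp
using System.Collections.Generic;
using System.ComponentModel.Composition;

[Export]
public class CustomerScreenViewModel
{
    private readonly ICustomerRepository repository;   // hypothetical imported service
    private IList<Customer> customers;                 // cached after the first query

    [ImportingConstructor]
    public CustomerScreenViewModel(ICustomerRepository repository)
    {
        // The constructor now only stores its dependencies – no queries, no service calls.
        this.repository = repository;
    }

    public IList<Customer> Customers
    {
        get
        {
            // Run the expensive query the first time it's needed, then reuse the result.
            if (customers == null)
            {
                customers = repository.GetAllCustomers();
            }
            return customers;
        }
    }
}
```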


Delayed instantiation


The second point is that there is no inherent reason for MEF to be instantiating everything at startup. MEF instantiates classes when they are needed – when something requests an instance of the class. These chains can become rather long and complicated when an instance of one class needs an instance of another, and another, and another. This is a general problem for DI containers.


I don’t know if this is happening inside PRISM or in code that you might have more control over, but this can happen in UI code where one screen or menu requires a reference to another, and another, and another.


It is possible that when you solve the problem with the importing constructors, you will still have a performance problem because of those excessive chains – too many instances created up front. It will still be a design problem, and still not a MEF problem.


If this is deep in PRISM, I don’t know what to say, I don’t know PRISM.


If you encounter this in code you control, look at the Lazy class. Yep, there’s a Lazy class in .NET; I love that. While it’s slightly more difficult to use, it instantiates a simple lightweight object that can create the actual instance later, which lets you delay creation until you first need it. Once the underlying instance is created, all requests for the value of the Lazy return that first instance – no further instances are created.
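Here’s a hedged sketch of that with MEF (names invented) – MEF can satisfy an import declared as Lazy<T>, so the expensive export isn’t constructed until .Value is first touched:

```csharp
using System;
using System.ComponentModel.Composition;

[Export]
public class MainMenuViewModel
{
    // ReportsViewModel is a hypothetical, expensive export.
    private readonly Lazy<ReportsViewModel> reports;

    [ImportingConstructor]
    public MainMenuViewModel(Lazy<ReportsViewModel> reports)
    {
        // MEF hands us a lightweight Lazy wrapper; ReportsViewModel (and the
        // chain of things it imports) is not created yet.
        this.reports = reports;
    }

    public void ShowReports()
    {
        // The first access to .Value asks MEF to build ReportsViewModel;
        // every later access returns that same instance.
        var viewModel = reports.Value;
        // ... display it ...
    }
}
```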


MEF has other features for controlling lifetime, but they are more commonly used when you need additional instances in specific scenarios.


Other possible problems


A minute is a really long time in our world. More than a minute to login is a really, really long time.


It’s possible that you have additional issues. I suggest that you also check for pooling issues if you’re directly accessing a database (such as creating excess connections). Several seconds for a query against a local database is fairly long and you may also have indexing issues.


If the queries really are that long, and when you are making calls to services, your users may still feel pain, even if it happens at a different point in the application. You may need to shift to asynchronous calls for some of this work. I know async still feels like a high dive with no water in the pool, but it’s where we’re slowly going.


If this is a WPF application, or another application that loads a lot of assemblies (DLLs) and that has a predictable start-up sequence, explore multi-core JIT. If you have access to Pluralsight, you can find a discussion here in the clip titled “Multi-core background JIT.” It’s almost unknown, but you can get a significant boost in your assembly load time by adding a single line of code. If you don’t have other problems, this can improve overall startup times by about 20%.


It’s almost comical, but that one line of code is the name of a file. No matter how much we want this feature to just work, .NET can’t do something to the target machine without permission. You have to tell it where to put the information it uses to orchestrate JIT. That is, unless you’re using ASP.NET. Since ASP.NET is in the habit of putting files on its server, multi-core background JIT is automatically enabled for ASP.NET.
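If it helps, this is roughly what that looks like for a desktop app – the folder and file name here are placeholders; the only requirement is a location the app can write to, set as early as possible in startup:

```csharp
using System.Runtime;

public static class Program
{
    public static void Main()
    {
        // Tell .NET where it may keep the JIT profile data it gathers.
        ProfileOptimization.SetProfileRoot(@"C:\MyApp\ProfileCache");

        // Name the profile file. On later runs this profile drives multi-core
        // background JIT of the assemblies used during startup.
        ProfileOptimization.StartProfile("Startup.profile");

        // ... normal application startup continues here ...
    }
}
```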


Your problem isn’t MEF, but…


I appreciate the question because you aren’t blaming MEF for other design problems. However, in our current server focused world, I want to point out that classic MEF is a little heavy. It is an amazingly powerful tool, but on a server with throughput issues and on a low power device it may not be the best tool.


Microsoft has also provided a different lightweight version of MEF. It’s had at least three names, and is currently on NuGet as MEF 2.


If you read my blog, you may have gathered between the lines that I think the core CLR teams are doing absolutely amazing, astounding things and that their communication with the community, their ability to toot their own horn, sucks. You didn’t know one line of code could improve your app load time by 20%, right? One of the ways that communication sucks is a lack of clarity on the future of MEF. It is inconceivable to me that a tool as important as MEF could not have a long and bright future in both the rich and lightweight versions. But I wish the silence around MEF was not so loud.


I hope you’ll add to the comments what you tried that worked and didn’t work!

Reaction to an Editorial I Didn’t Like

I got a link a few days ago to this editorial by Mary Jo Foley on a private list. As I was writing my reply I decided to make it a blog post. Then I decided to cool down a few days. Then I decided to post it. It’s not profound, it’s just my editorial response to an editorial that pissed me off.

In the IT community we become so accustomed to hearing well-thought-out opinions from people who are astoundingly technically competent, like Scott Hanselman, Mark Minasi or Richard Campbell (randomly picking an awesome dev, IT Pro and all-round devops guy), that we can fail to take background into account when reading a professional “industry watcher” journalist. When Mary Jo talks about the implications of Microsoft’s bottom line being primarily based on enterprise income, as she does here, please read it. But when she talks about what devs should be worried about for the future, please be sure you are (at least also) reading things written by experts in development. And that your boss is (at least also) reading things written by experts in development.

The problem is that experts in development don’t want jobs writing scheduled editorials covering lightweight version of heavyweight topics. I know, because I recently turned down such an offer.

Things that are correct in this editorial

Microsoft will talk about its future strategy at Build – it is the declared purpose of the event. Microsoft will pay attention to what customers say after the event.

Devs were nervous after Build 2011. And every other day of every other year of every other decade (at least dev leads, managers and any devs paying attention).

The biggest thing this editorial got wrong

Microsoft has never been more transparent regarding actual products, and I’ve been watching a really long time.

Mary Jo says “The Windows transparency debate has waged since Microsoft got burned by sharing too much information too early about Longhorn/Vista.” With Longhorn, they showed a blurry vision of something they proved unable to build, and that in retrospect didn’t even make sense. Maybe it’s a debate, but it’s pretty clear that it’s a bad idea to blind people with glitter covered papier-mâché balloon fantasies. It was a bad idea for everyone, particularly, folks like me. Ouch.

But while the Windows division tightly controls when they raise the curtain, they are not raising the curtain with less transparency on real products. They continue to wait until they think they know what the product will look like, what their overall strategy will look like, and what their marketing will look like. They continue to schedule Build events and invite journalists.

If Soma or ScottGu says “hey, we might make this” and later waves their wand around and says “naw, forget that, look over here, this other thing is so much way cooler,” devs get excited. If the Windows folks do this, the stock market and enterprise customers will not be happy.

As an aside: The Windows division has also been less transparent to their deeply NDA friends than the developer tools division. This is very bad for them because it slows their progress by not getting super-early feedback. Maybe they can learn from the success of the developer division in embracing the community.

The next biggest thing this editorial got wrong

As I talked about here, Microsoft is completing a critical revamp of what you know as the “.NET Framework.” This, along with the PCL, provides the technical foundation for cross-platform code and for isolating code so that it can actually be sensible across platforms.

They have also quietly expanded the very technical area of creating native code from IL (I think we are now at seven strategies) in a way you haven’t needed to worry about. Did you even know that one of the things the Windows Store does for you is prepare your app for the particular phone it’s downloading to? Phones don’t like JIT strategies due to power consumption – you never had to think about that because the core teams already solved it for you.

Microsoft is also in the process of rewriting the C# and Visual Basic compilers, combining a visionary strategy with a complete commitment to existing customers. It will be an “open” compiler, but the commitment is that it won’t hurt you, in either backward compatibility or significant design time performance issues. They are making that commitment as they move from hand optimized C++ code to managed C# and VB code. Aren’t you impressed?

They’re also continuing work on the technologies that support core platforms – including important things like support for platform production diagnostics.

So it sounds bizarre to me for Mary Jo to say “Microsoft needs to insure its developer community that it isn’t going to leave them behind as it hones its platform strategy.” You can’t live on a fifteen year old technology. It’s never happened. They rebuilt the .NET platform strategy underneath you and you (probably) didn’t even notice. Apparently, Mary Jo didn’t. There are definitely problems outside the core in libraries. But Microsoft has made tremendous investments to ensure that .NET is ready for platform change, which is what she was talking about.

I am absolutely not here to pick nits, but if this is actually a typo for intending to say they need to “assure” you it isn’t going to leave you behind, then I agree. They are doing the work, but not talking enough about it.

Other things overlooked

Whether the eight dozen Windows Phone devices you and I are currently carrying continue to run Windows N+1 is not an interesting question, and it certainly isn’t a developer question. Can Microsoft create a phone, any phone! that sells enough to make developing for Windows Phone a compelling story? Do the eight dozen of us carrying Windows Phones actually care whether we have to buy a new phone?

XBox? The game market changed. ’Tis done. No one understands the new market. Outselling PlayStation and managing Steam competition is just the beginning. Should XBox be your new Roku, and if so should it block Amazon (no Alpha House?)? Is home security via Kinect, Mazola-based couples’ Twister, or Skyping with Grandma the killer app? Or maybe XBox should be a refrigerator/microwave/coffee maker combo with an auto shopping feature. I’d like that.

By the way, your day job is probably in the Enterprise. Which means you care more about whether Microsoft can make BYOD the gold standard.

David Sobeski’s article

It’s awesome that Mary Jo helped this interesting historical piece get more attention. But in a piece that was 2185 words long, the first technology of the last decade appears at word 1178. It’s an article about the 1990’s and 2000’s.

The same breaking change problems of the 1990’s and 2000’s have absolutely extended to the phone platform. Let’s be very clear here. The market for Windows Phone is not developers. The market for Windows Phone is consumers (including business consumers). I am truly sorry Microsoft has screwed phone devs with every version of the phone. I hope that’s over, but it’s not the most important issue. It’s not the most important issue until I see someone, anyone, at Starbucks pulling out a Windows Phone (outside the Seattle metropolitan area) and my son isn’t horrified at the thought of having one. The game is getting the retail hardware and the provider relationships right. It’s the only game.

There is a problem of trust

Of course there is a fracking issue of trust. But let me make that a different, and possibly much more interesting post.