
The Case of the Terrible, Awful Keyword

In the next version of C#, there will be a feature with a name/keyword you will probably hate.

The thread on CodePlex is even named “private protected is an abomination.”

This is the story of that name and what you can do to help get the best possible name.

The feature and why we don’t already have it

C# has a feature called protected internal. Protected internal means that the member is available to any code in the same assembly (internal) and is also available to code in any derived class (protected). In the MSIL (Intermediate Language), this is displayed as famorassem (family or assembly).

MSIL also supports famandassem (family and assembly) which allows access only from code that is in a derived class that is also in the current assembly.
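
To make the distinction concrete, here's a minimal sketch using the initially proposed private protected spelling (the keyword pair was still under debate at the time):

public class Widget
{
    // protected internal (famorassem): accessible from derived classes anywhere
    // and from any code in this same assembly.
    protected internal void Render() { }

    // The new accessibility (famandassem): accessible only from derived classes
    // that are also in this assembly. Spelled here with the proposed keyword pair.
    private protected void RenderCore() { }
}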

Previously, every time the team has considered adding this feature in C#, they’ve decided against it because no one could think of a good name.

For the next version of C#, the team decided to implement this feature, regardless of whether they could come up with a suitable name. The initial proposal by the team was “private protected.” Everyone seems to hate that name.

The process

One of the great things about this point in language design is that the process is open. It continues to be open to insiders like MVPs a bit earlier – which reduces chaos in the public discussion – but the conversation is public while there's still room for change.

In this case, the team decided on a name (private protected) and the outcry caused the issue to be reopened. That was great, because it allowed a lot of discussion. It seems clear that there is no obvious choice.

So the team took all the suggestions and made a survey. Lucian was conservative with the possible joke keywords – if it was possible that someone intended it seriously, it’s in the survey.

How you can help

Go take the survey! You get five votes, so it's OK to be a bit uncertain.

If you hate them all, which one annoys you least?

Do you think we need a variation of the IL name familyandassembly?

Do you think we need to include the names internal and/or protected?

Will people confuse the English usage and the bit operation?

Will people confuse whether the specified scope is the access or restriction (inclusion or exclusion)?

Should the tradition of all lower case in C# be broken?

Do we need a new keyword?

Is there value in paralleling VB?

Note: In the VB language design meeting on this topic (VB LDM 2014-03-12), we chose to add two new keywords "ProtectedAndFriend" and "ProtectedOrFriend", as exact transliterations of the CLI keywords. This is easier in VB than in C# because of VB's tradition of compound keywords and capitalizations, e.g. MustInherit, MustOverride. – Lucian Wischik [[ If C# parallels, obviously Friend -> internal ]]

I don’t think there’s a promise that the elected name will be the one chosen, but the top chosen names will weigh heavily in the decision.

Go vote, and along the way, some of the suggestions are likely to bring a smile to your face.

Should the feature even be included

There are two arguments against doing the feature. On this, I’ll give my opinion.

If you can’t name a thing, you don’t understand it. Understand it before including it.

This was a good argument the first time the feature came up. Maybe even the second or third or fourth or fifth. But it’s been nearly fifteen years. It’s a good feature and we aren’t going to find a name that no one hates. Just include it with whatever name.

Restricting the use of protected to the current assembly breaks basic OOP principles

OK, my first response is “huh?”

One of the core tenets of OOP is encapsulation. This generally means making a specific class a black box. There’s always been a balance between encapsulation and inheritance – inheritance breaks through the encapsulation on one boundary (API) while public use breaks through it on another.

Inheritance is a tool for reusing code. This requires refactoring code into different classes in the hierarchy and these classes must communicate internal details to each other. Within the assembly boundary, inheritance is a tool for reuse – to be altered whenever it’s convenient for the programmer.

The set of protected methods that are visible outside the assembly is a public API for the hierarchy. This exposed API cannot be changed.

The new scope – allowing something to be seen only by derived members within the same assembly – allows better use of this first style of sharing. To do this without the new scope requires making members internal; internal is more restrictive than protected. But marking members internal gives the false impression that it’s OK for other classes in the assembly to use them.

Far from breaking OOP, the new scope allows encapsulation of the public inheritance API away from the internal mechanics of code reuse convenience. It can be both clear to programmers and enforced that one set of members is present for programming convenience and another set for extension of class behavior.

The Sixth Level of Code Generation

I wrote here about the five levels I see in code generation/meta-programming (pick your favorite overarching word for this fantastically complex space).

I missed one level in my earlier post. There are actually (at least) six levels. I missed the sixth because I was thinking vertically about the application – about the process of getting from an idea about a program all the way to a running program. But as a result I missed a really important level, because it is orthogonal.

Side note: I find it fascinating how our language affects our cognition. I think the primary reason I missed this orthogonal set is my use of the word “level” which implied a breakdown in the single dimension of creating the application.

Not only can we generate our application, we can generate the orthogonal supporting tools. This includes design-time deployment (NuGet, etc), runtime deployment, editor support (IntelliSense, classification, coloration, refactorings, etc.), unit tests and even support for code generation itself – although the last might feel a tad too much like a Mobius strip.

Unit tests are perhaps the most interesting. Code coverage is a good indicator of what you are not testing, absolutely. But code coverage does not indicate what you are testing, and it certainly does not indicate that you are testing well. KLOC (lines of code) ratios of test code to real code are another indicator, but still a pretty poor one, and they still fail to use basic boundary-condition understanding we've had for, what, 50 years? And none of that leverages the information contained in unit tests to write better library code.

Here’s a fully unit tested library method (100% coverage) where I use TDD (I prefer TDD for libraries, and chaos for spiky stuff which I later painfully clean up and unit test):

public static string SubstringAfter(this string input, string delimiter)
{
var pos = input.IndexOf(delimiter, StringComparison.Ordinal);
if (pos < 0) return "";
return input.Substring(pos + 1);
}




There are two bugs in this code.



Imagine for a minute that I had not used today’s TDD, but had instead interacted – with say… a dialog box (for simplicity). And for fun imagine it also allowed easy entry of XML comments; this is a library after all.



Now, imagine that the dialog asked about the parameters. Since they are strings – what happens if they are null or empty, is whitespace legal, is there an expected RegEx pattern, and are there any maximum lengths – a few quick checkboxes. The dialog would have then requested some sample input and output values. Maybe it would even give a reminder to consider failure cases (a delimiter that isn't found in the sample). The dialog then evaluates your sample input and complains about all the boundary conditions you overlooked that weren't already covered in your constraints. In the case above, the delimiter is not limited to a length of one, and I didn't initially test that.



Once the dialog has gathered the data you’re willing to divulge, it looks for all the tests it thinks you should have, and generates them if they don’t exist. Yep, this means you need to be very precise in naming and structure, but you wanted to do that anyway, right?



Not only is this very feasible (I did a spike with my son and a couple of conference talks about eight years ago), but there are also very interesting extensions in creating random sample data – at the least to avoid unexpected exceptions in side cases. Yes, it's similar to PEX, and blending the two ideas would be awesome, but the difference is your direct, up-front guidance on expectations about input and output.



The code I initially wrote for that simple library function is bad. It’s bad code. Bad coder, no cookies.



The first issue is just a simple, stupid bug that the dialog could have told me about in evaluating missing input/output pairs. The code returns the wrong answer if the length of the delimiter is greater than one, and I'd never restricted the length to one. While my unit tests had full code coverage, I didn't test a delimiter longer than one character and thus had a bug.



The second issue is common, insidious, and easily caught by generated unit tests. What happens if the input string or delimiter is null? Not only can this be caught by unit tests, but it's a straightforward refactoring to insert the code you want into the actual library method – assertion, exception, or automatic return (I want null returned for null). And just in case you're not convinced yet, there's also a fantastic opportunity for documentation – all that stuff in our imagined dialog belongs in your documentation. Eventually I believe the line between your library code, unit tests and documentation should be blurry and dynamic – so don't get too stuck on that dialog concept (I hate it).
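
For reference, here's a rough sketch of the corrected method, fixing both bugs discussed above – multi-character delimiters and null handling (returning null for null, the behavior I said I wanted; your policy may differ):

public static string SubstringAfter(this string input, string delimiter)
{
    // Return null for null input or delimiter rather than throwing.
    if (input == null || delimiter == null) return null;
    var pos = input.IndexOf(delimiter, StringComparison.Ordinal);
    if (pos < 0) return "";
    // Skip the entire delimiter, not just one character.
    return input.Substring(pos + delimiter.Length);
}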



To straighten one possible misconception in the vision I’m drawing for you, I am passionately opposed to telling programmers the order in which they should do their tasks. If this dialog is only available before you start writing your method – forget it. Whether you do TDD or spike the library method, whether you make the decisions (filling in the imagined dialog) up front or are retrofitting concepts to legacy code, the same process works.



And that’s where Roslyn comes in. As I said, we abandoned the research on this eight years ago as increasing the surface area of what it takes to write an app and requiring too much work in a specific order (and other reasons). Roslyn changes the story because we can understand the declaration, XML comments, the library method, the unit test name and attributes, and the actual code in the method and unit test without doing our own parsing. This allows the evaluation to be done at any time.



That’s just one of the reasons I’m excited about Roslyn. My brain is not big enough to imagine all the ways that we’re going to change the meaning of programming in the next five years. Roslyn is a small, and for what it’s worth highly imperfect, cog in that process. But it’s the cog we’ve been waiting for.

Explanation of Finding All Unique Combinations in a List

I showed an algorithm for finding all unique combinations of items in a list here.

I didn’t know whether the explanation would be interesting, so I simply offered to add it if someone wanted to see it. Someone did, so here goes.

Theory

Imagine first the problem on paper. Make a column for each item in the list – four is a good place to start and then you can generalize it upwards. The list will look something like this:

[Image: ComboAlgorithmExplanation1]

I’ll update this grid to use the number 1 as an x, and add a column for no items, because the empty set is important for the problem I’m solving:

[Image: ComboAlgorithmExplanation2]

And then assign each column to a bit. This makes each of the numbers from 0 to 15 a bit mask for the items to select to make that entry in the unique set.

[Image: ComboAlgorithmExplanation3]

Code

The code creates a List of Lists because the problem I was solving was combinations of objects, not strings. I should have used IEnumerable here, as the returned list won’t be changed.

public static List<List<T>> GetAllCombos<T>(this List<T> initialList)
{
   var ret = new List<List<T>>();


I’m not sure about the mathematical proof for this, but the number of items is always 2^N (two to the power of N) or 2^N – 1 if you’re ignoring the empty set.

   // The final number of sets will be 2^N (or 2^N - 1 if skipping empty set)
   int setCount = Convert.ToInt32(Math.Pow(2, initialList.Count()));


When I started this post, I realized I’d left the Math.Pow function in this first call, so I’ll explain the difference. Math.Pow takes any value as a double data type and raises it to the power of another double. Doubles are floating points, making this a very powerful function. But in the special case of 2^N, there’s a much faster way to do this – perhaps two orders of magnitude faster. This doesn’t matter if you are only calling it once, but it is sloppy. Instead, I should have done a bit shift.

   var setCount = 1 << initialList.Count();


This bit shift operator takes the value of the left operand (one) and shifts it to the left. Thus if the initial count is zero, no shift is done, and there will be one resulting item, the empty list. If there are two items, the single bit that is set (initially 1) is shifted twice, and the result is 4:

[Image: ComboAlgorithmExplanation4]

Since each number is an entry in my set of masks, I iterate over the list (setCount is 16 for four items):

   // Start at 1 if you do not want the empty set
   for (int mask = 0; mask < setCount; mask++)
   {


For my purposes, I’m creating a list – alternatively, you could build a string or create something custom from the objects in the list (a custom projection).

      var nestedList = new List<T>();


I then iterate over the count of initial items (4 for four items) – this corresponds to iterating over the columns of the grid:

      for (int j = 0; j < initialList.Count(); j++)
      {


For each column, I need the value of that bit position – the value above the letter in the grids above. I can calculate this using the bit shift operator. Since this operation will be performed many times, you definitely want to use bit shift instead of Math.Pow here:

          // Each position in the initial list maps to a bit here
          var pos = 1 << j;


I use a bitwise And operator to determine whether the bit is set, for each item in the list. If the mask has the bit in the position j set, then that entry in the initial list is added to the new nested list.

         if ((mask & pos) == pos) { nestedList.Add(initialList[j]); }
      }


Finally, I add the new nested list to the lists that will be returned, and finish out the loops.

      ret.Add(nestedList);
   }
   return ret;
}
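
Putting the pieces together, here's a rough usage sketch for the four-item grid above (the item values are illustrative):

var items = new List<string> { "A", "B", "C", "D" };
List<List<string>> combos = items.GetAllCombos();
Console.WriteLine(combos.Count);                   // 16 – every subset, including the empty list
Console.WriteLine(string.Join(",", combos[5]));    // mask 5 is binary 0101, so "A,C"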



Questions?

Another Look at Event Source Performance

EDIT: Added/changed things in the section near the end on Optimization #4 on 2013-11-06

In this post, Muhammad Shujaat Siddiqi talks about high volume ETW performance and the following three optimizations.

Optimization # 1: Use IsEnabled() before WriteEvent()

Optimization # 2: Convert static field fetch to local member fetch

Optimization #3: Minimize the number of EventSources in the application

Optimization # 4: Avoid fallback WriteEvent method with Object Array Parameter

He does a great job explaining each of them, but I disagree with the second optimization and wanted to add some comments on the others based on some rough metrics I ran to check performance.

To start with the summary, EventSource plays very nice with ETW and so maintains amazing, absolutely amazing, performance characteristics.

Optimization # 1: Use IsEnabled() before WriteEvent()

If all you're doing is calling WriteEvent and there's no preparatory work, I haven't seen a significant performance difference. Basically, the first thing WriteEvent does is check whether it's enabled, and one extra method call is awfully hard to measure.

However, if you’re doing preparatory work that has any expense, you should absolutely use this. Preparatory work most often happens in complex calls, where a public method marked with the NonEvent attribute calls a private method that is the actual Event definition. In this case, you need to check whether the class is enabled in that public method, before the preparatory work (I realize this might be obvious, but it’s important).
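
Here's a rough sketch of that shape (the class name, event id, and parameters are illustrative, not from a real system, and assume a reference to System.Diagnostics.Tracing or the Microsoft.Diagnostics.Tracing NuGet package):

public sealed class DataEventSource : EventSource
{
    public static readonly DataEventSource Log = new DataEventSource();

    [NonEvent]
    public void QueryExecuted(IDbCommand command)
    {
        if (!IsEnabled()) return;                    // guard before any preparatory work

        // "Preparatory" work – anything with a cost you don't want to pay when tracing is off.
        string commandText = command.CommandText;

        WriteQueryExecuted(commandText);
    }

    [Event(1)]
    private void WriteQueryExecuted(string commandText)
    {
        WriteEvent(1, commandText);
    }
}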

As Muhammad pointed out, if you have keywords or levels associated with the event, you can check them. But the overarching principle is simplicity, so I think that only applies in the far rarer verbose/keyword scenario, not in the more common scenario where you use the defaults except for provider name and event id.

That said, if you always use IsEnabled() as soon as you arrive in your EventSource class, you will remember to do it on the rare occasion it matters.

Optimization # 2: Convert static field fetch to local member fetch

Muhammad recommends instantiating an instance of the logger in each class that uses it, instead of accessing a static field in the logger class. I tested instantiating the instance vs. a call to the static method for the reverse reason – because some documentation suggested it was slower to instantiate rather than reuse. I could not measure a difference, so arguably this is a matter of opinion. I stated that I disagree with Muhammad, not that he is wrong.

I believe there are two principles that override performance, since the perf difference is somewhere between small and non-existent.

The first is that I want to minimize, to the greatest possible extent, the code (textual) dependency between my application classes and my logger. Unfortunately, for some technical reasons EventSource does not easily support interfaces, and therefore does not easily support dependency injection (look for an upcoming blog post on that). So, you're stuck with the dependency of a direct call, or the actual class name. Since the direct call is easier to fix later if you choose to introduce the complexity of DI with EventSource, I believe it is preferable.

I also do not want multiple copies of the logger in case a future way I am logging requires shared state. So, my preferred way to instantiate blocks creation of multiple instances:

[EventSource(Name = "KadGen-EtwSamples-NormalEventSource")]
public class NormalEventSource : EventSource
{
    private static readonly Lazy<NormalEventSource> Instance =
        new Lazy<NormalEventSource>(() => new NormalEventSource());

    private NormalEventSource()
    {
    }

    public static NormalEventSource Log
    {
        get { return Instance.Value; }
    }

    [Event(1)]
    public void AccessByPrimaryKey(int PrimaryKey, string TableName)
    {
        if (IsEnabled()) WriteEvent(1, PrimaryKey, TableName);
    }
}


Please determine the best approach for your team, and be consistent.

Optimization #3: Minimize the number of EventSources in the application

I didn’t actually measure this, because I think performance here is irrelevant.

Each EventSource class maps to a single ETW provider with its name, id and manifest.

Designing your providers is one of the most critical aspects of creating your ETW system. I believe in one per component with some sharing (and its associated hazards); Muhammad suggests one per type, which is far more than I would use. Regardless, no other aspect – not criticality level, not channels, not opcodes, not keywords – is in the ballpark of provider names (ids) when it comes to your design.

Later in a consumer, you will almost certainly sort and filter on the provider names. You will collect data based on provider names, and the more you have the more chance that you’ll miss key events.

Create exactly the number of EventSources you need and no more. When in doubt, take the path that results in a simpler strategy.

Optimization # 4: Avoid fallback WriteEvent method with Object Array Parameter

EDIT: Added/changed things in this section 2013-11-6

This is the place you can change the performance characteristic of your ETW tracing – for the good or for the bad.

As Muhammad explains well, the WriteEvent method has a finite number of overloads. If you pass parameters that do not match one of these overloads, a paramarray of objects is used. Now, I didn't see anything close to the 10-20 times slowdown proposed in the spec, but it did double the time, which is enough to get my attention.

There are two scenarios where you’ll wind up with the object overload. The simplest to fix is passing enums. Accept the enum into your EventSource method as the corresponding enum so that the manifest correctly manages the enum. And then cast the enum to an int in the WriteEvent call like this:

[Event(1)]
public void AccessByPrimaryKey(int PrimaryKey, string TableName, ConnectionState ConnectionState)
{
    if (IsEnabled()) WriteEvent(1, PrimaryKey, TableName, (int)ConnectionState);
}

If I check the overloads for WriteEvent, there is an (int, string, int) overload so this call will not hit the object overload. But, it’s quite easy to create a collection of parameters that don’t match one of the overloads. I do not believe that it is appropriate to rearrange parameters to your public method just to match an overload – and the parameters to the WriteEvent call must match the order of parameters passed to your public method.



The fix for this isn't quite as trivial – it's easy, but ugly. It's explained in the EventSource specification: section 5.6.2.2, Creating fast WriteEvent overloads, shows how to create the overloads. I always track down the spec as the attachment to this post on Vance Morrison's blog.
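
To give a feel for the shape (the spec remains the authority), here's a rough sketch of a custom overload built on WriteEventCore. It lives inside the EventSource-derived class, requires unsafe code to be enabled, and the parameter order must match the [Event] method that calls it; the argument names are illustrative:

[NonEvent]
private unsafe void WriteEvent(int eventId, int arg1, string arg2, int arg3)
{
    if (arg2 == null) arg2 = string.Empty;
    fixed (char* arg2Ptr = arg2)
    {
        // EventData is the nested EventSource.EventData struct.
        EventData* data = stackalloc EventData[3];
        data[0].DataPointer = (IntPtr)(&arg1);
        data[0].Size = sizeof(int);
        data[1].DataPointer = (IntPtr)arg2Ptr;
        data[1].Size = (arg2.Length + 1) * sizeof(char);   // include the null terminator
        data[2].DataPointer = (IntPtr)(&arg3);
        data[2].Size = sizeof(int);
        WriteEventCore(eventId, 3, data);
    }
}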



Summary



Muhammad provided a very good, extremely readable, accessible discussion of potential EventSource/ETW performance issues.



· None of these performance tips will be observable in your application if you use a listener other than the blazingly fast ETW.



· Based on the metrics I ran a while back, I believe the only one of these performance tips worth worrying about is avoiding the object overload, and it isn’t as bad as anticipated in the spec.



· There are other considerations, at least consistency, that are important for the other three items.



As you dig deeper into ETW, side issues like this come up. The isolation of all EventSource aspects into the single class – inherent in its design – means these complexities always affect only a single location in your application if you encounter them.

Plain Old Objects and MEF

After my MEF presentation at the Hampton Roads .NET User Group someone asked me about creating objects like customers and invoices via MEF. I gave an overly quick answer to a really good question.


A lot of the IoC history involves using dependency injection for services. This is great partly because it’s a framework to isolate plain old objects from services, and services from each other. Like many of the techniques we’ve adopted in the agile timeframe, it’s not just what the technique does for us, but what the technique does to us. That’s the quick answer I gave.


But, we can go further to fully composed systems. Fully composed systems have offered mind boggling benefits in some places they’ve been tried, and they haven’t been tried in very many places yet. This is why I have such a high fascination with NetKernel and the work people like Randy Kahle (@RandyKahle) and Brian Sletten (@bsletten) are doing. And that work is similar to work Juval Lowy and I have talked about for a number of years.


However, fully composed systems with MEF, and I assume other DI tools (although I'll be happy to be proven wrong), are hard. Without the infrastructure of something like NetKernel there's a fair amount of work to do, and without the caching benefits of NetKernel it's going to be tough to justify. It's hard because everything needs an interface. Everything. And even if you generate the interfaces and plain objects, the level of infrastructure ceremony gets very unwieldy. At least that's my experience from using MEF to wrap everything (yes, everything) in interfaces, in order to create a fully composed MEF 1.0 system.


We could go a slightly different direction. Put everything into the container, but place plain old objects in the container as their own type, rather than via an interface. Plain old objects in this sense are objects that we can’t imagine a scenario where they’ll be reused and they have a unique, and generally multi-valued interface. A customer or invoice POCO would be examples.


Placing these objects into the container offers the immediate benefit of guaranteeing their usage is isolated. We take advantage of what DI does to us, not just for us.


And if we use inference in MEF 2.0 (.NET 4.5), and probably configuration techniques with other IoC containers, we can stick another object in if we have a later reason to do it.
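
Here's a rough sketch of what that looks like with MEF's attributed model (Customer and InvoiceBuilder are hypothetical types):

[Export]                       // exported under its own concrete type, no interface
public class Customer
{
    public string Name { get; set; }
}

public class InvoiceBuilder
{
    [Import]                   // the consumer asks the container for the concrete type
    public Customer Customer { get; set; }
}

// Wiring it up (MEF attributed model):
// var container = new CompositionContainer(new AssemblyCatalog(typeof(Customer).Assembly));
// var builder = new InvoiceBuilder();
// container.ComposeParts(builder);   // builder.Customer is now supplied by the container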


But here’s where strong typing bites us. Any new class that ever replaces that plain old object (the customer or invoice) has to be assignable to that class. That means it has to be that class or a class that derives from it. I’m still hanging on to the strong typing life boat because I still feel that without it I’m in a North Atlantic storm. For big systems, I think that’s still true, and while I put a lot of thought into making big systems into smaller systems, non-typed based DI is still a jump into icy water for me.


With the plain object in the container under its own type, if I get blindsided with a requirement that just doesn't fit, I can manage: I just have to write a wrapper for the non-derived object, and the wrapper has to derive from the expected type. Ugly, but workable.


What I want to do is experiment with strongly typed systems with generated interfaces. I’ve already done this with traditional generation, and I want a solution that is cleaner than that. I don’t have the syntax worked out, but imagine that we never create the interface for our plain old object, we just tell the system to do it. The container uses the interface, all using objects request the object by its interface, and we humans can ignore it.


Until the day the plain old object needs attention. On that day, we make the interface explicit and do whatever we need to do.


But with the versions of .NET we have today, we can’t build this.

To “as” or not to “as”

Iris Classon is asking and finding answers for “stupid” questions on her blog. First of all, they are definitely not stupid questions. There are a lot of nuances of .NET that slip out of our daily thinking. When I first started my Getting Geeky with the .NET Framework series I’d ask really hot coders basic questions – things that affect how code runs – and the results were not stellar. So, first, three cheers for Iris’s project.


Today’s question was “How should I do casting in C#? Should I use the prefix cast or the as-cast?” She’s got some good input that I suggest you read first. I also felt that the discussion left out a significant factor – debugging.


If the value resulting from the cast really should not be null, the code becomes unstable or incorrect at the point of the cast. Period. With the prefix cast, application state clearly becomes incorrect at that point because an exception occurs. With the “as” cast, the application state still becomes incorrect at that point, you just don’t know it. If the value should not be null, you’ll get a null exception, or possibly failure of a Requires clause if you’re using code contracts somewhere, sometime later in application execution.
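
A minimal illustration of that debugging difference (Customer and GetItem are hypothetical):

object item = GetItem();

// Prefix cast: if item isn't a Customer, an InvalidCastException fires right here,
// at the point where the state first becomes incorrect.
Customer c1 = (Customer)item;

// "as" cast: the failure is deferred...
Customer c2 = item as Customer;
// ...and surfaces later, far from the cause, as a NullReferenceException.
Console.WriteLine(c2.Name);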


One of the things that makes hard bugs hard is when there is a disconnect in time or space between the cause and the symptom. Time is time, space is lines of code, assembly placement, etc. Code can be written to minimize these disconnects. One of the ways to do that is to fail quickly. When application state becomes incorrect, yell about it immediately and rarely continue with the application in an invalid state. A null-reference exception at an unexpected time just makes code more difficult to debug.


Eric Lippert’s quote short-cutted the idea of what happens next. He’s quoted as saying that a prefixing cast failure requests, “then please crash the program.” That would be much better put as “then please do not continue the current operation.” Our programs need never crash in the sense of blue screen ugly death. Exception handling is a mechanism for recovering to a valid state, with the capacity to communicate failure to interested parties like the user and system logs.


So, use the "as" cast only when the resulting null does not make the application state incorrect, or when you have an immediate, adjacent test in which you prefer to correct the application state or perform different exception management. For example, you might want to add a specific error message if the invalid cast indicates a particular type of problem.


I think the prefixing cast should be the “go to” cast for .NET – the one you use when you haven’t thought through the implications of the null value to the remainder of your program.

Open Source

It's occurred to me that if you are following this and my DNR TV show, a logical reaction would be "OK, so that's a lot of hot air, where do I get it?" I intend for all of this to be released open source, on whichever site is hot when I release it. I hope I'll start releasing pieces in just a matter of weeks. It will help a lot if it becomes "we" instead of "me". So, if you're interested in this stuff and you want to help, let me know and you can get this stuff directly, as it's rapidly evolving too much to publicly post right now.

My current expectation of the order of release:

  • Metadata interfaces
  • GenDotNet metadata load
  • EDMX metadata load
  • Template infrastructure – what’s needed for the language neutral templates
  • Simple harness
  • Activity based harness

If you’re interested, you can contact me directly at Kathleen@mvps.org

Isolating Metadata

In code generation, metadata is the information about your application, generally about your database and the definitions that express your data as business objects. If you use Entity Framework, your metadata is the edmx file, which is displayed via the designers. If you're using CodeSmith, the metadata is more subtle. Metadata can also be about the process itself. CodeBreeze in particular has a very rich and extensible set of information about your application.

Since metadata itself is data – information – we can store it many ways. I’ve used XML for years. CodeSmith has used a couple of mechanisms including XML. Entity Framework uses XML. Metadata can also come directly from a database, although I think this is a remarkably bad idea and one of my code generation principles is not to do that – you need a level of indirection and isolation surrounding your database.

What I haven't talked about before is how valuable it is to have another layer of indirection between your metadata storage structure – your XML schema – and your templates. In my XSLT templates I could provide this only through a common schema – you can morph your XML into my schema, so that's indirection – right?

No, that’s not really indirection. It’s great to be back in .NET classes with real tools for isolation and abstraction. Now I use a set of interfaces for common metadata constructs such as objects, properties and criteria. I can then offer any number of sets of metadata wrappers that implement these interfaces via a factory.
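
As a rough sketch of what those common interfaces might look like (the names and members here are illustrative, not the actual library):

public interface IObjectMetadata
{
    string Name { get; }
    IEnumerable<IPropertyMetadata> Properties { get; }
    IEnumerable<ICriteriaMetadata> Criteria { get; }
}

public interface IPropertyMetadata
{
    string Name { get; }
    string PropertyType { get; }
    int? MaxLength { get; }
    string Caption { get; }
}

public interface ICriteriaMetadata
{
    string Name { get; }
    IEnumerable<IPropertyMetadata> Properties { get; }
}

// Each metadata source (edmx, my own XML, etc.) supplies wrappers via a factory.
public interface IMetadataFactory
{
    IEnumerable<IObjectMetadata> LoadObjects(string source);
}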

 

[Image: MetadataIsolation]

 

The template programs only against the interfaces. The template couldn't care less whether I am using Entity Framework, my own metadata tools, or something entirely different. I can write the same template and use it against Entity Framework's edmx file or any other metadata format. That's powerful stuff, especially since you already heard that the template will run against C# or VB. That means, in my world, the only reason to have more than one set of templates against an architecture like CSLA is that they are pushing the boundaries and actually doing different things.

But if you don't like this new templating style, you can use classes based on exactly the same interfaces in CodeSmith (at least) and again decouple your framework from metadata extraction. You'll still need VB/C# versions there, but your metadata input can use the same interfaces.

The interfaces are implemented by sets of classes that know how to load themselves from a data source. Each set uses a different metadata source – different XML structures or another format.

Isolated metadata removes your templates from caring what the metadata source is – beyond being something that could successfully fill a set of classes that implement the data interfaces. This is a very important step and one we need to work together to get right. What do you think I’ve left out of the current design?

CurrentTypeInfo and the Context Stack

Creating templates requires a lot of access to the thing you're currently creating. That's the current output type, which, as I discussed yesterday, I suffix with "Info." The CurrentTypeInfo is thus what you're currently outputting.

I neglected to clarify in that post that the Data and the Info classes are in entirely different assemblies. The Data classes could be used with any .NET generation mechanism, including (at least) XML Literal code generation and CodeSmith. The Info objects are relatively specific to my style of code generation.

The CurrentTypeInfo may not be the same throughout the file.

There are a few reasons to combine multiple classes or enums in a file. In some cases, that’s to nest them, and in some cases it’s just to logically combine them in a file. While FxCop has declared them unfashionable, I find nested classes tremendously valuable, especially for organizing Intellisense and keeping well structured classes and naming. If you’re working with nested classes, there is a good chance you’ll need to access not only the current type, but also the encapsulating type. I use a stack for this, and give control of pushing and popping TypeInfo objects from the stack to the templates themselves.
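
Here's a rough sketch of that stack (TypeInfo is the template infrastructure's own class describing the type being output; its shape is assumed here):

public class GenerationContext
{
    private readonly Stack<TypeInfo> typeStack = new Stack<TypeInfo>();

    // The type currently being output is always the top of the stack.
    public TypeInfo CurrentTypeInfo
    {
        get { return typeStack.Peek(); }
    }

    // Templates push a nested (or new base) type before generating it...
    public void PushType(TypeInfo typeInfo)
    {
        typeStack.Push(typeInfo);
    }

    // ...and pop it when done, restoring access to the enclosing type.
    public TypeInfo PopType()
    {
        return typeStack.Pop();
    }
}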

[Image: TemplateInheritanceDotNetBase3]

The base item on the stack is the current outer most class in the current file. Once you’ve pulled a class off the stack, such as by popping the base and adding a new base TypeInfo, you can’t access the previous version unless you’ve saved it.

Here’s where you see the flaw I mentioned yesterday – these classes could be in separate namespaces, and I don’t allow for that – yet. I’ll fix it later.

Remember the TypeInfo is the thing you’re outputting. The entity definition it’s built from is ObjectData in my semantics.

The stack is an extremely useful construct for this scenario. You have quick access to information about the class you’re currently outputting. You’ll frequently need this for type information and perhaps calling shared/static methods. You don’t want to recreate its name every time you use it because that would be redundant and hard to maintain. You can also access any of the containing classes, which again is useful in defining types and calling shared/static methods.

While I haven’t done this in CodeSmith, I expect this technique to be viable there. I’m not sure on other platforms, but it’s not specific to XML literal code generation.

Two Parallel Entities – Metadata and Info

One of the confusing things about templating is that you are writing two programs simultaneously and there is no way around it. My brilliant son may write a templating language for a class project, and this is exactly what he wants to address – that and the issue of escaping. You can't avoid it; he just thinks the two programs should look different – I'm hoping he doesn't write one in Magyar.

One of the programs you’re writing (or maintaining) is the template program. There is logic in any non-trivial template, regardless of the template style. The other program is the one you’re outputting that will eventually run on your clients computer. In ASP.NET style templates, the output program is the overall template, and the logic is inserted. In XSLT and XML literal generation, the logic of the template program is the overall code and the output template is nested inside.

You are also working with two sets of data – the metadata you're using as source and what you're currently outputting. For clarity in the XML, I call the elements in the source metadata for .NET templates Object and Property. If I used those names in the templates, I'd encounter great confusion between the object definition in metadata and the class I am outputting. That's particularly painful when it comes to properties.

I solve this by suffixing all metadata with the word "Data". Thus I have ObjectData, PropertyData, CriteriaData, etc. Each of these classes contains the metadata for the corresponding item. I might have an ObjectData for a Customer that has a PropertyData for FirstName. The PropertyData would include things like the maximum length of the field and its caption.

I also need to describe the output – at least the way I’m building templates. I need information about the file I’m creating, the type I’m creating, and in certain locations in the template, the property, function, constructor, etc I’m creating. I identify these by suffixing them with the word “Info”. Thus I have a ClassInfo, PropertyInfo, FunctionInfo, etc. I do not have a CriteriaInfo because I am never outputting a .NET thing called a Criteria. It’s strictly metadata for .NET features.
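
A rough sketch of the two parallel sets, using the Customer example above (the members are illustrative):

// "Data" classes describe the input metadata...
public class ObjectData
{
    public string Name { get; set; }                    // e.g. "Customer"
    public List<PropertyData> Properties { get; set; }
}

public class PropertyData
{
    public string Name { get; set; }                    // e.g. "FirstName"
    public int MaxLength { get; set; }
    public string Caption { get; set; }
}

// ..."Info" classes describe the output being generated.
public class ClassInfo
{
    public string ClassName { get; set; }
    public List<PropertyInfo> Properties { get; set; }
}

public class PropertyInfo
{
    public string PropertyName { get; set; }
    public string PropertyType { get; set; }
}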

In the template design I’m describing in this series, the DotNetBase class contains information on the file being output. To be picky, the namespace can actually be associated with part of a file, and I may add this flexibility, but I don’t do that very often in business code, so I have included namespaces at the file level.


[Image: TemplateInheritanceDotNetBase2]


Regardless of the template mechanism you use, you need to maintain a clear separation between the template logic and the output logic, and between the input metadata and state about what you’re creating.

Next post: CurrentTypeInfo and the Context Stack