
Public Annotations

Code is data that communicates your intent. If you have no special relationship with your compiler, you don’t need any special data to communicate additional intent.


Once you’re in an open compiler world, you may need to communicate with your compiler. This feature has been called “design time attributes” and “annotations.” I’m adding this feature to RoslynDom and calling it “PublicAnnotations.”


I’ll try to always remember to say “public annotations” to differentiate it from Roslyn private annotations.


 


Why not just use attributes?


Attributes do not work very well to communicate with the compiler for at least these reasons:


  • Can’t tell what’s available at runtime
  • If design attributes are visible at runtime, they become a contract
  • Can result in a build dependency
    o If one player strips the attributes it’s done with (to avoid runtime contracts), everyone downstream depends on that build step
  • Must follow attribute syntax
  • Only constants allowed
    o No lambda expressions
    o No generic types
    o No expressions
  • Can’t be placed in all desired locations
    o Not on namespaces, files, or in random locations inside or outside methods

I think the first is actually the biggest issue: it’s important to differentiate between communication with the compiler pipeline (including design time via the Visual Studio/Roslyn linkage) and runtime attributes. But even if you disagree with that, attributes simply don’t work because of limitations in attribute content and attribute location.


 


The Syntax


Eventually there will be enough examples of public annotations that an obvious syntax can be included in the languages. I’m not willing to wait as I need public annotations right now, like today.


The current syntax has to reside inside a comment. That’s the only way to solve the content and location limitations of attributes without changing the compiler.


The syntax should be clearly differentiated from all other lines of code to allow easy recognition by humans, by the parser/loader, and later by IDE colorization. The syntax should also be easily found via RegEx to allow updates if we get language support.


RoslynDom now supports the following, with any desired whitespace within the line.


//[[ NameOfAnnotation(stuff) ]]


This currently requires that the annotation appear on a single line, and end-of-line comments are not yet supported.
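Since the annotation always occupies a single line, a regular expression can find it. Here’s a rough sketch of a pattern that could match the basic form (my illustration only, not the expression RoslynDom actually uses, and it ignores the file/root prefix described later):

using System.Text.RegularExpressions;

// Matches lines like: //[[ NameOfAnnotation(stuff) ]]
// and captures the annotation name and the raw argument text.
var publicAnnotationPattern = new Regex(
    @"^\s*//\s*\[\[\s*(?<name>[A-Za-z_]\w*)\s*\((?<args>.*)\)\s*\]\]\s*$");

var match = publicAnnotationPattern.Match(@"//[[ kad_Test1(val1 : ""Fred"", val2 : 40) ]]");
// match.Groups["name"].Value is "kad_Test1"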


Because it’s familiar to you, the annotation looks like an attribute, except for that funny double square bracket. It does not need an attribute class to exist anywhere, and one generally will not exist.


Just like attributes, the following variations are all supported, along with the logical combinations:


//[[ NameOfAnnotation() ]]

//[[ NameOfAnnotation(stuff) ]]

//[[ NameOfAnnotation(name:stuff, name2 : stuff) ]]

//[[ NameOfAnnotation(name=stuff, name2 = stuff) ]]


The common way to add annotations will be to include them in your source code. You can also add annotations explicitly with the AddPublicAnnotationValue(string name, object value) and the AddPublicAnnotationValue(string name, string key, object value) methods.
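For example, here’s a minimal sketch of adding annotations from code rather than from comments (the kad_Generated name and its values are made up; the methods are the ones listed under “Accessing your values” below):

var root = RDomFactory.GetRootFromString(csharpCode);
var class1 = root.RootClasses.First();

// Positional value – read back later with GetPublicAnnotationValue("kad_Generated")
class1.AddPublicAnnotationValue("kad_Generated", true);

// Named value – read back later with GetPublicAnnotationValue("kad_Generated", "template")
class1.AddPublicAnnotationValue("kad_Generated", "template", "T4");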


A public annotation with no parameters is simply checked for existence by name.


A single positional value is supported and is accessed via the public annotation name.


Named values are accessed by the public annotation name and the value name as a key.
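Put together, each form maps to the accessors roughly like this (kad_Skip and kad_Category are made-up names, and item stands for any RoslynDom item):

// //[[ kad_Skip() ]]  – existence check only
bool skip = item.HasPublicAnnotation("kad_Skip");

// //[[ kad_Category("logging") ]]  – single positional value
var category = item.GetPublicAnnotationValue("kad_Category");

// //[[ kad_Test1(val1 : "Fred", val2 : 40) ]]  – named values, value name as key
var val1 = item.GetPublicAnnotationValue<string>("kad_Test1", "val1");
var val2 = item.GetPublicAnnotationValue<int>("kad_Test1", "val2");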


 


Legal locations for public annotations


Annotations are currently legal on using statements, namespaces, types and members. They are also legal at the file or root level.


var csharpCode = @"
//[[ file: kad_Test4(val1 = "
"George"", val2 = 43) ]]
//[[ kad_Test1(val1 : "
"Fred"", val2 : 40) ]]
using Foo;

//[[ kad_Test2("
"Bill"", val2 : 41) ]]
//[[ kad_Test3(val1 ="
"Percy"", val2 : 42) ]]
public class MyClass
{ }
"
;

This illustrates a challenge. A likely location for public annotations is the file level, but then I need to distinguish between a file- or root-level public annotation and an annotation on the first item in the file. I decided to do this by prefixing the file-level public annotation; both the file and root prefixes are currently supported.


Accessing your values


The following methods are available for accessing public annotations


bool HasPublicAnnotation(string name);

void AddPublicAnnotationValue(string name, string key, object value);
void AddPublicAnnotationValue(string name, object value);

object GetPublicAnnotationValue(string name, string key);
object GetPublicAnnotationValue(string name);
T GetPublicAnnotationValue<T>(string name);
T GetPublicAnnotationValue<T>(string name, string key);

These methods are available on all items via the IDom interface.


 


Full example


Here’s the full example from the Scenario_PatternMatchingSelection class of the RoslynDomExampleTests project.


[TestMethod]
public void Can_get_and_retrieve_public_annotations()
{
    var csharpCode = @"
//[[ file: kad_Test4(val1 = ""George"", val2 = 43) ]]
//[[ kad_Test1(val1 : ""Fred"", val2 : 40) ]]
using Foo;

//[[ kad_Test2(""Bill"", val2 : 41) ]]
//[[ kad_Test3(val1 =""Percy"", val2 : 42) ]]
public class MyClass
{ }
";
    var root = RDomFactory.GetRootFromString(csharpCode);

    var using1 = root.Usings.First();
    Assert.AreEqual("Fred", using1.GetPublicAnnotationValue<string>("kad_Test1", "val1"));
    Assert.AreEqual("Fred", using1.GetPublicAnnotationValue("kad_Test1", "val1"));
    Assert.AreEqual(40, using1.GetPublicAnnotationValue<int>("kad_Test1", "val2"));
    Assert.AreEqual(40, using1.GetPublicAnnotationValue("kad_Test1", "val2"));

    var class1 = root.RootClasses.First();
    Assert.AreEqual("Bill", class1.GetPublicAnnotationValue("kad_Test2"));
    Assert.AreEqual(41, class1.GetPublicAnnotationValue("kad_Test2", "val2"));
    Assert.AreEqual("Percy", class1.GetPublicAnnotationValue("kad_Test3", "val1"));
    Assert.AreEqual(42, class1.GetPublicAnnotationValue("kad_Test3", "val2"));

    Assert.AreEqual("George", root.GetPublicAnnotationValue("kad_Test4", "val1"));
    Assert.AreEqual(43, root.GetPublicAnnotationValue("kad_Test4", "val2"));
}

RoslynDom and Friends – Just the Facts

See this post for the Roadmap of these projects

RoslynDom

A wrapper for the .NET Compiler Platform – the roadmap has further plans

Project on GitHub

See the RoslynDomExampleTests project in the solution for the 20 things you’re most likely to do

Download via Visual Studio NuGet Package Manager if you want to play with that

RoslynDom-Provider

By Jim Christopher

A PowerShell provider for Roslyn Dom

Project on GitHub

CodeFirstMetadata

Strong-typed metadata from code-first (general sense, not Entity Framework sense)

Project on GitHub

See the ConsoleRunT4Example project in the solution along with strong-typed files and T4 usage

Roadmap for RoslynDom, CodeFirst Strong-typed Metadata and ExpansionFirst Templates

I’ve been working on three interleaved projects: RoslynDom, CodeFirst Strong-typed Metadata, and ExpansionFirst Templates. Also, Jim Christopher (aka beefarino) built a PowerShell provider. This post is an overview of these projects and a roadmap of how they relate to each other.


You can find the short version here.


[Roadmap diagram]


In the roadmap, blue indicates full (almost) test coverage and that the library has had more than one user, orange indicates preliminary released code, and grey indicates code that’s really not ready to go and not yet available.


I’m working left to right, waiting to complete some features of the RoslynDom library until I have the full set of projects available in preliminary form.


RoslynDom Library


The .NET Compiler Platform, or Roslyn, does exactly what it was intended to do, which is exactly what we want it to do. It’s a very good compiler, now released as open source, and exposing all of its internals. It’s great that we get access to the internal trees, but it’s not happy code for you and me to use – it’s compiler internals.


At the same time, these trees hold a wealth of information we want – it’s more complete information than reflection, holds design information like comments and XML documentation, and it’s available even when the source code doesn’t compile.


When you and I ask questions about our code, we ask simple things – what are the classes in this file? We don’t care about whitespace, or precisely how we defined namespaces. In fact, most of the time, we don’t even care about namespaces at all. And we certainly don’t care whether a piece of information is available in the syntactic or semantic tree or whether attributes were defined with this style or that style.


RoslynDom wraps the Roslyn compiler trees and exposes the information in a programmer friendly way. Goals include


  • Easy access to the tree in the way(s) programmers think about code as a hierarchy
  • Easy access to common information about the code as parameters
  • Access to the applicable SyntaxNode when you need it
  • Access to the applicable Symbol when you need it
  • Planned: Access to the full logical model – solution to smallest code detail
    (Currently, file down to member)
  • Planned: A kludged public annotation/design time attribute system until we get a real one
    (Currently, attribute support only)
  • Planned: Ability to morph and output changes
    (Currently, readonly)
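For instance, answering the “what are the classes in this file?” question from above looks roughly like this (a sketch based on the example tests; the Name property is my assumption):

var root = RDomFactory.GetRootFromString(csharpCode);

// Classes at the root of the file, ignoring namespaces (RootClasses appears in the example tests)
foreach (var cls in root.RootClasses)
{
    Console.WriteLine(cls.Name);   // assuming a Name property on RoslynDom items
}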

Getting RoslynDom


You can get the source code on GitHub, and there’s a RoslynDomExampleTests project which shows how to do about 20 common things.


The project is also available via NuGet. It’s preliminary, use cautiously. Download with the Visual Studio NuGet package manager.


RoslynDom-Provider


Jim Christopher created a PowerShell provider for RoslynDom. PowerShell providers allow you to access the underlying tree of information in the same way you access the file system. IOW, you can mount your source code as though it was a drive.


I’m really happy about the RoslynDom-Provider. It shows one way to use a .NET Compiler Platform/library to access the information that’s otherwise locked into the compiler trees. It’s also another way for you to find out about the amazing power of PowerShell providers. If you’re new to PowerShell, and you’re a Pluralsight subscriber, check out “Discovering PowerShell with Mark Minasi”. It uses Active Directory as the underlying problem and a few parts may be slow for a developer, but it will give you the gist of it. Follow up with Jim Christopher’s “Everyday PowerShell for Developers” and “PowerShell Gotchas.” If you’d rather read, there are a boatload of awesome books including PowerShell Deep Dives and Windows PowerShell for Developers, and too many Internet sites for me to keep straight.


Getting RoslynDomProvider


This project is available on GitHub.


Code-first Strong-typed Metadata


You can find out more about strong-typed metadata here and code-first strong-typed metadata here.


As a first step, I have samples in runtime T4. These run from the command line at present. These templates inherit from a generic base class that has a property named Meta. This property is typed to the underlying strong-typed metadata item – in the samples, either CodeFirstSemanticLog or CodeFirstClass. The EventSource template and problem are significantly more complex, but avoid the extra mind twisting of a strong-typed metadata class wrapped around a class. These templates are preliminary and do not handle all scenarios.



Metaprogramming


While there are a couple of ways to solve a metaprogramming expansion or code first problem, I’ve settled on an alternate file extension. The code-first minimal description is in a file with a .cfcs extension. Because I lie to Visual Studio and tell it that this is a C# file (Tools/Options/Text Editor/File Extensions) I get nice IntelliSense for most features (more work to be done later). But because MSBuild doesn’t see it as a C# file, the .cfcs file is ignored as a source file in compilation.


Generation produces an actual source code file with a .g.cs extension. This file becomes part of your project. This is the “real” code and you debug in this “real” code because it’s all the compiler and debugger know about. As a result


- You write only the minimal code that only you can write


- You understand your application through either the minimal or expanded code


- You easily recognize expanded code via a .g.cs extension


- You can place the minimal and expanded code side by side to understand the expansion


- You debug in real code


- You protect the generated code by allowing only the build server to check in these files


Again this happens because there are two clearly differentiated files in your project – the .cfcs file and the .g.cs file.


The intent is to have this automated as part of your normal development pipeline, through one or more mechanisms – build, custom tools, VS extension/PowerShell. The pipeline part is not done yet, but you can grab the necessary pieces from the console application in the example.


You can also find more here.



Getting CodeFirstMetadata


You can get this project on GitHub.


I’ll add this to NuGet when the samples are more accessible from your Visual Studio project.


ExpansionFirst Templates


T4 has brought us a very long way. It and CodeSmith have had the lion’s share of code generation templating in the .NET world for about a decade. I have enormous respect for people like Gareth Jones, who wrote it and kept it alive, and Oleg Sych, who taught so many people to use it. But I think it’s time to move on. Look for more upcoming on this – my current bits are so preliminary that I’ll wait to post.


Summary


I look forward to sharing the unfinished pieces of this roadmap in the coming weeks and months.


I’d like to offer a special thanks to the folks in my April DevIntersection workshop. The challenges of explaining the .NET Compiler Platform/Roslyn pieces to you led me to take a step back and isolate those pieces from the rest of the work. While this put me way behind schedule, in the end I think it’s valuable both in simplifying the metaprogramming steps and in offering a wrapper for the .NET Compiler Platform/Roslyn.

Code-first Metadata

This is “code first” in the general sense, not the specific sense of Entity Framework. This has nothing to do with Entity Framework at all, except that the EF team showed us how valuable simple access to code-like metadata is.


Code first is a powerful mechanism for expressing your metadata because code is the most concise way to express many things. There are 60 years of evolution behind today’s computer languages being efficient at expressing explicit concepts based on natural contextualization. You can’t get this in JSON, XML or other richer and less-opinionated formats.


Code first is just one approach to getting strong-typed metadata. The keys to the kingdom, the keys to your code, lie in expressing the underlying problems of your code in a strong-typed manner, which you can read about here.


The problem is that the description of the problem is wrapped up with an enormous amount of ceremony about how to do what we’re trying to do. Let’s look at this in relation to metaprogramming where the goal is generally to reduce ceremony and


Only write the code that only you can write



In other words, don’t write any code that isn’t part of the minimum definition of the problem, divorced of all technology artifacts.


For example, you can create a SemanticLog definition that you can later output as an EventSource class, or any other kind of log output – even in a different language or on a different platform.


To do this, describe the SemanticLog in the simplest way possible, devoid of technology artifacts.


namespace ConsoleRunT4Example
{
    [SemanticLog()]
    public class Normal
    {
        public void Message(string Message) { }

        [Event(2)]
        public void AccessByPrimaryKey(int PrimaryKey) { }
    }
}

Instead of the EventSource version:


using System;
using System.Diagnostics.Tracing;

namespace ConsoleRunT4Example
{
    [EventSource(Name = "ConsoleRunT4Example-Normal")]
    public sealed partial class Normal : EventSource
    {
        #region Standard class stuff
        // Private constructor blocks direct instantiation of class
        private Normal() { }

        // Readonly access to cached, lazily created singleton instance
        private static readonly Lazy<Normal> _lazyLog =
            new Lazy<Normal>(() => new Normal());
        public static Normal Log
        {
            get { return _lazyLog.Value; }
        }

        // Readonly access to private cached, lazily created singleton inner class instance
        private static readonly Lazy<Normal> _lazyInnerlog =
            new Lazy<Normal>(() => new Normal());
        private static Normal innerLog
        {
            get { return _lazyInnerlog.Value; }
        }
        #endregion

        #region Your trace event methods
        [Event(1)]
        public void Message(System.String Message)
        {
            if (IsEnabled()) WriteEvent(1, Message);
        }

        [Event(2)]
        public void AccessByPrimaryKey(System.Int32 PrimaryKey)
        {
            if (IsEnabled()) WriteEvent(2, PrimaryKey);
        }
        #endregion
    }
}

Writing less code (10 lines instead of 47) because we are lazy is a noble goal. But the broader benefit here is that the first requires very little effort to understand and very little trust about whether the pattern is followed. The second requires much more effort to read the code and ensure that everything in the class is doing what’s expected. The meaning of the code requires that you know what an EventSource is.


Code-first allows you to just write the code that only you can write, and leave it to the system to create the rest of the code based on your minimal definition.

Strong-typed Metadata

Your code is code and your code is data.

Metaprogramming opens up worlds where you care very much that your code is data. Editor enhancements open up worlds where you care very much that your code is data. Visualizations open up worlds where you care very much that your code is data. And I think that’s only the beginning.

There’s nothing really new about thinking of code as data. Your compiler does it, metaprogramming techniques do it, and delegates and functional programming do it.

So, let’s make your code data. Living breathing strongly-typed data. Strong typing means describing the code in terms of the underlying problem and providing this view as a first class citizen rather than a passing convenience.

Describing the Underlying Problem

I’ll use logging as an example, because the simpler problem of PropertyChanged just happens to have an underlying problem of classes and properties, making it nearly impossible to think about with appropriate abstractions. Class/property/method is only interesting if the underlying problem is about classes, properties and methods.

The logging problem is not class/method – it’s log/log event. When you strongly type the metadata to classes that describe the problem being solved you can reason about code in a much more effective manner. Alternate examples would be classes that express a service, a UI, a stream or an input device like a machine.

I use EventSource for logging, but my metadata describes the problem in a more generalized way – it describes it as a SemanticLog. A SemanticLog looks like a class, and once you create metadata from it, you can create any logging system you want.

Your application has a handful of conceptual groups like this. Each conceptual group has a finite set of appropriate customizations. Your application problem also has a small number of truly unique classes.

Treating Metadata as a First Class Citizen

In the past, metadata has been a messy affair. The actual metadata description of the underlying patterns of your application has been sufficiently difficult to extract that you’ve had no reason to care. Thus, tools like the compiler that treated your code as data simply created the data view they needed and tossed it out as rubbish when they were done.

The .NET Compiler Platform, Roslyn, stops throwing away its data view. It exposes it for us to play with.

Usage Examples

I’m interested in strongly typed metadata to write templates for metaprogramming. I want these templates to be independent of how you are running them – whether they are part of code generation, metaprogramming, a code refactoring or whatever. I also want these templates to be independent of how the metadata is loaded.

Strongly typed metadata works today in T4 templates. My CodeFirstMetadata project has examples.

I’m starting work on expansion first templates and there are many other ways to use strong-typed metadata – both for other metaprogramming techniques and completely different uses. One of the reasons I’m so excited about this project is to see what interesting things people do, once their code is in a strong-typed form. At the very least, I think it will be an approach to visualizations and ensuring your code follows expected patterns. It will be better at ensuring large scale patterns than code analysis rules. Whew! So much fun work to do!!!

Strong-typed Metadata in a T4 Template

Here’s a sample of strong typing in a T4 template

 
[Screenshot: strong-typed metadata in a T4 template]
 



There’s some gunk at the top to add some assemblies and some using statements for the template itself. The important piece at the top is that the class created by this template is a generic type with a type argument – CodeFirstSemanticLog – that is a strong-typed metadata class. Thus the Meta property of the CodeFirstT4CSharpBase class is a SemanticLog class and understands concepts specific to the SemanticLog, like IncludesInterface. I’ve removed a few variable declarations that are specific to the included T4 files.
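Without the screenshot, a rough sketch of the shape being described might look like this – a generic base class whose Meta property is the strong-typed metadata (the member list is my guess at a minimal shape, not the actual CodeFirst API):

// Hypothetical minimal shape of the base class described above
public abstract class CodeFirstT4CSharpBase<TMetadata>
{
    // Strong-typed metadata for the template, e.g. CodeFirstSemanticLog
    public TMetadata Meta { get; set; }

    // Runtime T4 templates expose a TransformText() method that produces the output code
    public abstract string TransformText();
}

// A SemanticLog template would derive from CodeFirstT4CSharpBase<CodeFirstSemanticLog>
// and use properties like Meta.IncludesInterface inside the template.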

The Case of the Terrible, Awful Keyword

In the next version of C# there will be a feature with a name/keyword you will probably hate.

The thread on CodePlex is even named “private protected is an abomination.”

This is the story of that name and what you can do to help get the best possible name.

The feature and why we don’t already have it

C# has a feature called protected internal. Protected internal means that the member is available to any code in the same assembly (internal) and is also available to code in any derived class (protected). In the MSIL (Intermediate Language), this is displayed as famorassem (family or assembly).

MSIL also supports famandassem (family and assembly) which allows access only from code that is in a derived class that is also in the current assembly.
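In code, the difference looks roughly like this (shown with the initially proposed “private protected” spelling for the new accessibility):

public class Widget
{
    // famorassem – today's protected internal:
    // visible to derived classes in ANY assembly, and to all code in this assembly
    protected internal void Reset() { }

    // famandassem – the new accessibility:
    // visible only to derived classes that are ALSO in this assembly
    private protected void Rebuild() { }
}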

Previously, every time the team has considered adding this feature in C#, they’ve decided against it because no one could think of a good name.

For the next version of C#, the team decided to implement this feature, regardless of whether they could come up with a suitable name. The initial proposal by the team was “private protected.” Everyone seems to hate that name.

The process

One of the great things about this point in language design is that the process is open. It continues to be open to insiders like MVPs a bit earlier – which reduces chaos in public – but the conversation is public while there’s still room for change.

In this case, the team decided on a name (private protected) and the outcry caused the issue to be reopened. That was great, because it allowed a lot of discussion. It seems clear that there is no obvious choice.

So the team took all the suggestions and made a survey. Lucian was conservative with the possible joke keywords – if it was possible that someone intended it seriously, it’s in the survey.

How you can help

Go take the survey! You get five votes, so it’s OK to be a bit uncertain.

If you hate them all, which one annoys you least?

Do you think we need a variation of the IL name familyorassembly?

Do you think we need to include the names internal and/or protected?

Will people confuse the English usage and the bit operation?

Will people confuse whether the specified scope is the access or restriction (inclusion or exclusion)?

Should the tradition of all lower case in C# be broken?

Do we need a new keyword?

Is there value in paralleling VB?

Note: In the VB language design meeting on this topic (VB LDM 2014-03-12), we chose to add two new keywords "ProtectedAndFriend" and "ProtectedOrFriend", as exact transliterations of the CLI keywords. This is easier in VB than in C# because of VB’s tradition of compound keywords and capitalizations, e.g. MustInherit, MustOverride. Lucian Wischik [[ If C# parallels, obviously Friend -> internal ]]

I don’t think there’s a promise that the elected name will be the one chosen, but the top chosen names will weigh heavily in the decision.

Go vote, and along the way, some of the suggestions are likely to bring a smile to your face.

Should the feature even be included

There are two arguments against doing the feature. On this, I’ll give my opinion.

If you can’t name a thing, you don’t understand it. Understand it before including it.

This was a good argument the first time the feature came up. Maybe even the second or third or fourth or fifth. But it’s been nearly fifteen years. It’s a good feature and we aren’t going to find a name that no one hates. Just include it with whatever name.

Restricting the use of protected to the current assembly breaks basic OOP principles

OK, my first response is “huh?”

One of the core tenets of OOP is encapsulation. This generally means making a specific class a black box. There’s always been a balance between encapsulation and inheritance – inheritance breaks through the encapsulation on one boundary (API) while public use breaks through it on another.

Inheritance is a tool for reusing code. This requires refactoring code into different classes in the hierarchy and these classes must communicate internal details to each other. Within the assembly boundary, inheritance is a tool for reuse – to be altered whenever it’s convenient for the programmer.

The set of protected methods that are visible outside the assembly is a public API for the hierarchy. This exposed API cannot be changed.

The new scope – allowing something to be seen only by derived members within the same assembly – allows better use of this first style of sharing. To do this without the new scope requires making members internal; internal is more restrictive than protected. But marking members internal gives the false impression that it’s OK for other classes in the assembly to use them.
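A small sketch of that trade-off (the type and members are made up):

public class ReportBase
{
    // internal: any class in this assembly can call it – overstates who should use it
    internal void RebuildCache() { }

    // the new scope (private protected): only derived classes in this assembly can call it,
    // which matches the "reuse within the hierarchy, inside the assembly" intent
    private protected void RebuildCacheCore() { }

    // protected: part of the public inheritance API – derived classes anywhere can rely on it
    protected virtual void Render() { }
}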

Far from breaking OOP, the new scope allows encapsulation of the public inheritance API away from the internal mechanics of code reuse convenience. It can be both clear to programmers and enforced that one set of members is present for programming convenience and another set for extension of class behavior.

Did No One Count?

This is embarrassing, although I can explain, really officer. I wasn’t drinking, it just looked that way.

I put up a Five Levels of Code Generation and it contained these bullet points:

  • Altering code for the purpose of machines
    The path from human readable source code to machine language
  • DSL (Domain Specific Language)
    Changing human instructions of intent, generally into human readable source code
  • File Code Generation
    Creating human readable source code files from small metadata, or sometimes, altering those files
  • Application code generation or architecture expression
    Creating entire systems from a significant body of metadata

See the problem?

Now, if you read the post, you might see what I intended. You might realize that I was in the left turn lane, realized I needed something at the drugstore on the right, didn’t realize the rental car needed to have its lights turned on (mine doesn’t) on a completely empty road at midnight in a not-great neighborhood in Dallas. Really, officer, I haven’t been drinking, I’m just an idiot.

There are five because I split the first item in two. That split was partly because the post was inspired by confusion regarding what RyuJIT means to the future of .NET. (It actually means, and only means, that your programs will run better/faster in some scenarios.)

The code you write becomes something, and then it becomes native code. That “something” for us has been IL, but might be a different representation. One reason for the distinction is that there are entirely separate teams that think about different kinds of problems working on compilers and native code generation. IL is created by a compiler that specializes in parsing, symbol resolution and widely applicable optimization. Native code is created in a manner specific to the machine where it will be run. In the big picture, this has been true since .NET was released and it’s a great design.

I think language compilation and native code creation are two distinct steps. One is all about capturing the expressive code you write, and the other is all about making a device work based on its individual operating system APIs.

But I might be wrong. I might be wrong because the increasing diversity in our environments means native code APIs have implications for the libraries you use (PCL). I might be wrong because languages like JavaScript don’t use IL (although minification is not entirely different). I might be wrong because it’s only the perspective of the coder that matters, and the coder rarely cares. I might be wrong because I’m too enamored with the amazing things like multi-core background JIT and MPGO (you can see more in the Changes to the .NET Framework module of my What’s New in .NET 4.5 Pluralsight course).

The taxonomy of code generation will shape the upcoming discussions Roslyn will inspire about metaprogramming. Metaprogramming encompasses only the DSL/expansion, file, and architecture levels.

You might be rolling your eyes like the officer handing me back my license in Dallas. Yes, officer. You’re absolutely right. If I titled the post “Five Levels of Code Generation” I should have had FIVE bullet points.

The Sixth Level of Code Generation

I wrote here about the five levels I see in code generation/meta-programming (pick your favorite overarching word for this fantastically complex space).

I missed one level in my earlier post. There are actually (at least) six levels. I missed the sixth because I was thinking vertically about the application – about the process of getting from an idea about a program all the way to a running program. But as a result I missed a really important level, because it is orthogonal.

Side note: I find it fascinating how our language affects our cognition. I think the primary reason I missed this orthogonal set is my use of the word “level” which implied a breakdown in the single dimension of creating the application.

Not only can we generate our application, we can generate the orthogonal supporting tools. This includes design-time deployment (NuGet, etc), runtime deployment, editor support (IntelliSense, classification, coloration, refactorings, etc.), unit tests and even support for code generation itself – although the last might feel a tad too much like a Mobius strip.

Unit tests are perhaps the most interesting. Code coverage is a good indicator of what you are not testing, absolutely. But code coverage does not indicate what you are testing, and it certainly does not indicate that you are testing well. KLOC (thousands of lines of code) ratios of test code to real code are another indicator, but still a pretty poor one, and they still fail to use the basic boundary-condition understanding we’ve had for, what, 50 years? And none of that leverages the information contained in unit tests to write better library code.

Here’s a fully unit tested library method (100% coverage) where I use TDD (I prefer TDD for libraries, and chaos for spiky stuff which I later painfully clean up and unit test):

public static string SubstringAfter(this string input, string delimiter)
{
    var pos = input.IndexOf(delimiter, StringComparison.Ordinal);
    if (pos < 0) return "";
    return input.Substring(pos + 1);
}




There are two bugs in this code.



Imagine for a minute that I had not used today’s TDD, but had instead interacted with, say, a dialog box (for simplicity). And for fun, imagine it also allowed easy entry of XML comments; this is a library after all.



Now, imagine that the dialog asked about the parameters. Since they are strings – what happens if they are null or empty, is whitespace legal, is there an expected RegEx pattern, and are there any maximum lengths – a few quick checkboxes. The dialog would have then requested some sample input and output values. Maybe it would even give a reminder to consider failure cases (a delimiter that isn’t found in the sample). The dialog then evaluates your sample input and complains about all the boundary conditions you overlooked that weren’t already covered in your constraints. In the case above, that the delimiter is not limited to a length of one and I didn’t initially test that.



Once the dialog has gathered the data you’re willing to divulge, it looks for all the tests it thinks you should have, and generates them if they don’t exist. Yep, this means you need to be very precise in naming and structure, but you wanted to do that anyway, right?



Not only is this very feasible (I did a spike with my son and a couple of conference talks about eight years ago), but there are also very interesting extensions in creating random sample data – at the least to avoid unexpected exceptions in side cases. Yes, it’s similar to PEX, and blending the two ideas would be awesome, but the difference is your direct up-front guidance on expectations about input and output.



The code I initially wrote for that simple library function is bad. It’s bad code. Bad coder, no cookies.



The first issue is just a simple, stupid bug that the dialog could have told me about in evaluating missing input/output pairs. The code returns the wrong answer if the length of the delimiter is greater than one, and I’d never restricted the length to one. While my unit tests had full code coverage, I didn’t test a delimiter longer than one character and thus had a bug.



The second issue is common, insidious, and easily caught by generated unit tests. What happens if the input string or delimiter is null? Not only can this be caught by unit tests, but it’s a straightforward refactoring to insert the code you want into the actual library method – assertion, exception, or automatic return (I want null returned for null). And just in case you’re not convinced yet, there’s also a fantastic opportunity for documentation – all that stuff in our imagined dialog belongs in your documentation. Eventually I believe the line between your library code, unit tests and documentation should be blurry and dynamic – so don’t get too stuck on that dialog concept (I hate it).
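For what it’s worth, here’s a corrected sketch reflecting those two issues (my fix, not code from the post):

public static string SubstringAfter(this string input, string delimiter)
{
    // Second issue: make the null behavior explicit – null in, null out
    if (input == null || delimiter == null) return null;

    var pos = input.IndexOf(delimiter, StringComparison.Ordinal);
    if (pos < 0) return "";

    // First issue: advance past the whole delimiter, not just one character
    return input.Substring(pos + delimiter.Length);
}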



To straighten out one possible misconception in the vision I’m drawing for you: I am passionately opposed to telling programmers the order in which they should do their tasks. If this dialog is only available before you start writing your method – forget it. Whether you do TDD or spike the library method, whether you make the decisions (filling in the imagined dialog) up front or are retrofitting concepts to legacy code, the same process works.



And that’s where Roslyn comes in. As I said, we abandoned the research on this eight years ago as increasing the surface area of what it takes to write an app and requiring too much work in a specific order (and other reasons). Roslyn changes the story because we can understand the declaration, XML comments, the library method, the unit test name and attributes, and the actual code in the method and unit test without doing our own parsing. This allows the evaluation to be done at any time.



That’s just one of the reasons I’m excited about Roslyn. My brain is not big enough to imagine all the ways that we’re going to change the meaning of programming in the next five years. Roslyn is a small, and for what it’s worth highly imperfect, cog in that process. But it’s the cog we’ve been waiting for.

Five Levels of Code Generation

NOTE 31 Jan, 2014: I discussed a sixth level in this post http://bit.ly/1ih3vL5.
NOTE 8 Feb 2014: I discussed why there are four, not five, bullet points in this post: http://bit.ly/1cf4Pcu


I want to clarify the five levels of code generation because there’s recently been some confusion on this point with the RyuJIT release, and because I want to reference it in another post I’m writing.


Code generation can refer to…


- Altering code for the purpose of machines
The path from human readable source code to machine language



- DSL (Domain Specific Language)
Changing human instructions of intent, generally into human readable source code


- File Code Generation
Creating human readable source code files from small metadata, or sometimes, altering those files


- Application code generation or architecture expression
Creating entire systems from a significant body of metadata


Did I leave anything out?


The increasing size and abstraction of each level means we work with it fundamentally differently.


We want to know as little as possible about the path from human readable to machine readable. Just make it better and don’t bother me. The step we think about here is the compiler, because we see it. The compiler creates MSIL, an intermediate language. There’s another step of going from IL to native code, and there’s an amazing team at Microsoft that does that – it happens to be called the code gen team inside of the BCL/CLR team. That’s not what I mean when I say code generation.


The phrase Domain Specific Language means many things to many people. I’m simplifying it to a way to abstract sets of instructions. This happens close to the point of application development – closer than a language like C#. As such, there is a rather high probability of bugs in the expression of the DSL – the next step on the path to native code. Thus most DSLs express themselves in human readable languages.


File code generation is what you almost certainly mean when you say “code generation”. Give me some stuff and make a useful file from it. This is where tools like T4, Razor, and the Visual Studio Custom Tools feature are aimed. And that’s where my upcoming tool is aimed.


Architecture expression may be in its infancy, but I have no doubt it is what we will all be doing in ten years. There’s been an enormous logjam at the 3rd Generation Language phase (3GL) for some very understandable reasons. It’s an area where you can point to many failures. The problem is not in the expression – the problem is in the metadata. It’s not architecture expression unless you can switch architectures – replace what you’re currently doing with something else entirely. That requires a level of metadata understanding we don’t have. And architectures that better isolate the code that won’t fit into the metadata format, which we have and don’t use.


RyuJIT is totally and completely at the first level. It’s a better way to create native code on a 64 bit computer that means compiling your app to 64 bit should no longer condemn it to run slower than its 32 bit friends. That’s a big deal, particularly as we’re shoved into 64 bit because of side cases like security cryptography performance.


RyuJIT is either the sixth or seventh different way your IL becomes native code. I bet you didn’t know that. It’s a huge credit to the team to have integrated improved approaches, and you don’t even need to know about them. (Although, if you have startup problems in non-ASP.NET applications, explore background-JIT and MPGO, as well as NGen for simple cases.)


The confusion when RyuJIT was released was whether it replaced the Roslyn compilers. The simple answer is “no.” Shout it from the rooftops. Roslyn is not dead. But that’s a story for another day.

Roslyn

I believe project Roslyn is a watershed moment. It will start a complete redefinition of what programming and being a computer programmer means.


But why? How could rewriting the Visual Basic and C# compilers be even a blip in the history of computer programming? At best it’s a sneeze!


Show me a computer program. Right now. Really. At least think through how you would show me.


OK, you just opened up an instance of Visual Studio and showed me the Solution Explorer graph of a whole bunch of files. Text.


A small minority of you might have used a different approach – showing me functional tests, a user interface, a database or a dependency graph. An even smaller minority might have shown me a UML diagram, DSL, or other code generation metadata. Six or seven of you might have even opened a decompilation tool or showed me some IL. Two or three of you might have discussed semantics. One of you, just maybe, opened a workflow designer (ah, the sad story of Workflow is for another day).


The line between these realities has been stark. You’re writing code or you’re ditzing around with the tools that support you writing code. Text. Text. Text.


A computer program has always been all of these things and something else. Something ethereal that existed between your ears and my ears. As programs have become more complex, we become increasingly desperate to live in the ecosystem of our application instead of in the code. That’s what agile was all about. Agile makes our application into an ecosystem of players. So does DSL. But in both cases we’re blind and playing with only parts of the elephant.


Wait, code matters. Of course code matters! It is one expression of abstraction along the pipeline from idea to electrons moving in silicon. Code is the last thing that normal humans will comprehend. It becomes IL that a platform-specific compiler understands, and then some magic happens. That end works. With apologies for the analogy, we know how to take the elephant’s output and fertilize the garden. The apps work. We just feed the elephants badly because we do not understand them.


Oh, just forget that analogy, it was gross anyway.


What we need to understand is that no one today understands what an application is. We can’t. We can’t because the way we reason about anything is predetermined by the wiring of how we sense it. To understand what applications are in new ways, we need new ways to sense them, redefine what they are and place them in rich ecosystems with no rigid boundaries of definition between problem and solution. It’s easiest to visualize as no rigid boundaries between text and graphics, but that will feel like a very shallow distinction within five years.


While no one understands what an application is today, nothing can stop us from coming to a new understanding and a new relationship in the next five years. Roslyn is not birthing in a vacuum. We have relatively mature ecosystems in DSL and functional languages. We have emerging (about to be exploding) work happening in composition and semantics. We have ongoing research in algorithms. We have a lot of work in the areas of agile, quality, reasoning/diagramming, requirements, human interfaces, etc. We are ready.


Roslyn, for us in the .NET world, is the snowball tossed to start the avalanche of change. It will do this because it will lead us to think of our code as a semantic tree. Such a tiny, tiny thing. And then we’ll have the most amazing array of IDE tools, and then new DSL/generation tools, and then amazing new composition reasoning, and then…, and then… , and then… , and then… , and then… Each will lead us to new thinking about the next and around the tenth or hundredth “and then” it will be obvious how seismic the shift.