Language as Two Parts: Parser and Behavior

I wrote here about Roslyn as a Black Box. There’s another layer to this that can be remarkably subtle.

This isn’t actually the complete view.

[Figure: RoslynWithManyInterfaces]

One thing missing from that view is that, by their very nature, interfaces allow anything to be on the other side.

“Of course!” you say. “Anything can be at the top and bottom of your picture”.

That’s true. But anything can also be in the middle.

[Figure: RoslynWithMultipleCompilers]

There’s no reason a composed compiler system can’t support multiple compilers, and variations. MSBuild scripts need a little tweaking, of course.

There are actually already two compilers – for C# and Visual Basic. They differ in a number of small, relatively subtle ways – such as overload resolution. There are also language concepts that do not align, like VB XML literals and null propagation for XML.

Computer Languages as Two Things

Roslyn clarifies that computer languages are made of two parts. A language relates intimately to a parser, which determines the syntax. A language is also defined by runtime behavior. Thousands of aspects of that behavior are determined in excruciating detail by the way the compiler creates IL (Intermediate Language).

In the diagram above, this means the yellow top layer could be different from the black box.

To understand this, imagine using the Visual Basic compiler to build a tree for the C# language compiler. If all features were supported, you would create a functioning application. But it would not behave the same on several of those thousands of little details. For one thing, it would not have the same overload resolution in all cases.
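The thought experiment can be made concrete with a toy sketch. The Python below is not Roslyn. It uses Python's own `ast` module as a stand-in parser, and the names `eval_csharp_style` and `eval_vb_style` are hypothetical evaluators invented for illustration. The one behavioral difference it models is real: for integer operands, the C# `/` operator truncates toward zero, while the VB `/` operator always performs floating-point division. Everything else is an assumption.

```python
import ast
import operator

# Operators that behave the same in both toy "languages".
_SHARED_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _evaluate(node, divide):
    """Walk an arithmetic tree; `divide` supplies the language's rule for '/'."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, divide)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp):
        left = _evaluate(node.left, divide)
        right = _evaluate(node.right, divide)
        if isinstance(node.op, ast.Div):
            return divide(left, right)
        shared = _SHARED_OPS.get(type(node.op))
        if shared is not None:
            return shared(left, right)
    raise ValueError(f"unsupported node: {type(node).__name__}")


def _csharp_divide(left, right):
    # C#-style: int / int truncates toward zero; anything else is floating point.
    if isinstance(left, int) and isinstance(right, int):
        quotient = abs(left) // abs(right)
        return quotient if (left < 0) == (right < 0) else -quotient
    return left / right


def _vb_divide(left, right):
    # VB-style: '/' is always floating-point division.
    return left / right


def eval_csharp_style(tree):
    return _evaluate(tree, _csharp_divide)


def eval_vb_style(tree):
    return _evaluate(tree, _vb_divide)


# One parse, one tree, two behaviors.
tree = ast.parse("7 / 2 + 1", mode="eval")
print(eval_csharp_style(tree))  # 4
print(eval_vb_style(tree))      # 4.5
```

The parser ran once and produced a single tree. The two answers differ only because the layer that assigns behavior to the tree differs, which is the sense in which the top layer and the black box can be separate things.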

This is why the dream of one core language with a VB/C# dialect displayed per the user’s preference is an elusive goal. It would result in a third language. VB expressed as C# would not have the same behavior.

The same thing is true for incorporating other strongly typed OO languages – like Java. If the concepts can be mapped to the C# Roslyn API, then C# code can be created. It can be created either as IL, or as human readable code.

Whether cross language compilation is useful depends on expectations. I suspect a straight up conversion, after non-aligned concepts were removed, would be only a little worse than an expensive human programmer. It might be better than a programmer new to the target language.

Common Core Features

One of the interesting things to me about the introduction of Roslyn is that we can get a better understanding of core language sets (a hint, if you are looking for an MS or PhD thesis topic). Obviously VB and C# are very similar, and there is a better body of knowledge on their differences in the post-Roslyn world.

But what about Java? What would be the subset of the C# or VB tree that Java could support? How much of the language would be left out? For example, would structured trivia support Java annotations?
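One hedged way to frame the subset question is as a check of the node kinds a tree uses against the kinds a target language cannot express. The sketch below again uses Python's `ast` module as a stand-in for a Roslyn syntax tree. The `TARGET_UNSUPPORTED` set and the function names are invented for illustration and say nothing about what Java actually supports.

```python
import ast

# Hypothetical: node kinds that an imaginary target language cannot express.
TARGET_UNSUPPORTED = {"ListComp", "Lambda"}


def node_kinds(source):
    """Return the set of syntax node kinds that appear in the parsed source."""
    return {type(node).__name__ for node in ast.walk(ast.parse(source))}


def left_out(source, unsupported=TARGET_UNSUPPORTED):
    """List the node kinds in `source` that the target could not represent."""
    return sorted(node_kinds(source) & unsupported)


portable = "total = price * quantity"
not_portable = "doubled = [x * 2 for x in values]"

print(left_out(portable))      # []
print(left_out(not_portable))  # ['ListComp']
```

Running a check like this over a large body of real code would give a rough measure of how much of the language is left out for a given target, although it only covers syntax and says nothing about the behavioral differences discussed earlier.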

And what about JavaScript? OK. I know it’s not strongly typed, so it’s a big leap, but I doubt we’d actually go straight to JavaScript when TypeScript already makes part of that leap. Could C# go to TypeScript and then to JavaScript? The immediate payback would be a new way to share blocks between C#/VB and JavaScript.

And what about Haskell? OK. I know that’s a really big leap. Haskell relies on powerful type inference, lazy evaluation, type classes and other features beyond the current iterations of C#/VB. As I understand it, and I’ve never had even a temporary tattoo of F#, Haskell is better at strict purity and the guarantee of mathematical reasoning. Or maybe we just need F# at the party.

I don’t believe we need to do these things across our entire applications. Sharing UI code with JavaScript might make sense, but doing mathematical reasoning across a user interface doesn’t make sense to me.

We have the Portable Class Library (PCL) to create components whose IL is transferable across different platforms (.NET, Windows Phone, Xbox). It seems possible, and maybe interesting, to build a similar transferable subset at the language level.

I don’t know whether there are other things we can do with the separation of parsing, compiling and emitting. But the exciting thing about Roslyn, and its OSS release, is that we can explore.

I’m trained as a scientist, a crystallographer. One of the reasons I’m so excited about Roslyn is that opening up easy access to information leads to interesting research. I’m sharing this vision, as a little part of a broader vision we can build together, and hoping that helps academics and others figure out interesting things to work on.

There are brick walls down some avenues. Research isn’t research unless it might fail. But it’s going to be a heck of a ride. I can hardly wait to see what the next decade brings.

2 thoughts on “Language as Two Parts: Parser and Behavior”

  1. I would like to see the .NET type system become the central basis for a massive code generation platform. Among other things, it would allow one to describe APIs and generate web service client and server stubs, client SDKs, test code, perhaps even sample code for web service clients. And, of course, it would allow multiple programming languages to be used for both client and server coding.

    Or, to put it another way, what is Microsoft Azure going to provide us in the way of tools to describe APIs and do all these things?
