Rethinking “Nearly VB”

Bill McCarthy added a comment to my blog which I wanted to answer:

So why not use VB for the templates but C# for the initial output rather than some “Nearly VB”? Doesn’t C# address every issue you’ve raised?

But I am curious as to what about issues that are language specific, such as declarative event wiring, optional parameters etc.?



C# fixes the majority of the issues I raised, except ambiguity in closing brackets. If you assume that the closing of a structure will always be at the same level and outside embedded expressions such that you maintain symmetry in relation to the evaluation stack, you can resolve the closing brackets. Retaining symmetry in closing brackets means that the following will work:

  

   Return _
   <code>
      if (x == 1)
      {
         <%= MoreStuffFunction() %>
      }
   </code>.Value

Any variation of the following will not work:

   Return _
   <code>
      if (x == 1)
      {
         <%= MoreStuffFunction() %>
         <%= "}" %>
   </code>.Value

Which means among other things you cannot do:

   Return _
   <code>
      if (x == 1)
      {
         <%= StuffFunction() %>
         <%= If(z, "}", MoreStuffFunction() & "}") %>
   </code>.Value

But you can rearrange it to:

   Return _
   <code>
      if (x == 1)
      {
      <%= StuffFunction() %>
      <%= If(z, String.Empty, MoreStuffFunction()) %>
      }
   </code>.Value

Bill believes this symmetry restriction is less onerous than the restrictions I placed on VB, especially the open/close parentheses on method calls. Another significant benefit of the C#-first approach is that it’s much easier to recognize equality comparisons versus assignment statements, and some of the null comparison problems I’m currently ignoring are lessened because C# does not allow certain comparisons with nullable types that VB allows.
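The symmetry rule above can be checked mechanically: treat embedded expressions as opaque text and require that every literal open brace gets a matching literal close brace. A minimal Python sketch of the idea (the function name and approach are mine, not the actual preprocessor's):

```python
import re

def braces_are_symmetric(code_block: str) -> bool:
    """Return True if literal braces balance once embedded
    <%= ... %> expressions are treated as opaque text."""
    # Strip embedded expressions; any brace they emit is invisible
    # to the preprocessor, so it must not be counted.
    literal = re.sub(r"<%=.*?%>", "", code_block, flags=re.DOTALL)
    depth = 0
    for ch in literal:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:          # a close appeared before its open
                return False
    return depth == 0              # every open was closed literally

good = "if (x == 1)\n{\n   <%= MoreStuffFunction() %>\n}"
bad = 'if (x == 1)\n{\n   <%= MoreStuffFunction() %>\n   <%= "}" %>'
print(braces_are_symmetric(good))   # True
print(braces_are_symmetric(bad))    # False
```

The examples mirror the working and non-working templates above: the first closes its brace literally, the second hides the close inside an expression.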

Bill convinced me that the C#-first template was not nearly as difficult as I imagined, by convincing me that the restriction on the location of the close brackets, in symmetry with the open, was reasonable. However, he didn’t convince me to change my current work. VB first is the best scenario for my current client, and I think if we have the possibility we should try to supply both, so people can write and maintain their templates with the output code that they prefer, and prefer to debug the first version of the output in. Hopefully this can happen, but the most important thing to me at the moment is getting a working version out to you to play with – I don’t want to derail that with a second template converter/preprocessor. If someone else wants to work on that, let me know.

Template Languages and "Nearly VB"

The templates I’ve been talking about require very specific language features of the VB compiler, and language-neutral templates do not allow any ambiguity in the code output by the initial template.


The template itself must be in VB because VB is required for embedded XML – the code blocks. The code blocks are essential for understanding which code to translate when creating an alternate-language template in a preprocessor. Code in strings would be impossible (or nearly so) to translate at the template level, and translating at the output level would have many issues, including debugging and performance. There are tools available that translate normal source code, and you could do that, but I’m not sure why: it adds a lot of extra variables, when translating the template offers faster performance and more reproducible results. Sticking with template translation, the code block clearly indicates to the template preprocessor where to switch into translation mode.

The language output by the initial template must be VB, or “nearly VB.” Even if your primary interest is C#, a language neutral solution requires that the initial template have no ambiguity. Sticking with familiar and well supported languages is helpful because the initial output can be tested in VB, isolating problems in the template from any in the template translator/precompiler. This requires a non-ambiguous language I’ll call “Nearly VB”. If you’re strictly interested in C#, and have no interest in language neutrality, you can, of course, use VB’s XML literal code blocks to directly output C# code.

Ambiguity breaks the ability to build language neutral templates because the preprocessor has very little idea of the current context. It cannot understand whether a particular close curly bracket is an End If, a Next, an End Get or something else. Unfortunately, Visual Basic is not totally ambiguity free either, which forces the concept of “Nearly VB” rather than just normal VB. Nearly VB has one syntax change and a couple of extra rules when compared to VB.

VB is ambiguous on parentheses: it uses them for both method parameters and indices. To solve this in templates, use square brackets to indicate indices and parentheses for normal method calls. The C# compiler will help you find the problems when your C# output files fail to compile. The VB output path can easily replace the square brackets with parentheses when outputting VB files.
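Translating Nearly VB back to legal VB is then a simple substitution, since the rules guarantee brackets only ever mean an index. A rough Python sketch of that swap (my names, not the real tool's):

```python
def brackets_to_parens(line: str) -> str:
    """Rewrite Nearly VB index brackets as VB parentheses.

    Assumes brackets only ever mean an index in Nearly VB,
    which is exactly the restriction the rules impose.
    """
    return line.replace("[", "(").replace("]", ")")

print(brackets_to_parens("value = items[i]"))   # value = items(i)
```

This naive swap is one reason keyword-escaping brackets need a separate escape sequence, as discussed next.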

At the moment I’m not convinced that the other meaning of square brackets – allowing identifiers to match keywords – needs to be supported very well. There aren’t that many keywords, and simply avoiding them seems an easier solution. You can support them by escaping the bracket characters via a \xNN escape pattern using the character’s ASCII code (\x5B for the open bracket, \x5D for the close). OK, that’s not very pretty, so a shorter escape sequence may make sense if people run into this very often.

Case insensitivity is really another way to say “case ambiguity”. Language-neutral templates require that you correctly case all symbols; the preprocessor can manage the keywords it’s translating, but you’ve got to get the symbols correct. Consider a Symbols class with constants, which also provides Intellisense while you’re creating your templates.
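The Symbols class idea is just a single source of truth for casing: define each symbol name exactly once as a constant and reference the constant everywhere. A Python illustration of the shape (the actual version would be a VB class of String constants; these member names are illustrative):

```python
class Symbols:
    """Single source of truth for symbol names, so every template
    uses exactly the same casing for every symbol."""

    class Method:
        GetPrimaryKey = "GetPrimaryKey"
        GetHashCode = "GetHashCode"

    class Type:
        Customer = "Customer"

# Templates reference Symbols.Method.GetPrimaryKey instead of retyping
# the string, so casing can never drift between templates.
print(Symbols.Method.GetPrimaryKey)   # GetPrimaryKey
```

Because the constants are ordinary members, the editor's completion list does the remembering for you.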

VB is sloppy in not forcing you to include the open/close parentheses after a call to a method that does not have parameters. In a broader perspective this is ambiguous, because in C# the presence or absence of the parentheses indicates whether you want to call the method or grab the delegate. While that particular ambiguity is resolvable, because VB would require the AddressOf operator (or a lambda expression), I’m not tracking symbols. So I don’t know whether your symbol is a method, variable, or property. Thus, I don’t know whether the parentheses are needed. For language-neutral templates, add the parentheses on all method calls.

NOTE: I actually explored whether this problem is solvable, and I believe it is not. I don’t think it’s that much to ask you to include the parentheses correctly – it’s just a place we VB coders have historically been lazy.

So, to allow language neutral templates:

  • Use basic VB syntax
  • Use square brackets instead of parentheses for indexes
  • Maintain consistent case for all symbols
  • Include open/close parentheses for all method calls
  • Avoid keywords as symbol names or escape the surrounding brackets with the XML escape sequence
  • Rather obviously, avoid features unique to VB

I’ll do another post in a few weeks on spots where the two languages inherently work differently. There will be more items on this list, particularly around the management of nulls in relation to operators.

I do not dream that I’ve covered everything. The only way to ensure language-neutral templates is to create them, ensure the code is syntactically correct, compiles and runs in VB, and then create the similar code in C# and make sure you have valid syntax, a clean compile, and can run the finished applications. After the upcoming preprocessor has been out for a while we will have a better idea of how you can break it, and we can chink the holes where we can. But issues that involve ambiguity will have to be solved by the template author.

Validation Information in Metadata

Mike asks:

Just curious if your metadata also contains validation rules or not?  Things like property is required or range of valid values.

It could include them in three possible ways – it currently uses one and I’ve had two others working in the past that may be resurrected.

The metadata that the database inherently knows is automatically transferred – this would be nulls and string length. How well nulls are handled is up to the architecture, but the metadata definitely knows what’s nullable.

I’ve experimented with two additional approaches: using extended properties and parsing the TSQL of check constraints. The first would work for simple ranges and other predictable data sets, but it puts information in an unexpected place. I currently can’t justify it over placing validation in known places in the handcrafted code.

Using check constraints leverages existing information so is a “good” thing. Unfortunately, no one ever seemed to care about the months of work I put into that five years ago so I let it stagnate. Since I know more now, I could resurrect that work, but honestly I don’t think I’ll get time soon.

The problem is that most people just don’t put check constraints in the database very often. I find that unfortunate for many reasons, but it becomes a chicken-and-egg problem. People don’t put the constraints in the database because they’ll have to restate them in the business layer for decent usability. This initiative doesn’t get attention to solve that problem, because the check constraints aren’t already there. Perhaps the time is ripe now. I would love to include check constraint based validation in the Open Source version that we plan to start up on CodePlex this week or next (public within thirty days after) – at least a framework for it.

Check constraints are closely related to defaults because both require parsing TSQL. It turns out that over the years folks have been primarily interested in defaults of “now”, new guids, and raw values. Today or Now are pretty easy because it’s just a straight-up translation between a SQL function and a .NET function. Any straight-up translation like that can be defined in sort of a metametadata (hate that phrase) layer. I handle all three of these scenarios in my metadata extraction tool (a metadata extraction tool will be part of the CodePlex project).
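That straight-up translation layer can be as simple as a lookup table from the TSQL default expression to the equivalent .NET expression, with raw values falling through unchanged. A hypothetical Python sketch (the mappings shown are illustrative, not the tool's actual table):

```python
# Maps normalized TSQL default expressions to .NET equivalents.
SQL_TO_NET_DEFAULTS = {
    "getdate()": "DateTime.Now",
    "getutcdate()": "DateTime.UtcNow",
    "newid()": "Guid.NewGuid()",
}

def translate_default(tsql_default: str) -> str:
    """Translate a column default from TSQL to a .NET expression.

    SQL Server wraps defaults in parentheses, e.g. (getdate())
    or ((0)); strip them before looking the function up. Raw
    values fall through unchanged."""
    expr = tsql_default.strip().lower()
    while expr.startswith("(") and expr.endswith(")"):
        expr = expr[1:-1]
    return SQL_TO_NET_DEFAULTS.get(expr, expr)

print(translate_default("(getdate())"))   # DateTime.Now
print(translate_default("((0))"))         # 0
```

This handles the three common cases (now, new guids, raw values); anything fancier would need real TSQL parsing.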

I think validation should be stated in the business layer in rules. I wasn’t doing this five years ago, so the whole process of incorporating validation from check constraints will be vastly simpler. Instead of code to code, you need to recognize a category – such as a bound range (the most important) – and parse out the bounds into a structure usable by a specific rule. Then another rule is “there’s a check constraint and I think you need to validate based on it, but I can’t write it, so you need to.” The architecture could enforce some code being written in response to that rule. To state the change from five years ago: the metadata wouldn’t contain code, but the statement of which rule applies and its parameters.
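Recognizing the bound-range category might look something like the sketch below: match the common `([Col] >= low AND [Col] <= high)` shape and emit the rule name plus its parameters, leaving everything else to the "you need to write this" rule. A speculative Python sketch with integer bounds only, not the actual implementation:

```python
import re

# Matches the common shape ([Col] >= low AND [Col] <= high).
RANGE_PATTERN = re.compile(
    r"\(?\s*\[(?P<col>\w+)\]\s*>=\s*(?P<low>-?\d+)\s+AND\s+"
    r"\[(?P=col)\]\s*<=\s*(?P<high>-?\d+)\s*\)?",
    re.IGNORECASE,
)

def categorize_check(tsql: str) -> dict:
    """Turn a check constraint into a rule statement: either a
    parameterized BoundRange rule, or a ManualValidation marker
    saying the constraint exists but must be hand-coded."""
    match = RANGE_PATTERN.search(tsql)
    if match:
        return {
            "rule": "BoundRange",
            "column": match.group("col"),
            "low": int(match.group("low")),
            "high": int(match.group("high")),
        }
    return {"rule": "ManualValidation", "source": tsql}

print(categorize_check("([Age] >= 0 AND [Age] <= 120)"))
```

The metadata then records the rule name and parameters, never generated code, which is exactly the change from five years ago.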

Validation in triggers would seem, at least to my weak TSQL mind, to be exceedingly difficult.

So, the basic answer to Mike’s question is “some, but not all of the really important scenarios are covered, and I don’t think you’ll ever cover all scenarios.”

Template PreProcessor

I better say it up front, because it will quickly become obvious. I am not a computer science graduate. I have never written a compiler. It was quite a route to get my thinking in line with this particular problem and I’m sure it will evolve further.

As I said in my post on Friday, one part of my solving the VB/C# problem without making unreadable templates is a preprocessor. I struggled with what to call it, because the real pattern is: create the VB templates, run the processor to create the C# templates, execute the C# or VB templates. So is it really a preprocessor? I am still calling it that, as it runs before the C# templates do, so I’m thinking of it as an optional preprocessor.

The result is modified templates: a second set of source code and a second template assembly.

The first decision I faced was how much context I was going to demand for any decision. More context, more sophisticated decisions. You could attempt to build a full syntactic tool that understands the structure of your output code and knows a great deal about what you are accomplishing. This may or may not be possible, and it would certainly require restrictions on what template code is legal, because evaluating multiple paths would be a nightmare, and stray strings can result in legal templates that don’t evaluate the same way. You may be able to do it; I’m choosing not to tackle that and decided on the least possible context.

The absolute minimum of understanding about the template being converted is which of a finite set of states you are in. Possible states are:

  • Template logic (the code that runs the template)
  • Comments
  • Code blocks
  • Expressions
  • Conditional blocks within code blocks
  • Possibly additional states around declarations, for loops and using statements

My first attempt was line based: faster, easier to recognize comments, and nearly impossible to ever restructure the line wrapping correctly. Trust me, that route did not go well.

A week ago Friday, nearly in tears, I told my son Ben “Look, I told everybody I could do this, and Carl just posted that show. And I am doing it, except I think the bugs I am facing with end of line issues are not solvable.”

My brilliant son said “Why on earth are you doing it that way – do character by character.”

“What, rewrite the whole thing?” maybe I cried.

The rewrite actually went pretty well, painful as it was to abandon nearly completed code. It was made easier by the fact that I really do not care about performance. This is a template translation. The converted templates will be compiled and blazingly fast. I can take a second or two per template to do the translation. Thus I can skip all that compiler theory I never learned about managing buffers and lookaheads and all that. A bit of brute force with the simplest possible RegEx.

I’m basically looking at the entire template as a string. I step character by character through the string doing a substring check starting at the current position. I avoid the dumbest of the .NET mistakes such as copying the substrings unnecessarily and I do concatenate via a string builder so performance doesn’t suck too badly. And I do restrict what I’m looking for to what makes sense in context. But I don’t worry that I am looking at the next handful of characters an excessive number of times.

I start off in the template logic. I output the template logic character by character until I find a character sequence that indicates a new mode. I’m keeping this simple by managing both the modes and the required stack via the call stack. Meaning, when I shift into a new mode, such as the Comment mode I just call a method called TranslateComment. Comments are easy – just change the start character and read to the end of the line for output. I need comments treated differently because a code block in a comment should not be translated.
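Using the call stack as the mode stack might look like the sketch below: each mode is a method that consumes characters until it sees a sequence opening a nested mode (call down) or closing its own (return). This is my Python paraphrase of the approach, not the actual preprocessor; I've simplified comment handling to a verbatim copy, and the keyword table is illustrative:

```python
class TemplateTranslator:
    """Character-by-character template translator.

    Modes live on the call stack: each translate_* method consumes
    characters until its mode ends, then returns to the caller's mode.
    """

    # Illustrative VB-to-C# closers recognized inside code blocks.
    CLOSERS = {"End If": "}", "End Get": "}", "Next": "}"}

    def __init__(self, template: str):
        self.text = template
        self.pos = 0
        self.out = []          # concatenate via a list, StringBuilder-style

    def starts_with(self, token: str) -> bool:
        return self.text.startswith(token, self.pos)

    def translate(self) -> str:
        # Template-logic mode: copy through, watching for mode shifts.
        while self.pos < len(self.text):
            if self.starts_with("'"):
                self.translate_comment()
            elif self.starts_with("<code>"):
                self.translate_code_block()
            else:
                self.out.append(self.text[self.pos])
                self.pos += 1
        return "".join(self.out)

    def translate_comment(self):
        # Copy to end of line so a <code> inside a comment is never
        # treated as a code block. (The real tool also changes the
        # comment start character where the output requires it.)
        while self.pos < len(self.text) and self.text[self.pos] != "\n":
            self.out.append(self.text[self.pos])
            self.pos += 1

    def translate_code_block(self):
        # Inside <code>...</code>, translate output-language keywords.
        self.out.append("<code>")
        self.pos += len("<code>")
        while self.pos < len(self.text) and not self.starts_with("</code>"):
            for vb, cs in self.CLOSERS.items():
                if self.starts_with(vb):
                    self.out.append(cs)
                    self.pos += len(vb)
                    break
            else:
                self.out.append(self.text[self.pos])
                self.pos += 1
        if self.starts_with("</code>"):
            self.out.append("</code>")
            self.pos += len("</code>")
```

The payoff of the call-stack trick is that there is no explicit state enum or stack data structure to manage; returning from a method is popping the mode.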

For now, I’m making the restriction that code blocks – blocks to output – must be exactly <code>stuff</code>. This makes parsing a bit easier than allowing any element name. If I’m in template logic and hit a code block, I know I need to start translating. I start looking for sequences that need conversion: Me as a word, If, For Each, End If, Next, etc. This list is pretty short right now; I expect the preprocessor to evolve.

If I’m in a code block and I hit an embedded expression (<%= ) I switch back to template logic mode. This is not precise, but it’s close enough. Characters are output exactly until I hit another code block, because this is template logic, not output code. If you concatenate strings in there, you’re toast, but you can call methods that are in the VB/C# namespaces.

There are some special cases around code constructs. I recognize an If block by searching for the Then and taking what’s between as a code expression that needs translation. Wherever I’m translating expressions I just use a simple replacement because it’s really just separate symbols.
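Because an expression really is just separate symbols, the translation is plain token replacement; a hedged Python sketch of the If special case (the operator table is illustrative, not exhaustive):

```python
# Illustrative VB-to-C# token replacements for expressions.
EXPRESSION_TOKENS = [
    ("<>", "!="),
    (" AndAlso ", " && "),
    (" OrElse ", " || "),
    (" = ", " == "),        # comparisons only; assignments are handled apart
]

def translate_if_header(vb_line: str) -> str:
    """Translate 'If <expr> Then' into 'if (<expr>) {'.

    Finds the Then, takes what's between as the expression, and
    applies simple token replacement."""
    inner = vb_line.strip()
    if not (inner.startswith("If ") and inner.endswith(" Then")):
        return vb_line          # not an If header; leave untouched
    expr = inner[len("If "):-len(" Then")]
    for vb, cs in EXPRESSION_TOKENS:
        expr = expr.replace(vb, cs)
    return f"if ({expr}) {{"

print(translate_if_header("If x <> 1 AndAlso y = 2 Then"))
# if (x != 1 && y == 2) {
```

Note the ordering matters: `<>` must be rewritten before ` = `, or the inequality would be mangled into a comparison.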

The preprocessor is simple and focused on what’s actually needed, not boiling the ocean. It will evolve as far as it needs to, staying well shy of both the power and usability issues of the CodeDOM – we just don’t need that for business templates in VB and C#.

Whew! I could write tons more on the glitchy little details of this preprocessor that’s really eaten my last couple of weeks. It’s one of the pieces I want to get Open Source early on.

Catching up on Blogs – Conceptual Space

I’ve been catching up on blogs and ran across this from Zlatko from Dec. 14.

His basic point is that EF is more than an O/R mapper because it works in a conceptual space between the object layer and the database – creating a third layer.

I’m very happy that Zlatko said this. It articulates something I’ve never articulated well. The metadata is not a representation of the object layer – it is a way of thinking described in metadata that can be thought of as entities, or abstractions, or something else rather vague and fluffy – see I have problems explaining it.

Entity Framework pins down this previously mind-based abstraction. It’s a subtle shift, but it exposes how we think about objects and now gives us a word for it – the conceptual model. It takes what was previously a mind cloud that we shared by implication from metadata definitions and makes it something we can visualize in a drawing.

Unless I’m entirely missing the point though, I do not buy that the existence of this layer is new. I think most or all of us that do metadata based code generation have been doing this for years.

But it is not trivial and it is important to articulate and create a visualizer for something that we’ve just been doing between our ears by implication. It’s part of what makes the implications of EF for metadata for all code generation significant.

The EF conceptual and metadata layers are important even though its current incarnation comes up a bit short in richness and ease of access. We can fix both of these with some effort – I’m loving the moment in time we’re living in and just wishing I had twice as much time to work each day.

I Hate it When I Learn from Dilbert

Do we all live in fear of that moment when we notice that we’re the one on the other side of Dilbert? When Dilbert is wise and well, we’re not.

Two weeks ago I was writing a long paper explaining some nuances about the state of the templates at that time and asking my client not to reject it until he had looked into it and really understood it. Then, in the next morning’s Dilbert strip, someone comes to Dilbert and says “I’ll tell you my idea if you promise not to reject it until thinking about it” and Dilbert says “I already rejected it because only putrid ideas come with warnings.”

So I spend the better part of the weekend rationalizing that my idea really doesn’t fall into that category.

And then I spring out of bed at 6AM Monday morning (I sort of wish that part was a joke) with the solution. So, let’s look at the problem today and the solution in the next post:

Yesterday’s code was:

 

Private Function MemberGetPrimaryKey() As String
      Return OutputFunction( _
                           Symbols.Method.GetPrimaryKey, _
                            Scope.Protected, _
                            MemberModifiers.Overrides, _
                            ObjectData.PrimaryKey.NetType, _
Function() _
<code>
   Return m<%= ObjectData.PrimaryKeys(0).Name %>
</code>.Value)
End Function

 

It’s easy enough to replace the Return keyword with a constant. I put these constants in a class and imported the class, which allowed me to directly access the constant even though it was in a different file:

 

Private Function MemberGetPrimaryKey() As String
      Return OutputFunction( _
                           Symbols.Method.GetPrimaryKey, _
                            Scope.Protected, _
                            MemberModifiers.Overrides, _
                            ObjectData.PrimaryKey.NetType, _
Function() _
<code>
   <%= returnString %> m<%= ObjectData.PrimaryKeys(0).Name %>
</code>.Value)
End Function

 

How bad is that?

But as Bill pointed out in comments on the last post, if all I’m doing is returning a value, I don’t need the code block at all and can teach the OutputFunction method to do the job. So, switching to a more complex and common example, and remembering that I’m out to solve the C#/VB single-template problem to allow a single template for any architecture, I took this concept a few steps further. The result for a more complex method becomes:

 

   Private Function MemberPropertyAccessSet(ByVal propertyData As IPropertyData) As String
      Return _
<code>
         CanWriteProperty("<%= propertyData.Name %>", true)
         <%= OutputConditional("m" & propertyData.Name & " <> value", _
            Function() _
            <code>
            m<%= propertyData.Name %> = value
            PropertyHasChanged("<%= propertyData.Name %>")
            </code>.Value) %>
</code>.Value
   End Function

 

Which for a single language is the same as merely doing:

 

   Private Function MemberPropertyAccessSet(ByVal propertyData As IPropertyData) As String
      Return _
<code>
         CanWriteProperty("<%= propertyData.Name %>", true)
         If m<%= propertyData.Name %> &lt;> value Then
            m<%= propertyData.Name %> = value
            PropertyHasChanged("<%= propertyData.Name %>")
         End If
</code>.Value
   End Function

 

Which would you rather debug? Imagine debugging the templates for an even more complex routine.

This spawned my Dilbert moment. If I have to convince you of the wisdom of this, then maybe it’s not so wise. So, what I jumped out of bed to do was build a preprocessor – converting templates that output VB into templates that output C#. By removing that technical restriction, we can find the sweet spot between reducing typos and obfuscating the code logic. I think that is about where the first and last code fragments in this post are. What do you think?

Avoiding Typos via Output Methods

One of the issues with code generation templates is that they do not test the syntax of the output as you type. I’m a VB coder, and my fantasy would be an editor that told me whether my templates produced valid output as I type. That’s nearly impossible to do, so don’t hold your breath.

In the meantime, you may have code like the following:

 

Private Function MemberGetPrimaryKey2() As String
      Return _
<code>
   Protected Overrides Function GetPrimaryKey() As <%= ObjectData.PrimaryKey.NetType %>
      Return m<%= ObjectData.PrimaryKeys(0).Name %>
   End Function
</code>.Value
End Function

 

Any typos between the <code> elements will result in dozens or hundreds of compiler errors when the output code is incorporated into your project. This is a pain in the neck to deal with, so anything we can do to have fewer typos is desirable.

When you create a UI for your users, you limit the number of mistakes the user can make via techniques like combo boxes. We can take advantage of Visual Studio’s editor to do a similar thing.

Your output code has logic within subroutines, functions and properties. While this code is trivial in the example above – just a return statement – your code will generally involve more complex logic. It’s important that you see this logic so you can evaluate it as you’re maintaining templates. The actual function declaration, however, is not logic.

I created methods to output the enclosing declarations, as well as other non-logic based structures. This transforms the code above into:

 

Private Function MemberGetPrimaryKey() As String
      Return OutputFunction( _
                           "GetPrimaryKey", _
                            Scope.Protected, _
                            MemberModifiers.Overrides, _
                            ObjectData.PrimaryKey.NetType, _
Function() _
<code>
   Return m<%= ObjectData.PrimaryKeys(0).Name %>
</code>.Value)
End Function

 

I’ve spread this out for clarity.

You can still typo the name of the method and the word Return. While you can make a bad selection, you cannot make a typo in anything else. And the parameters of the output function remind you of the types of modifiers that make sense. Under the covers, OutputFunction creates a FunctionInfo object. While I’m not using it here, the OutputFunction method accepts a ParamArray of ParameterInfo objects if your function needs parameters. Again, you can typo symbol names, but nothing else. Of course, since the OutputFunction call is within the code active in the IDE, you get full Intellisense, information blocks, background compilation and all the good stuff.

I’m using a lambda expression. In this case, it creates an inline delegate used to output the function body. If this template method becomes unduly complex, you could also use VB’s AddressOf operator to call a separate method as a delegate. In this case, the delegate signature I expect has no parameters and returns a string. Since <code>…</code>.Value returns a string, it’s an effective delegate.

The FunctionInfo object includes an attribute collection. Thus, any attribute you desire to place on the function can be assigned by explicitly instantiating the function info object, rather than using the helper function.

This is quite similar to the OutputClass and OutputRegion methods I’ve shown earlier, but it takes the idea of using explicit method calls in the template further to reduce the opportunity for typos in the output.

Output symbol typos are a problem, and you can avoid them through an enum or constants. You’ll use some of these constants across many templates, and there will be a lot of them, so I’d suggest you keep things clean by creating classes that contain your symbols. I created a namespace called “Symbols” and classes for Type, Method, Interface, etc. This gives nice clean Intellisense and makes it easier to find symbols in the constant list. Thus the code above becomes:

 

Private Function MemberGetPrimaryKey() As String
      Return OutputFunction( _
                           Symbols.Method.GetPrimaryKey, _
                            Scope.Protected, _
                            MemberModifiers.Overrides, _
                            ObjectData.PrimaryKey.NetType, _
Function() _
<code>
   Return m<%= ObjectData.PrimaryKeys(0).Name %>
</code>.Value)
End Function

 

That leaves “Return” as the only remaining opportunity for a typo – which is the subject of tomorrow’s post.

Comments Fixed

For someone that writes software for a living, I have a remarkably hard time using it. I would not have expected “Filter: Ignore” to display no new comments. Ignoring a filter would be more like showing everything.


Happily I have friends that are as patient as I am confused. Thanks to Bill McCarthy and Susan Bradley (who reset my password which was lost in the bowels of my system and I wanted to switch to Live Writer) my blog is slightly more functional.


My apologies to the folks that wrote comments that I seemingly ignored for the last several weeks. They should be fixed, and please let me know if you have difficulties.


I’m still approving anonymous comments so that will sometimes take a while. Non-anonymous comments go live immediately. Unless I start getting spammed too badly.

Open Source

It’s occurred to me that if you are following this and my DNR TV show, a logical reaction would be “OK, so that’s a lot of hot air, where do I get it?” I intend for all of this to be released Open Source, on whichever site is hot when I release it. I hope I’ll start releasing pieces in just a matter of weeks. It will help a lot if it becomes “we” instead of “me”. So, if you’re interested in this stuff and you want to help, let me know and you can get this stuff directly, as it’s rapidly evolving too much to publicly post right now.

My current expectation of the order of release:

  • Metadata interfaces
  • GenDotNet metadata load
  • EDMX metadata load
  • Template infrastructure – what’s needed for the language neutral templates
  • Simple harness
  • Activity based harness

If you’re interested, you can contact me directly at Kathleen@mvps.org