Template PreProcessor

I better say it up front, because it will quickly become obvious. I am not a computer science graduate. I have never written a compiler. It was quite a route to get my thinking in line with this particular problem and I’m sure it will evolve further.

As I said in my post on Friday, one part of solving the VB/C# problem without making unreadable templates is a preprocessor. I struggled with what to call it because the real pattern is: create the VB templates, run the processor to create the C# templates, execute the C# or VB templates. So is it really a preprocessor? I am still calling it that because it runs before the C# templates, so I’m thinking of it as an optional preprocessor.

The result is modified templates. A second set of source code and a second template assembly.

The first decision I faced was how much context I was going to demand for any decision. More context, more sophisticated decisions. You could attempt to build a full syntactic tool that understands the structure of your output code and knows a great deal about what you are accomplishing. This may or may not be possible, and it would certainly require restrictions on what template code is legal, because evaluating multiple paths will be a nightmare and stray strings can produce templates that are legal but don’t evaluate the same way. You may be able to do that; I’m choosing not to tackle it and decided on the least possible context.

The absolute minimum of understanding about the template being converted is which of a finite set of states you are in. Possible states are:

  • Template logic (the code that runs the template)
  • Comments
  • Code blocks
  • Expressions
  • Conditional blocks within code blocks
  • Possibly additional states around declarations, for loops and using statements

My first attempt was line based. Faster, easier to recognize comments and nearly impossible to ever restructure the line wrapping correctly. Trust me, that route did not go well.

A week ago Friday, nearly in tears, I told my son Ben “Look, I told everybody I could do this, and Carl just posted that show. And I am doing it, except I think the bugs I am facing with end of line issues are not solvable.”

My brilliant son said “Why on earth are you doing it that way – do character by character.”

“What, rewrite the whole thing?” maybe I cried.

The rewrite actually went pretty well, painful as it was to abandon nearly completed code. It was made easier by the fact I really do not care about performance. This is a template translation. The converted templates will be compiled and blazingly fast. I can take a second or two per template to do the translation. Thus I can skip all that compiler theory that I never learned about managing buffers and lookaheads and all that. A bit of brute force with the simplest possible RegEx.

I’m basically looking at the entire template as a string. I step character by character through the string doing a substring check starting at the current position. I avoid the dumbest of the .NET mistakes such as copying the substrings unnecessarily and I do concatenate via a string builder so performance doesn’t suck too badly. And I do restrict what I’m looking for to what makes sense in context. But I don’t worry that I am looking at the next handful of characters an excessive number of times.
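To make the scanning idea concrete, here is a minimal sketch in Python (not the .NET code described in this post). The function name `scan` and the marker list are assumptions made for illustration. `str.startswith(marker, pos)` checks the text at the current position without copying a substring, and the list joined at the end plays the role of the string builder.

```python
def scan(template, markers):
    """Step through the template one position at a time.

    Returns the text rebuilt piece by piece plus the positions where
    any of the (hypothetical) markers was found.
    """
    out = []    # pieces joined once at the end, like a string builder
    hits = []   # (position, marker) pairs, in the order they were found
    pos = 0
    while pos < len(template):
        for marker in markers:
            # Compare in place at pos; no substring is created for the check.
            if template.startswith(marker, pos):
                hits.append((pos, marker))
                out.append(marker)
                pos += len(marker)
                break
        else:
            # No marker starts here: copy one character and move on.
            out.append(template[pos])
            pos += 1
    return "".join(out), hits
```

The same few characters get examined once per marker at every position, which is exactly the kind of repeated looking that is acceptable when translation speed does not matter.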

I start off in the template logic. I output the template logic character by character until I find a character sequence that indicates a new mode. I’m keeping this simple by managing both the modes and the required stack via the call stack. Meaning, when I shift into a new mode, such as the Comment mode I just call a method called TranslateComment. Comments are easy – just change the start character and read to the end of the line for output. I need comments treated differently because a code block in a comment should not be translated.
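A rough Python sketch of letting the call stack carry the mode might look like this. Only the name `translate_comment` mirrors the TranslateComment method mentioned above; everything else is assumed for illustration, including the choice of `'` as the incoming comment marker and `//` as the outgoing one. It also ignores apostrophes inside string literals, which a real version would have to handle.

```python
def translate_comment(text, pos, out, comment_out="//"):
    """Comment mode: pos is at the comment marker.

    Swaps the marker and copies the rest of the line untouched, so a
    code block inside a comment is never translated. Returns the
    position of the end of the line.
    """
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)
    out.append(comment_out + text[pos + 1:end])
    return end


def translate_template_logic(text):
    """Template logic mode: copy characters until a new mode starts."""
    out = []
    pos = 0
    while pos < len(text):
        if text[pos] == "'":
            # Entering a mode is just a method call; returning from the
            # call drops back into template logic.
            pos = translate_comment(text, pos, out)
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)
```

Other modes would follow the same shape: a function that takes the current position, emits its output, and returns the position where the caller should resume.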

For now, I’m making the restriction that code blocks – blocks to output – must be exactly <code>stuff</code>. This makes parsing a bit easier than allowing any element name. If I’m in template logic and hit a code block, I know I need to start translating. I start looking for sequences that need conversion: Me as a word, If, For Each, End If, Next, etc. This list is pretty short right now; I expect the preprocessor to evolve.
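As an illustration of translating only inside `<code>` blocks, here is a small Python sketch. It uses one regular expression over the whole template rather than the character-by-character walk described in this post, and the replacement table is hypothetical: the real list of sequences and their C# equivalents is not given here.

```python
import re

# Hypothetical rules for illustration only. \b makes "Me" match as a
# whole word, so "Member" is left alone.
WORD_RULES = [
    (re.compile(r"\bEnd If\b"), "}"),
    (re.compile(r"\bNext\b"), "}"),
    (re.compile(r"\bMe\b"), "this"),
]


def translate_code_blocks(template):
    """Apply the word rules inside <code>...</code>; leave the rest alone."""
    def convert(match):
        body = match.group(1)
        for pattern, replacement in WORD_RULES:
            body = pattern.sub(replacement, body)
        return "<code>" + body + "</code>"

    # DOTALL lets a code block span several lines; .*? stops at the
    # first closing tag, so nested <code> elements are not supported.
    return re.sub(r"<code>(.*?)</code>", convert, template, flags=re.DOTALL)
```

Restricting the element name to exactly `code` is what keeps the pattern this simple.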

If I’m in a code block and I hit an embedded expression (<%= ) I switch back to template logic mode. This is not precise but it’s close enough. Characters are output exactly until I hit another code block because this is template logic, not output code. If you concatenate strings in there, you’re toast, but you can call methods that are in the VB/C# namespaces.
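A simplified Python sketch of that switch: translate the output code, but copy embedded expressions through untouched. The helper name `translate_code` and the `translate` callback are assumptions, and the sketch treats an expression as running to the closing `%>`, ignoring the case of another code block nested inside it.

```python
def translate_code(body, translate):
    """Translate output code, passing <%= ... %> through unchanged.

    `translate` is any function that converts a piece of output code;
    it is never applied to the template logic inside an expression.
    """
    out = []
    pos = 0
    while pos < len(body):
        start = body.find("<%=", pos)
        if start == -1:
            # No more expressions: translate the remainder and stop.
            out.append(translate(body[pos:]))
            break
        end = body.find("%>", start)
        end = len(body) if end == -1 else end + 2
        out.append(translate(body[pos:start]))  # output code before the expression
        out.append(body[start:end])             # template logic, copied exactly
        pos = end
    return "".join(out)
```

For example, with a callback that turns `Me` into `this`, the `Me` inside the expression survives because it belongs to the template, not to the generated code.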

There are some special cases around code constructs. I recognize an If block by searching for the Then and taking what’s between as a code expression that needs translation. Wherever I’m translating expressions I just use a simple replacement because it’s really just separate symbols.
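Here is a hedged Python sketch of the If handling: find the Then, take what sits between as the condition, and run a simple symbol replacement over it. The symbol table and function names are hypothetical, and the sketch makes no attempt to protect string literals inside the condition.

```python
import re

# Hypothetical symbol table. Replacing in a single pass avoids turning
# the "=" of an already converted "!=" into "==".
SYMBOLS = {"<>": "!=", "<=": "<=", ">=": ">=", "=": "==",
           "AndAlso": "&&", "OrElse": "||", "Me": "this"}
TOKEN = re.compile(r"<>|<=|>=|=|\b(?:AndAlso|OrElse|Me)\b")


def translate_expression(expr):
    """Swap each recognized symbol for its counterpart."""
    return TOKEN.sub(lambda m: SYMBOLS[m.group(0)], expr)


def translate_if(line):
    """Turn 'If <condition> Then' into 'if (<condition>) {'."""
    match = re.match(r"(\s*)If\s+(.*?)\s+Then\s*$", line)
    if match is None:
        return line  # not an If line; leave it unchanged
    indent, condition = match.groups()
    return indent + "if (" + translate_expression(condition) + ") {"
```

Because the condition is treated as a sequence of separate symbols, the translation needs no understanding of what the expression means.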

The preprocessor is simple and focused on what’s actually needed, not boiling the ocean. It will evolve as far as it needs to, staying well shy of both the power and usability issues of the CodeDOM – we just don’t need that for business templates in VB and C#.

Whew! I could write tons more on glitchy little details of this preprocessor that’s really eaten my last couple of weeks. It’s one of the pieces I want to get Open Source early on.

2 thoughts on “Template PreProcessor”

  1. Kathleen,
    Maybe you are trying to solve a problem (automatic template conversion) that doesn’t really need to be solved.
    This statement is based upon the following premises and understanding:
    1) The text inside the XML literals can be anything (e.g. VB.Net or C# style syntax)
    2) The utility functions such as OutputFunction that output VB.Net code live in one namespace and the version that outputs C# code is in another namespace. Only one namespace is referenced by a given template.
    3) Most business application developers would want to end up with either a set of VB.Net or C# output classes, but not both.
    4) If #3 is not the case for some people, there are excellent tools for under $200 that will reliably convert this type of code between VB.Net and C# (especially if you can adjust the going in template to output code that is readily convertible).

    I can think of a few times where it may offer value to convert a template, but that should be a point in time and not ongoing.
    1) When I download a cool template that does not output the syntax I need, a converter would give me a head start on getting what I want.
    2) If my employer is switching from one language to the other due to fad or resource availability, then I would want to leverage the investment.
    3) #2 raises the possibility of a consultant wanting to maintain one set of templates and be able to convert to the other syntax for a particular customer.

    Now there certainly is the COOL factor that comes into play. However, as a potential user of the GREAT work you are doing here, I would recommend you put any additional work for the template converter story on the backlog. This is exactly what your closing statement about pushing it to open source says to me. Let those who really need it invest in improving it.

  2. Perhaps it’s worth a separate blog post, but the largest benefit of language-neutral templates is allowing you to use the templates of any available architecture. Currently a set of templates outputs in one language, not both. When the CodePlex project we’re starting goes live, we expect it will have a single set of CSLA 3.5 templates. The folks doing that can build it and fine-tune it exactly once.

    I want expressing architectural details in code to be independent from the language it’s expressed in.
