Schemas in the scheme of things…

I got an interesting email today, which, to paraphrase, said:

I would like a syntax where I can say “an XElement that follows this XSD”

This question actually digs deep into the issue of strong typing versus dynamic typing and what we really want from a type system.

A lot of people think of strong typing as an enforced constraint that guarantees values are of a particular type, a constraint that can then be used for design time checking and the IntelliSense experience. Although this is true, people often forget that strong typing doesn’t eliminate type errors, it just limits them. For example, whenever you do a narrowing cast you risk a runtime error rather than a design time one.  In .NET the need for narrowing is pretty common: one only needs to think of the standard event signatures, where sender is always As Object. So what strong typing does is “shift” some of the errors from run time to design time.  When it comes to XML, the concept of shifting the validation of the object becomes even more important.
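As a minimal sketch of that narrowing risk (the handler name OnSomething and the values passed to it are made up for illustration), the following compiles cleanly even with Option Strict On, yet the second call fails only at run time:

```vb
Option Strict On
Imports System

Module NarrowingDemo
    ' Hypothetical handler using the standard (sender, e) event signature.
    Sub OnSomething(sender As Object, e As EventArgs)
        ' Narrowing cast: accepted by the compiler, but throws
        ' InvalidCastException at run time if sender is not a String.
        Dim text As String = DirectCast(sender, String)
        Console.WriteLine(text.Length)
    End Sub

    Sub Main()
        ' sender really is a String, so the cast succeeds.
        OnSomething("hello", EventArgs.Empty)

        Try
            ' Still compiles: an Integer widens to Object without complaint.
            OnSomething(42, EventArgs.Empty)
        Catch ex As InvalidCastException
            Console.WriteLine("Runtime type error: " & ex.Message)
        End Try
    End Sub
End Module
```

The type system caught nothing here; the error simply moved to the point where the cast executes.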

To address the problem of strong typing with an XElement, you could use inheritance, but this raises a lot of issues. Factory methods such as Load return an XElement, so you’d have to provide your own Load methods. In the end it’d probably be easier to have your object model not inherit from XElement, and instead just use XElement as the internal container.  This would give you the ability to write POCO (plain old CLR object) classes for your data, but in doing so you face a couple of serious design issues.  If you choose to validate the data at load time, that can cause excessive delays in loading the XML, especially if parts of it are never used. If you don’t validate at load, your object model can actually be in an invalid state and do unpredictable things, such as throw exceptions when a property is accessed.  And there’s also the issue that XML can contain more data than the schema dictates. Do you lose that data or preserve it? If you preserve it, then how do you provide access to it while also preserving the integrity of the other data?
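To make the container approach concrete, here is a rough sketch. The Person class and its members are hypothetical, and the element names are borrowed from the IPerson example later in this post. It does no validation at load, so it shows exactly the “invalid state” problem described above:

```vb
Option Strict On
Imports System
Imports System.Xml.Linq

' Hypothetical POCO-style wrapper that uses an XElement as its storage.
Public Class Person
    Private ReadOnly _xml As XElement

    Public Sub New(xml As XElement)
        ' Nothing is validated here, so the object may wrap invalid XML.
        _xml = xml
    End Sub

    Public ReadOnly Property Name As String
        Get
            ' XElement's conversion to String yields Nothing if <Name> is missing.
            Return CType(_xml.Element("Name"), String)
        End Get
    End Property

    Public ReadOnly Property Age As Integer
        Get
            ' Throws on access if <Age> is missing or is not a number.
            Return CType(_xml.Element("Age"), Integer)
        End Get
    End Property

    ' Anything the schema does not describe is only reachable via the raw XML.
    Public ReadOnly Property Xml As XElement
        Get
            Return _xml
        End Get
    End Property
End Class
```

Exposing the raw element preserves the extra data, but it also lets callers change the XML behind the typed properties’ backs, which is the integrity question raised above.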

As you can probably see, there’s a mismatch here between POCO and XML.

If the goal is to validate the XML, you can do that with XDocument by providing an XmlReader whose XmlReaderSettings are set to validate against a schema. Alternatively, you can skip validation and let the exceptions occur when a member is accessed. So XElement provides a flexible way of validating either when loaded or when accessed. But what’s still missing is the design time experience…
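A sketch of the validate-on-load option might look like the following. The file name person.xml is an assumption for illustration, and myPersonSchema.xsd is the schema file name used later in this post:

```vb
Option Strict On
Imports System
Imports System.Xml
Imports System.Xml.Linq
Imports System.Xml.Schema

Module ValidateOnLoad
    Sub Main()
        Dim settings As New XmlReaderSettings()
        settings.ValidationType = ValidationType.Schema
        ' Passing Nothing uses the targetNamespace declared in the schema itself.
        settings.Schemas.Add(Nothing, "myPersonSchema.xsd")

        Try
            ' With no ValidationEventHandler attached, the reader throws
            ' XmlSchemaException when it hits a validation error.
            Using reader As XmlReader = XmlReader.Create("person.xml", settings)
                Dim doc As XDocument = XDocument.Load(reader)
                Console.WriteLine("Loaded and validated: " & doc.Root.Name.ToString())
            End Using
        Catch ex As XmlSchemaException
            Console.WriteLine("Schema validation failed: " & ex.Message)
        End Try
    End Sub
End Module
```

Leaving out the settings and calling XDocument.Load directly gives you the other option: nothing is checked up front, and problems only surface when a member is accessed.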

So how can we get a great design time experience with an XElement, akin to strong typing but without all the drawbacks that strong typing and POCO present?  I think the answer is in dynamic interfaces that are XElements, not POCO.  For example, let’s say I could write:

Public XInterface IPerson
    <Name> As String
    <Age> As Int32
    @ID As GUID
End XInterface

You could then define an XElement parameter or variable as IPerson, and get a great design time experience.  You’d still have to decide whether you wanted to validate your XML when you load it or just wait and see, but really that’s great flexibility when it comes to performance 🙂

The XInterface above is a simple model where you define the values in terms of CLR types, so when they are accessed the appropriate conversions from XML to that type will be done for you. For more complex models it’d be nice if you could also define an XInterface from a schema, and the compiler would infer the appropriate CLR types from the XML types. In this case you’d have to provide the schema to define the interface.

Imports XElement IPerson = "myPersonSchema.xsd"

Where “myPersonSchema.xsd” would be the name of a file in the project, or a file path or URL/URI.

With dynamic interfaces that are based around XElement, you could provide a really rich design time experience for XML.