From Object Model to Open XML SDK Coding

No question: I’m object model-oriented. It didn’t take me long to recognize the advantages of object-oriented programming over WordPerfect keyboard macros and the user-action emulation of WordBasic when VBA was introduced in the mid-90s!

After ten years of Visual Basic, the Office 2003 beta confronted me with .NET and C# in the context of VSTO, and with XML as a file format. Dealing with all of this at once was a challenge. With the advent of Office 2007 I moved more towards VSTO and worked with XML only sporadically. The new file formats made sense and I understood what they were, but working with them directly wasn’t really on my radar.

So I’m a bit late coming to the party and, what’s more, as with VSTO I’m coming to the technology through “the back door”. My knowledge of LINQ was practically non-existent until a year ago, when I bought a book about it. It’s still mostly theory with little real understanding, although things are progressing.

While moderating the Open XML SDK forum, however, I find that my knowledge of how Word is designed and of its object model, together with my basic understanding of the underlying WordOpenXML, is perhaps even more important than being able to code fluently with the Open XML SDK. Many .NET namespaces, such as Windows Forms, can be used without a deeper understanding of what’s going on “under the covers”, but the same can’t be said for doing anything complex with Office applications or files. You need to understand how the individual Office applications work in order to manipulate them or their files.

One thing that often stands out in the questions posted in the Open XML SDK forum is how little people know about the XML underlying the file. Granted, the SDK is supposed to abstract that for us and make working with the underlying XML more efficient, and it does so, mainly through its support for LINQ. But it doesn’t free the developer from needing to know what’s going on at a deeper level.
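To see why that deeper knowledge matters: the SDK’s Paragraph, Run and Text classes (in DocumentFormat.OpenXml.Wordprocessing) correspond to the w:p, w:r and w:t elements of WordprocessingML, and Word frequently splits the text of one paragraph across several runs, for example wherever the formatting changes. The following is not SDK code but a minimal Python sketch using the standard library’s xml.etree.ElementTree on a hand-written fragment. The function paragraph_texts and the sample markup are illustrative assumptions, and the sketch ignores tabs, breaks, fields and text boxes.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix used in document.xml)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the text of each w:p element in a document.xml string.

    Text lives in w:t elements inside runs (w:r). Because one paragraph is
    often split across several runs, all w:t text in a paragraph is joined.
    """
    root = ET.fromstring(document_xml)
    texts = []
    for p in root.iter(f"{{{W_NS}}}p"):
        texts.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return texts


# Hand-written sample: "Open" is bold, so it sits in its own run.
sample = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>Hello, </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Open</w:t></w:r>
      <w:r><w:t>XML</w:t></w:r>
    </w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""

print(paragraph_texts(sample))  # ['Hello, OpenXML', 'Second paragraph']
```

Searching a single run for “OpenXML” would find nothing here, which is exactly the kind of surprise that knowing the underlying markup prevents.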

My advice to all those who want to develop solutions that leverage the Office applications, whether using the object model or Open XML, is:
First, look at the processes you want to use in the Office application UI. Can it even do what you want it to do? There are few instances where you can force an Office application to do something the user can’t accomplish through the UI – your code can just do it faster and make it easier, which is exactly what the Office APIs were designed for.

When working with the object model of Word or Excel, try recording a macro. These days the macro recorder doesn’t pick up everything, but if you’re lucky it will give you the basic syntax for the objects and their properties. That gives you a starting point for looking things up in the Language Reference.

For the Open XML file formats, save a document containing the thing you’re interested in, change its extension to .zip, then open it and look at the “Parts” in the zip package to see how the XML is constructed. In a Word document, the main body text is in the part word/document.xml.
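Renaming the file is only needed so that Windows Explorer treats it as a zip archive; a script can open the package as-is. Here is a minimal Python sketch using only the standard library’s zipfile module to list a package’s parts and print one of them. This is plain zip access, not the Open XML SDK. The function names list_parts and read_part and the script name show_parts.py are hypothetical, and the script assumes a Word document whose parts are UTF-8 encoded.

```python
import sys
import zipfile


def list_parts(package_path: str) -> list[str]:
    """Return the names of all parts (zip entries) in an Open XML package."""
    with zipfile.ZipFile(package_path) as pkg:
        return pkg.namelist()


def read_part(package_path: str, part_name: str) -> str:
    """Return one part's XML as text, e.g. 'word/document.xml' (assumes UTF-8)."""
    with zipfile.ZipFile(package_path) as pkg:
        return pkg.read(part_name).decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python show_parts.py MyDocument.docx
    docx_path = sys.argv[1]
    for name in list_parts(docx_path):
        print(name)
    print(read_part(docx_path, "word/document.xml"))
```

Because the extension doesn’t matter to zipfile, the same functions work on .xlsx and .pptx files; only the part names differ (for example xl/workbook.xml in Excel).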

Tip: turning off the inclusion of RSIDs (revision identifiers) in the document will improve the readability of WordOpenXML immensely. RSIDs improve the result when combining multiple documents (something done during reviewing), but they clutter the WordOpenXML and make it hard to read. The setting can be found in Options/Trust Center. In the object model, set Application.Options.StoreRSIDOnSave to False to turn them off (True turns them back on).
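For documents that were already saved with RSIDs, you can also strip them from a copy of a part’s XML before reading it. A minimal Python sketch, assuming you already have the part’s XML as a string and that it uses the w: prefix Word writes for the WordprocessingML namespace. strip_rsids is a hypothetical helper, and its output is meant for reading, not for saving back into the package.

```python
import re

# Matches attributes like w:rsidR="00B5", w:rsidRDefault="..." or w:rsidRPr="...",
# together with the whitespace before them. Elements such as <w:rsid w:val="..."/>
# in settings.xml are not matched, because "<w:rsid" follows "<", not whitespace.
RSID_ATTR = re.compile(r'\s+w:rsid\w*="[0-9A-Fa-f]*"')


def strip_rsids(xml_text: str) -> str:
    """Return a copy of WordOpenXML text with all w:rsid* attributes removed."""
    return RSID_ATTR.sub("", xml_text)


sample = '<w:p w:rsidR="00B5" w:rsidRDefault="00B5"><w:r w:rsidRPr="0012"><w:t>Hi</w:t></w:r></w:p>'
print(strip_rsids(sample))  # <w:p><w:r><w:t>Hi</w:t></w:r></w:p>
```

A regular expression is used here instead of parsing and re-serializing with an XML library, so that everything other than the rsid attributes (namespace declarations, prefixes, whitespace) stays exactly as Word wrote it.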

In my next post I’ll show you my Open XML version of the tool for form field number formatting and point out where the Open XML SDK has made coding more efficient.
