Changing form field numbering formats using Open XML

Having overcome the obstacle mentioned in my last post, I continued work on a small tool to manage the number formatting of number form fields in Word documents. While getting acquainted with the principles of Open XML, I’ve often felt that there are too few samples demonstrating how to work with the various Office applications, so I plan to share some of my code in this blog.

In this post, I’ll cover the basics of what’s involved in addressing and editing a certain set of form fields in a Word document. The full project can be downloaded here.

General information

The tool is a Windows Forms solution written in C# with Visual Studio 2008 that leverages the Open XML SDK. In order to run it, the Open XML SDK 2.0 must be available on the machine. As this is freely redistributable, the dll is included in the project.
The code uses only the components that were already part of version 1.0 of the SDK, since the Word 2007 file format should also be supported.

Comprising two forms, the tool enables the user to

  1. Select a document to be processed, set a default number format and list all the Number form fields in the document with the form field names and the current number format
  2. Edit the number formats in the list of form fields then write the information back to the document.

[Screenshot: tools for adding number pictures to Word number form fields]

The WordOpenXML we’re interested in looks like this:

The opening w:fldChar element with the w:ffData child element contains all the information required for this task: the form field name (w:name), the form field type (w:textInput with w:type of number) and, were it present, the w:format element defining the number format for the field as a child element of w:textInput.
This is exactly the problem: Word 2007 and Word 2010 don’t write the w:format element to the document when it is saved. We therefore want to create the element and append it to the XML.
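
To make the description above concrete, here is a small sketch that parses a hand-written, simplified fragment and reports the three pieces of information. The fragment is not taken from a real document; the field name "Amount" and the helper names (FormFieldFragment, Describe) are invented for illustration, and a real w:ffData contains further child elements.

```csharp
using System;
using System.Xml;

public static class FormFieldFragment
{
    // The WordprocessingML main namespace, conventionally bound to the prefix "w".
    public const string WNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written, simplified fragment; the field name "Amount" is invented.
    public const string SampleXml =
        "<w:fldChar w:fldCharType='begin' xmlns:w='" + WNamespace + "'>" +
          "<w:ffData>" +
            "<w:name w:val='Amount'/>" +
            "<w:textInput>" +
              "<w:type w:val='number'/>" +
            "</w:textInput>" +
          "</w:ffData>" +
        "</w:fldChar>";

    // Returns "name|type|format", with "(none)" where w:format is missing.
    public static string Describe(string xml)
    {
        XmlDocument xdoc = new XmlDocument();
        xdoc.LoadXml(xml);

        // Walk down from w:fldChar to w:ffData and w:textInput.
        XmlElement ffData = xdoc.DocumentElement["ffData", WNamespace];
        XmlElement textInput = ffData["textInput", WNamespace];

        string name = ffData["name", WNamespace].GetAttribute("val", WNamespace);
        string type = textInput["type", WNamespace].GetAttribute("val", WNamespace);

        // w:format is optional, so the indexer may return null.
        XmlElement format = textInput["format", WNamespace];
        string picture = format == null
            ? "(none)"
            : format.GetAttribute("val", WNamespace);

        return name + "|" + type + "|" + picture;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(SampleXml)); // prints Amount|number|(none)
    }
}
```

The "(none)" result for the format is the situation the tool has to deal with: the field is a number field, but no number picture is stored in the XML.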

Basics of using the Open XML SDK

In addition to the references generated automatically by Visual Studio for a Windows Forms solution, working with the Open XML SDK requires two additional assembly references (more information):

  • DocumentFormat.OpenXml: contains the Open XML SDK functionality
  • WindowsBase: contains the System.IO.Packaging namespace on which the Open XML SDK is based.

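The following using statements at the top of the code file simplify access to the Open XML objects, properties and methods, as well as to the System.Xml classes used further down:

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml;
using System.Xml; // XmlDocument, XmlNamespaceManager, NameTable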

Open a Word document for editing using the Open XML SDK

In order to access the XML files in an Open XML zip package, use the Open method for the appropriate file type. In this case, you’re working with a WordprocessingDocument:

using (WordprocessingDocument doc = WordprocessingDocument.Open(openXMLFileName, true))

The using statement ensures that the doc object is disposed of, and the package closed, as soon as execution leaves the block.

The arguments for the Open method are the file path to the document and whether the document should be opened for editing (more information).

Read information from the document

As the tool should also support the Word 2007 file format, at this point it must abandon the Open XML SDK and work directly with the XML content. Because Microsoft makes heavy use of namespaces in XML, the first task is to set up support for the namespace used by a Word document and its form fields. In this case, it’s the “standard” namespace for Word, associated with the element name prefix w:

string wNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", wNamespace);

In the next step, the part (XML document) of the zip package that contains the information we’re interested in is identified and loaded into a .NET Framework XML document. In this case, the part is document.xml – the xml file that contains the main content of a Word document – represented in the Open XML SDK by MainDocumentPart.

docPart = doc.MainDocumentPart;
XmlDocument xdoc = new XmlDocument(nt);
xdoc.Load(docPart.GetStream()); // load the part's XML into the XmlDocument

As is apparent in the XML snippet, above, the w:type child element of the w:textInput element specifies whether a form field is a number form field. So in order to get a collection of nodes with number form fields the following XPath is used:

nodes = xdoc.SelectNodes("//w:textInput/w:type[@w:val='number']", nsManager);

We then have to loop through the nodes collection to pick up the w:name and w:format elements, if they exist. The assumption is that w:format, at least, does not exist since that’s the whole reason for this exercise.

Note the use of .ParentNode since the elements to be addressed aren’t children of w:type, but are located at higher levels in the XML.

foreach (XmlNode n in nodes)
    nFieldName = n.ParentNode.ParentNode.SelectSingleNode("w:name", nsManager);
    ffldName = nFieldName.Attributes[0].Value;
    nFormat = n.ParentNode.SelectSingleNode("w:format", nsManager);
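
The selection and the walk up through ParentNode can be tried without a Word file. The following is a minimal sketch that runs the same XPath against a hand-written sample containing one number field and one plain text field. The sample XML, the field names and the helper FindNumberFieldNames are assumptions made for illustration; they are not part of the tool.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NumberFieldFinder
{
    public const string WNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written sample: "Amount" is a number field, "Comment" is not.
    public const string SampleXml =
        "<w:document xmlns:w='" + WNamespace + "'><w:body><w:p>" +
        "<w:r><w:fldChar w:fldCharType='begin'><w:ffData>" +
        "<w:name w:val='Amount'/>" +
        "<w:textInput><w:type w:val='number'/></w:textInput>" +
        "</w:ffData></w:fldChar></w:r>" +
        "<w:r><w:fldChar w:fldCharType='begin'><w:ffData>" +
        "<w:name w:val='Comment'/>" +
        "<w:textInput/>" +
        "</w:ffData></w:fldChar></w:r>" +
        "</w:p></w:body></w:document>";

    // Returns the names of all form fields whose w:type is "number".
    public static List<string> FindNumberFieldNames(string xml)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
        nsManager.AddNamespace("w", WNamespace);

        XmlDocument xdoc = new XmlDocument(nt);
        xdoc.LoadXml(xml);

        List<string> names = new List<string>();
        XmlNodeList nodes = xdoc.SelectNodes(
            "//w:textInput/w:type[@w:val='number']", nsManager);
        foreach (XmlNode n in nodes)
        {
            // n is w:type; its parent is w:textInput and its
            // grandparent is w:ffData, which holds w:name.
            XmlNode nameNode =
                n.ParentNode.ParentNode.SelectSingleNode("w:name", nsManager);
            names.Add(nameNode.Attributes["val", WNamespace].Value);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in FindNumberFieldNames(SampleXml))
        {
            Console.WriteLine(name); // prints Amount
        }
    }
}
```

Addressing the attribute by local name and namespace, rather than by position as in Attributes[0], is a design choice of this sketch; it keeps working if w:name ever carries more than one attribute.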

As a w:format element may not exist, we have to check whether the node object has been populated before the code can work with it in order to determine what number picture has been assigned, if any:

    if (nFormat != null)
         numFormat = nFormat.Attributes[0].Value;

If no number picture has been assigned, the value will be a zero-length string (“”). In that case, the default number format set in the Windows Form is assigned to it.

        if (numFormat == String.Empty)
             nFormat.Attributes[0].Value = fieldNumberFormat;
             numFormat = nFormat.Attributes[0].Value;

If there is no w:format element, which is to be expected with a Word 2007 or Word 2010 document that has not been processed before, it’s necessary to create one and add it to the document:

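         // Create <w:format w:val="..."/> in the Word namespace
         XmlNode nFormatNew = xdoc.CreateNode(System.Xml.XmlNodeType.Element, 
          "w", "format", wNamespace);
         XmlAttribute nValue = xdoc.CreateAttribute("w", "val", wNamespace);
         nValue.Value = fieldNumberFormat;
         nFormatNew.Attributes.Append(nValue);
         // Add the new element as the last child of w:textInput
         n.ParentNode.AppendChild(nFormatNew);
         numFormat = fieldNumberFormat;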
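
The two cases, an empty w:format and a missing one, can be folded into a single helper. The following is a sketch under assumptions, not the tool's actual code: the names FormatElementWriter, EnsureFormat and LoadSampleTextInput are invented, the sample element is hand-written, and "#,##0.00" is just an example number picture.

```csharp
using System;
using System.Xml;

public static class FormatElementWriter
{
    public const string WNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Makes sure textInput has a w:format child with a non-empty w:val.
    // Returns the number picture that is in effect afterwards.
    public static string EnsureFormat(XmlElement textInput, string defaultFormat)
    {
        XmlDocument xdoc = textInput.OwnerDocument;

        XmlElement format = textInput["format", WNamespace];
        if (format == null)
        {
            // No w:format yet: create it and append it as the last child.
            format = xdoc.CreateElement("w", "format", WNamespace);
            textInput.AppendChild(format);
        }

        if (format.GetAttribute("val", WNamespace) == String.Empty)
        {
            // Missing or empty w:val: assign the default number picture.
            XmlAttribute val = xdoc.CreateAttribute("w", "val", WNamespace);
            val.Value = defaultFormat;
            format.Attributes.Append(val);
        }

        return format.GetAttribute("val", WNamespace);
    }

    // Hand-written sample: a number field without a w:format element.
    public static XmlElement LoadSampleTextInput()
    {
        XmlDocument xdoc = new XmlDocument();
        xdoc.LoadXml(
            "<w:textInput xmlns:w='" + WNamespace + "'>" +
            "<w:type w:val='number'/>" +
            "</w:textInput>");
        return xdoc.DocumentElement;
    }

    public static void Main()
    {
        XmlElement textInput = LoadSampleTextInput();
        Console.WriteLine(EnsureFormat(textInput, "#,##0.00"));
        Console.WriteLine(textInput.OuterXml);
    }
}
```

Calling EnsureFormat a second time with a different default leaves the existing number picture untouched, which matches the intent of only filling in formats that are missing.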

When Word creates a form field through the user interface, it assigns the form field a name and, at the same time, adds a bookmark with the same name to the field. If a user copies and pastes form fields rather than using Word’s functionality to insert them, the copy loses its bookmark, since bookmark names must be unique in a document. In the underlying XML, however, the form field retains the original name.

The tool stores the form field names and their number formats in a Dictionary, with the name as the key. Dictionary keys must be unique, and the name must also be unique so that the right form field can be found when the information is written back. Form fields with duplicate names are therefore given a new name by appending a counter value to the original name:

    if (fflds.ContainsKey(ffldName))
    {
         // Duplicate name: make it unique and write it back to the XML
         ffldName += "_" + counter.ToString();
         nFieldName.Attributes[0].Value = ffldName;
    }
    fflds.Add(ffldName, numFormat);
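
The renaming step can be looked at in isolation. The sketch below is an assumption about how such a step might be made robust, not the tool's code: the post does not show how counter is maintained, so here it is incremented on every collision, and a loop guards against the suffixed name itself already being taken. The names FieldNameRegistry and RegisterAll are invented.

```csharp
using System;
using System.Collections.Generic;

public static class FieldNameRegistry
{
    // Adds every name to fflds with the given format, renaming duplicates
    // by appending "_<counter>". Returns the names actually stored, in order.
    public static List<string> RegisterAll(
        string[] names, string numFormat, Dictionary<string, string> fflds)
    {
        List<string> stored = new List<string>();
        int counter = 0; // assumption: one running counter for the document

        foreach (string original in names)
        {
            string candidate = original;
            // Keep trying until the candidate is not yet a key.
            while (fflds.ContainsKey(candidate))
            {
                counter++;
                candidate = original + "_" + counter.ToString();
            }
            fflds.Add(candidate, numFormat);
            stored.Add(candidate);
        }
        return stored;
    }

    public static void Main()
    {
        Dictionary<string, string> fflds = new Dictionary<string, string>();
        string[] names = { "Amount", "Amount", "Total", "Amount" };
        List<string> stored = RegisterAll(names, "#,##0.00", fflds);
        // prints Amount, Amount_1, Total, Amount_2
        Console.WriteLine(String.Join(", ", stored.ToArray()));
    }
}
```

In the tool itself, the new name would also have to be written back to the w:name element, as the snippet above does, so that the field can be found again by that name later.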

The changes are then saved back to the document:

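  xdoc.Save(docPart.GetStream(System.IO.FileMode.Create)); // assumed: write the XML back into the main document part
  xdoc = null;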

The code then goes on to generate the form for managing the number form fields. After the user has finished editing, the changes are written back to the document in much the same way as the information was read.
