Working with form fields using the Open XML SDK

As described in earlier posts, Word 2007 and Word 2010 don’t properly save number formats to number-type form fields. So I decided to try my hand at my first serious manipulation of Office Open XML files. Because I felt more comfortable with old-fashioned XML coding, the tool presented in that post used the Open XML SDK only to open the file “package” and obtain the required Part from it.

Once it was clear that my approach did work, it was time to see if I could do the same using the full capabilities of the Open XML SDK. It took me a while to get a feel for the objects it exposes and for working with the provided LINQ extensions, but in the end it worked. And once I understood the logic, I was surprised at how much simpler the task actually was than using “pure” XML.

Both sets of code start with the same using clause for opening the Word document Open XML file and accessing the main document Part. After that, the only similarity is that the logic in both is based on how Word’s Open XML represents form fields, as shown in the earlier post.

No need to set up an XML NameTable for the namespace URIs or to load an XML document: the Open XML SDK takes care of all that “plumbing”! The code can proceed directly to the form field elements, starting with w:ffData, which contains the child element w:name (the name of the form field). In contrast to the first set of code, this variation picks up all the form fields in the document, not just the number types.

Rather than using SelectNodes and writing out an XPath expression to get a set of elements, declare an IEnumerable of the Wordprocessing class that represents the element (here FormFieldData, for w:ffData) and assign it the Descendants of that type found in the document part. (In the case of form fields this will always be the main document part.)

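    // mainPart: the MainDocumentPart from the shared using block (variable name assumed)
    IEnumerable<FormFieldData> fdata = mainPart.Document.Descendants<FormFieldData>();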

You can then loop through this sequence, which contains every form field in the document. (A sequence behaves much like a collection; LINQ simply calls it a sequence.)

The first step is to pick out only the TextInput (text box) type of form field, as opposed to a drop-down or a check box, and determine whether it’s formatted as a number type. This time, we use LINQ to select the descendant of w:ffData that is a w:textInput element, by testing whether the element’s LocalName is textInput. Note the explicit cast of the query result to TextInput, the type of the variable being populated.

    TextInput tb = (TextInput)fd.Descendants()
        .SingleOrDefault(f => f.LocalName == "textInput");

If the query found a w:textInput element, the form field is a text box and the TextInput object is not null. In that case, we try to obtain the w:type element, which may be absent (the query then returns null) or may specify a type other than number. (Remember, the form field type is specified in the val attribute of the w:type element.)

    if (tb != null)
    {
        TextBoxFormFieldType tbType = tb.Elements<TextBoxFormFieldType>().FirstOrDefault();
        if (tbType != null && tbType.Val == "number")

Assuming we’re dealing with a number text box, the next step is to get the form field’s name. Since in a valid WordprocessingML document a w:ffData element contains only one w:name child element, we can get a FormFieldName object directly by returning the FirstOrDefault of all FormFieldName elements under w:ffData. (FirstOrDefault returns null if there is no element of the specified kind.)

The actual name of the form field is held in the val attribute. While you can go to the trouble of reading it through GetAttributes (the commented-out line), the SDK knows that a FormFieldName object has this attribute and exposes it as the Val property.

    foreach (FormFieldData fd in fdata)
    {
        FormFieldName fn = fd.Elements<FormFieldName>().FirstOrDefault();
        // ffldName = fn.GetAttributes().FirstOrDefault().Value;
        ffldName = fn.Val;
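Two similar LINQ operators appear in these snippets: SingleOrDefault locates the w:textInput element and FirstOrDefault locates w:name. Both return null when nothing matches; they differ when more than one element does. The sketch below shows that difference on plain lists of strings standing in for element local names (no Open XML SDK needed; the sample lists are made up).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up local names of w:ffData children; plain strings, not SDK objects.
var oneMatch = new List<string> { "name", "enabled", "textInput" };
var noMatch = new List<string> { "name", "enabled", "checkBox" };
var twoMatches = new List<string> { "textInput", "textInput" };   // malformed on purpose

// SingleOrDefault: the element if exactly one matches, null if none does.
var single = oneMatch.SingleOrDefault(n => n == "textInput");
var none = noMatch.SingleOrDefault(n => n == "textInput");

// FirstOrDefault: simply the first match, however many there are.
var first = twoMatches.FirstOrDefault(n => n == "textInput");

// SingleOrDefault throws InvalidOperationException when more than one element matches.
bool threw = false;
try
{
    twoMatches.SingleOrDefault(n => n == "textInput");
}
catch (InvalidOperationException)
{
    threw = true;
}

Console.WriteLine($"single={single}, none={(none ?? "null")}, first={first}, threw={threw}");
```

Because a valid w:ffData holds at most one w:textInput, SingleOrDefault is safe in this code; on malformed markup it would throw rather than silently pick one element.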

Once we know we’re dealing with a number text box, we have to find out whether there’s a w:format element that defines a number picture for the text box content.

        { // it's a number form field
            Format tbf = tb.GetFirstChild<Format>();
            if (tbf != null)

If there is, we read its val attribute (the number format picture). If that is empty (no number picture specified), the default number picture is assigned.

                numFormat = tbf.Val;
                if (String.IsNullOrEmpty(numFormat))
                {
                    tbf.Val = fieldNumberFormat;   // fall back to the default number picture
                    numFormat = fieldNumberFormat;
                }

Of course, the whole reason for this exercise is that the w:format element is not written to certain documents, so we have to allow for the case that it’s missing and create it. Here we see another instance where the Open XML SDK can save us a lot of work. No need to create separate objects for the element and its attributes, then append them in the correct order. Just create the object, assign a string to the Val property, then append it to the correct parent element (the TextInput).

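                Format newTbf = new Format();
                newTbf.Val = fieldNumberFormat;
                tb.Append(newTbf);   // w:format is the last child in w:textInput's schema order
                numFormat = fieldNumberFormat;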
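The format handling in the last two steps boils down to three cases. As a plain C# sketch that leaves the SDK out, the decision could be written as a small function. ResolveNumberFormat, its parameters and the sample picture "#,##0" are hypothetical: hasFormat stands for “a w:format element was found” and val for that element’s val attribute.

```csharp
using System;

// Hypothetical helper (not from the original code): decides what should happen to a
// number-type text box's w:format element.
//   hasFormat: whether a w:format element exists under w:textInput
//   val:       that element's val attribute (null if the attribute is absent)
(string Picture, string Action) ResolveNumberFormat(bool hasFormat, string? val, string defaultPicture)
{
    if (!hasFormat)
        return (defaultPicture, "create");      // append a new w:format holding the default
    if (string.IsNullOrEmpty(val))
        return (defaultPicture, "overwrite");   // set the existing val to the default
    return (val, "keep");                       // a picture is already there; leave it
}

// Arbitrary sample default, standing in for the post's fieldNumberFormat.
const string sampleDefault = "#,##0";

Console.WriteLine(ResolveNumberFormat(false, null, sampleDefault));
Console.WriteLine(ResolveNumberFormat(true, "", sampleDefault));
Console.WriteLine(ResolveNumberFormat(true, "0.00", sampleDefault));
```

The sketch treats a missing val attribute like an empty one (string.IsNullOrEmpty); comparing only with String.Empty would let a null value slip through unchanged.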

Making sure the key (form field name) is unique and adding the pair to the Dictionary remains basically the same as in the other version. Here, too, the changes need to be saved back to the WordprocessingML document (for example, by calling Save() on the main document part’s Document object).

            if (fflds.ContainsKey(ffldName))
            {
                ffldName += "_" + counter.ToString();
                fn.Val = ffldName;   // write the new name back to w:name
            }
            fflds.Add(ffldName, numFormat);
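This fragment renames a duplicate with a single "_" + counter suffix, and how counter is advanced isn’t shown. A variant that keeps trying suffixes until it finds a free key could look like the sketch below (plain C#, no SDK; MakeUniqueName and the sample fields are hypothetical).

```csharp
using System;
using System.Collections.Generic;

// Collected form fields: name -> number picture (the role fflds plays in the post).
var fflds = new Dictionary<string, string>();

// Hypothetical helper: returns name unchanged if it is free, otherwise the first
// free variant of the form name_1, name_2, ...
string MakeUniqueName(string name, Dictionary<string, string> existing)
{
    if (!existing.ContainsKey(name))
        return name;
    int suffix = 1;
    while (existing.ContainsKey(name + "_" + suffix))
        suffix++;
    return name + "_" + suffix;
}

// Hypothetical sample: three number fields that all carry the name "Amount".
var fields = new[] { ("Amount", "#,##0"), ("Amount", "0.00"), ("Amount", "0%") };
foreach (var (fieldName, picture) in fields)
{
    string unique = MakeUniqueName(fieldName, fflds);
    fflds.Add(unique, picture);   // the SDK version would also set fn.Val = unique
}

foreach (var pair in fflds)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

Writing the returned name back to w:name keeps the document and the dictionary in agreement, which is why the original fragment also assigns fn.Val.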

The full VS 2008 solution can be downloaded here.
