Web Add-ins: Reading Word Document Properties

Besides writing to a specified place in a document, the other major thing an “App for Word” can do is communicate with Custom XML Parts. I suppose this was included in the original APIs mainly because Word can link a content control to a node in a Custom XML Part: changing the content of either the content control or the node mirrors that change at the other end of the link. This capability is of interest for “data-mining” documents, since it’s fairly simple to read a Custom XML Part from a closed Word document by working directly with the Office Open XML file format.

A side effect of providing access to Custom XML Parts in the Web Add-in API is that code can read and write to the Custom XML Parts containing built-in and custom document properties. As I’ve not come across any sample code demonstrating how this can be done and the API does not appear to work “as advertised”, it seemed this would be an interesting topic to write about.

(Note: At run-time only “static” document properties can be accessed, not those that Word calculates on-the-fly, while the document is loaded. For example, you can access the title or keywords, but not the number of pages or the template attached to the document.)

For this example I chose an approach that illustrates some of the “gotchas” when working with asynchronous functionality and function callbacks. The topic will therefore be a bit involved, so it’s broken up into two posts.

The sample is based on the Visual Studio template for “Apps for Office”. Minor modifications were made to the HTML page, which you’ll find at the end of this post: a textarea control was added to get the name of the document property (actually, it needs to be the element name as it appears in the Custom XML Part’s XML), and a link to refresh the code behind the task pane was added.

The Custom XML Parts pieces of the API used are covered by the following topics in the API documentation:
CustomXMLParts object
CustomXMLPart object

I expected to be able to use additional methods in the API to achieve my goal and was stymied by the fact that they simply did not work as the documentation and samples indicated they should. For example, getNodesAsync did not accept any XPath expression except a global wildcard (which returns all elements), and getNodeValueAsync returned the entire XML of the element, not just the text value.

So, while I was able to cobble together a deeply nested set of async calls and creative string manipulation to finally read a document property’s setting, it seemed to me there must be a more straightforward way: simply get the XML of the Custom XML Part, then use basic (and synchronous) XML tools to extract the information.

Built-in document properties in a Word Open XML zip package are stored in the files core.xml and app.xml. Accessing a Custom XML Part is done either by ID (a GUID value assigned by Word to the part) or Namespace. As there’s no predicting the GUID value, that leaves using the Namespace to access a particular type of Custom XML Part.

(If you’re not familiar with XML and namespaces, it’s worth reading an introduction to them before continuing.)

core.xml and app.xml contain multiple namespaces. Only one is required for accessing the Custom XML Part as no filtering of the content is done – the entire part is returned. Since for the purposes of this example it doesn’t matter to which namespace a document property belongs, the code makes use of only one namespace per Custom XML Part file.
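For orientation, here is a minimal sketch of what "one namespace per part" could look like. The two URIs are the standard Office Open XML namespaces for core and extended (app) properties; that the sample in this post uses exactly these two is an assumption, and the names `PROPERTY_PART_NAMESPACES` and `namespacesToSearch` are made up for illustration.

```javascript
// Hypothetical lookup table: one namespace per property part.
// The URIs are the standard OOXML namespaces for these parts; whether the
// sample in this post uses exactly these two is an assumption.
var PROPERTY_PART_NAMESPACES = {
    'core.xml': 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties',
    'app.xml': 'http://schemas.openxmlformats.org/officeDocument/2006/extended-properties'
};

// Returns the namespaces in a fixed order so they can be looped over.
function namespacesToSearch() {
    return ['core.xml', 'app.xml'].map(function (part) {
        return PROPERTY_PART_NAMESPACES[part];
    });
}
```

Any other namespace declared in the same part should retrieve the same part, since the whole part is returned rather than a filtered view.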

In a nutshell, what the code below does is get a Custom XML Part from the document that contains the specified Namespace (getByNamespaceAsync). The XML is read from the Custom XML Part (getXmlAsync) and checked for a node matching the document property entered by the user in the App task pane. If such a node is found and contains information, that is returned in the notification area. If the node is empty, a message to that effect is displayed, instead. Finally, if no such node is found, that information is returned.
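The three outcomes just described (value found, node empty, node missing) can be kept separate from the XML parsing itself. The following is only a sketch: `describeLookup` is a hypothetical helper, not part of the Office API, and plain objects with a `textContent` property stand in for the DOM nodes that a call such as `getElementsByTagNameNS` would return in the task pane.

```javascript
// Hypothetical helper: turns the result of a node lookup into one of
// three outcomes. "nodes" is any array-like list of objects that expose
// textContent, which is what a DOM node list provides.
function describeLookup(nodes, propName) {
    if (nodes.length === 0) {
        return {
            status: 'missing',
            message: 'The specified document property "' + propName + '" was not found.'
        };
    }
    var text = nodes[0].textContent;
    if (text.length === 0) {
        return {
            status: 'empty',
            message: 'The document property "' + propName + '" is empty.'
        };
    }
    return { status: 'found', message: text };
}

// Stand-ins for DOM nodes, so the sketch runs outside the task pane too.
var foundOutcome = describeLookup([{ textContent: 'Quarterly report' }], 'title');
var emptyOutcome = describeLookup([{ textContent: '' }], 'keywords');
var missingOutcome = describeLookup([], 'category');
```

Returning a status alongside the message means the caller does not have to compare message strings to find out what happened.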

In contrast to the previous examples, this code contains multiple, nested asynchronous function callbacks. Function callbacks must be nested so that execution occurs in order: execution waits until the result of an async call is returned; this is evaluated in the function callback and the next steps follow. Pyramid nesting can make code very difficult to follow. It is, however, possible to use a separate, named function to process the result; the flow is then from named function to named function. In the code below, getXmlAsync is nested in getByNamespaceAsync. The function callback for getXmlAsync, however, is in the separate function onGotXml. Either approach is correct. Personally, I find calling a separate, named function easier to follow than a pyramid of nested functions, as long as I remember that control will, eventually, return to the “top of the heap”.
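To make the two styles concrete, here is a small sketch that runs without Office. Everything in it is a stand-in: `fakeGetPart` and `fakeGetXml` only imitate the shape of getByNamespaceAsync and getXmlAsync, and a hand-rolled task queue (`defer`, `runPending`) plays the role of the host's event loop so the order of execution is easy to see.

```javascript
// Tiny task queue standing in for the host's event loop.
var pending = [];
function defer(task) { pending.push(task); }
function runPending() {
    while (pending.length > 0) { pending.shift()(); }
}

// Hypothetical stand-ins for getByNamespaceAsync and getXmlAsync.
function fakeGetPart(ns, callback) {
    defer(function () { callback({ value: [{ ns: ns }] }); });
}
function fakeGetXml(part, callback) {
    defer(function () { callback({ value: '<p xmlns="' + part.ns + '"/>' }); });
}

var log = [];

// Style 1: pyramid nesting, the second call sits inside the first callback.
function readNested(ns) {
    fakeGetPart(ns, function (partResult) {
        fakeGetXml(partResult.value[0], function (xmlResult) {
            log.push('nested: ' + xmlResult.value);
        });
    });
}

// Style 2: named functions, each callback hands over to the next one.
function readNamed(ns) {
    fakeGetPart(ns, onGotPart);
}
function onGotPart(partResult) {
    fakeGetXml(partResult.value[0], onGotXmlSketch);
}
function onGotXmlSketch(xmlResult) {
    log.push('named: ' + xmlResult.value);
}

readNested('urn:a');
readNamed('urn:b');
log.push('calls issued'); // logged before either callback has run
runPending();
```

Both styles do the same work in the same order; the only difference is where the callback code lives.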

The tricky part in this example is the fact that, in order to search all available document properties, more than one Namespace/Custom XML Part needs to be inspected. This means a loop – and looping combined with asynchronous execution presents… a conceptual challenge. I’ll leave you with that thought and pursue it in more detail in the next post.
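A rough illustration of the challenge, again without Office: the loop finishes issuing its calls before any callback has run, so the code after the loop cannot know the results. One common way to cope, sketched here under the assumption that counting completed callbacks is acceptable, is to let the last callback to arrive trigger the follow-up step. `fakeReadPart` and `readAll` are hypothetical names, and the task queue again stands in for the host's event loop.

```javascript
// Tiny task queue standing in for the host's event loop.
var pending = [];
function defer(task) { pending.push(task); }
function runPending() {
    while (pending.length > 0) { pending.shift()(); }
}

// Hypothetical async read: reports back later, never during the loop.
function fakeReadPart(name, callback) {
    defer(function () { callback(name + ' read'); });
}

// Issues one async read per name and calls onDone once all have reported.
function readAll(names, onDone) {
    var results = [];
    var completed = 0;
    for (var i = 0; i < names.length; i++) {
        fakeReadPart(names[i], function (result) {
            results.push(result);
            completed++;
            // Only the callback that arrives last knows the work is finished.
            if (completed === names.length) {
                onDone(results);
            }
        });
    }
    // The loop has ended, but no callback has run yet.
    return completed;
}

var finalResults = null;
var completedAtLoopEnd = readAll(['core', 'app'], function (results) {
    finalResults = results;
});
runPending();
```

Here `completedAtLoopEnd` is still zero when `readAll` returns, which is exactly why the result cannot simply be returned to the caller.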

(function () {
    "use strict";

    // The initialize function must be run each time a new page is loaded
    Office.initialize = function (reason) {
        $(document).ready(function () {
            app.initialize();

            $('#get-built-in-DocProps').click(getBuiltInDocProps);
        });
    };

    // State shared by the async callbacks
    var breakLoop = false; // true once the property has been found
    var sPropVal = '';     // value (or message) to display
    var counter = 0;       // number of getXmlAsync callbacks that have run

    // Reads Custom XML Parts for built-in document properties
    // and passes execution on to read the XML,
    // extract the specified DocProp and print its value.
    function getBuiltInDocProps() {
        //Reset the controlling values
        breakLoop = false;
        sPropVal = '';
        counter = 0;

        //One namespace for each Custom XML Part (core.xml and app.xml)
        var ns1 = 
        var ns2 = 

        //Information for looping the Custom XML Parts
        var nsList = [ns1, ns2];
        var cxp;
        var sPropName = $('#prop').val();
        var total = nsList.length;
        var asyncInfo = [sPropName, total];
        for (var i = 0; i < nsList.length; i++) {
            Office.context.document.customXmlParts.getByNamespaceAsync(nsList[i],
                function (nsResult) {
                    if (nsResult.status === Office.AsyncResultStatus.Succeeded) {
                        if (nsResult.value.length > 0) {
                            //A Custom XML Part was found
                            cxp = nsResult.value[0];
                            cxp.getXmlAsync({ asyncContext: asyncInfo }, onGotXml);
                        }
                    } else {
                        app.showNotification('Error:', nsResult.error.message);
                    }
                });
        }
    }

    // Callback for getXmlAsync: searches the returned XML for the property
    function onGotXml(xmlResult) {
        var sXml = "";
        var sTemp;
        var info = xmlResult.asyncContext;
        //control the loop
        counter++;
        // process the callback
        if (xmlResult.status === Office.AsyncResultStatus.Succeeded) {
            sXml = xmlResult.value;
            sTemp = getDocProp(sXml, info[0]);
            var total = info[1];
            if (breakLoop && sTemp !== 'empty') {
                sPropVal = sTemp;
            }
            //checking whether the number of loops has executed
            //is the reliable way to know how to continue
            //If the content of this function were nested
            //within the other function then the "global" variables
            //could be local within the function.
            if (counter === total) {
                done();
            }
        }
        else {
            console.log('Error:', xmlResult.error.message);
        }
        //does not (reliably) return anything to calling proc
        //because async will execute whenever...
        //return sXml;
    }

    //returns value of the document property
    function getDocProp(sXml, sPropName) {
        //Doesn't work in Web Add-in / App for Office
        //var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
        var nodeContent = 'empty';
        if (!breakLoop) {
            var oParser = new DOMParser();
            var xmlDoc = oParser.parseFromString(sXml, "text/xml");

            //'*' matches the element in any namespace
            var nodes = xmlDoc.getElementsByTagNameNS('*', sPropName);
            if (nodes.length > 0) {
                nodeContent = nodes[0].textContent;
                if (nodeContent.length === 0) {
                    nodeContent = 'The document property "' + sPropName +
                                  '" is empty.';
                }
                breakLoop = true;
            }
            else if (!breakLoop) {
                nodeContent = 'The specified document property "' + sPropName +
                               '" was not found.';
            }
        }
        return nodeContent;
    }

    // Displays the outcome once all Custom XML Parts have been checked
    function done() {
        if (breakLoop) {
            app.showNotification('DocProp', '"' + sPropVal + '"');
        } else {
            app.showNotification('DocProp not found', sPropVal);
        }
    }
})();


<div id="content-main">
    <div class="padding">
        <p><strong>Get the value of a document property</strong></p>
        <p>Enter the property name (Camel case):</p>
        <textarea id="prop"></textarea>
        <button id="get-built-in-DocProps">Get Doc Props</button>

        <p style="margin-top: 50px;">
            <a href="javascript:location.reload(true)">Refresh add-in</a>
        </p>
    </div>
</div>
