Word 2013 Comments and the Open XML SDK 2.5

My last post presented the changes to Word Open XML that support the new comment functionality in Word 2013. This time, we’ll look at how these changes affect working with Comments using the Open XML SDK. As this is new functionality, version 2.5 of the Open XML SDK is required; the extensions are not available in the earlier versions 1.0 or 2.0. The new version requires the .NET Framework 4.0, which means it can’t be used with Visual Studio 2008.

The CTP (Community Technology Preview) of the Open XML SDK 2.5 can be downloaded here. Please note that this is not the final version and cannot be redistributed as part of a product. There may be changes when the releases of Office 2013 and Visual Studio 2012 are complete. This discussion is based on the CTP version.

The sample code for this discussion can be downloaded as a Zip file. It was compiled using the CTP version of Visual Studio 2012 and requires that version in order to be opened. You can also view the complete code in the *.cs file using a text editor and import it into your own project in VS 2010. The sample will work on any Word Open XML document, regardless of the version of Word in which it was saved.

In essence, what the sample does is extract the same information as the example in the discussion of Word 2013 Comments using the Word object model: Author, author’s Initials, comment ID, date the comment was made, the text of the comment, the paraID, whether the Comment has been marked “Done” and whether the comment is an Ancestor or a Reply in a discussion.

Windows Form for Open XML SDK Word 2013 Comments sample

In addition to the usual using statements at the beginning of the “code page”, we need to import a new namespace in the 2.5 SDK. This gives us access to the classes for the new elements mentioned in the last post, more specifically, to those for the new part commentsExtended.xml:
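// Namespace name as used in the CTP; the released SDK 2.5 names it DocumentFormat.OpenXml.Office2013.Word.
using DocumentFormat.OpenXml.Office15.Word;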

The code starts in the usual manner, opening a Word Open XML document as a WordprocessingDocument. In contrast to the discussion about form fields, the code works not only with the MainDocumentPart, but also with the CommentsPart and the new CommentsExPart. Not all documents will have these parts. A document with no Comments will probably not have a CommentsPart, and a document that was created and saved in an older version of Word will not have the CommentsExPart.

It is therefore important to use the GetPartsCountOfType method to check for the presence of at least one such part. If the parts are there, their contents are assigned to objects of the corresponding type.

using (WordprocessingDocument openXMLDoc = 
       WordprocessingDocument.Open(openXMLFileName, true))
{
    MainDocumentPart wordDoc = openXMLDoc.MainDocumentPart;
    // The comments part is only present if the document contains Comments.
    if (wordDoc.GetPartsCountOfType<WordprocessingCommentsPart>() > 0)
    {
        comments = wordDoc.WordprocessingCommentsPart.Comments;
    }
    else
    {
        this.txtListComments.Text = "This document contains no Comment Part.";
        return;
    }
    // The extended part is only present in documents saved by Word 2013.
    if (wordDoc.GetPartsCountOfType<WordprocessingCommentsExPart>() > 0)
    {
        cExtPart = wordDoc.WordprocessingCommentsExPart;
        cExts = cExtPart.CommentsEx.Descendants<CommentEx>();
    }

The method comments.Count() gives us the number of Comments in the CommentsPart. If there are more than zero, the next step is to extract the information that is present in all Word Open XML documents: Author, Initials, date and the comment text.

    foreach (Comment c in comments)
    {
        DateTime dt = c.Date;
        this.txtListComments.Text += String.Format(
            "Author: {0}{4}Initials: {1}{4}Id: {2} {4}Date: {3}{4}", 
            c.Author, c.Initials, c.Id, dt.ToString("dd-MMM-yyyy"), newLine);
        this.txtListComments.Text += String.Format("Comment text: {0}{1}{1}", 
            c.InnerText, newLine);

The rest of the information – whether the Comment is “done”, is an Ancestor or a Reply – depends on the document having been saved as a Word 2013 document and having a CommentsExPart.

Since we’re getting the information for each Comment, we have to find the corresponding entry for the Comment in the CommentsExPart. As mentioned in the last post, the two sets of information are linked by the paraId attribute, and if the Comment consists of multiple paragraphs it will be the paraId of the last paragraph. It is possible that there is no matching entry in CommentsEx for a paraId, so you have to check for a match before proceeding.
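
To make the link between the two parts concrete, here is a rough sketch of the same lookup done on the raw XML with LINQ to XML instead of the SDK classes. The XML fragments, the class name ParaIdLinkDemo and its method names are invented for illustration; only the namespaces and the attribute names (w:id, w14:paraId, w15:paraId, w15:done) correspond to the file format.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class, not part of the sample download.
public static class ParaIdLinkDemo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace W14 = "http://schemas.microsoft.com/office/word/2010/wordml";
    static readonly XNamespace W15 = "http://schemas.microsoft.com/office/word/2012/wordml";

    // Made-up stand-in for comments.xml: one Comment with two paragraphs.
    public const string CommentsXml =
        "<w:comments xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' " +
        "xmlns:w14='http://schemas.microsoft.com/office/word/2010/wordml'>" +
        "<w:comment w:id='0' w:author='A'>" +
        "<w:p w14:paraId='11111111'/><w:p w14:paraId='22222222'/>" +
        "</w:comment></w:comments>";

    // Made-up stand-in for commentsExtended.xml: the entry points at the LAST paragraph.
    public const string CommentsExXml =
        "<w15:commentsEx xmlns:w15='http://schemas.microsoft.com/office/word/2012/wordml'>" +
        "<w15:commentEx w15:paraId='22222222' w15:done='1'/>" +
        "</w15:commentsEx>";

    // Returns the paraId of the last paragraph of the given Comment, or null.
    public static string LastParaId(XDocument comments, string commentId)
    {
        XElement comment = comments.Root.Elements(W + "comment")
            .FirstOrDefault(c => (string)c.Attribute(W + "id") == commentId);
        if (comment == null) return null;
        XElement lastPara = comment.Elements(W + "p").LastOrDefault();
        return lastPara == null ? null : (string)lastPara.Attribute(W14 + "paraId");
    }

    // Returns the commentEx element with the given paraId, or null if there is no match.
    public static XElement FindCommentEx(XDocument commentsEx, string paraId)
    {
        if (paraId == null) return null;
        return commentsEx.Root.Elements(W15 + "commentEx")
            .FirstOrDefault(e => (string)e.Attribute(W15 + "paraId") == paraId);
    }
}
```

Calling LastParaId for Comment 0 and passing the result to FindCommentEx returns the single commentEx entry; asking for the first paragraph’s paraId instead would find nothing, which is why the last paragraph matters.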

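    if (cExtPart != null)
    {
        // The link to CommentsEx is the paraId of the Comment's last paragraph.
        Paragraph cPara = c.ChildElements.OfType<Paragraph>().LastOrDefault();
        // Guard against a Comment without paragraphs or without a paraId.
        string paraID = (cPara != null && cPara.ParagraphId != null)
            ? cPara.ParagraphId.Value : null;
        CommentEx cExMatch = (CommentEx)cExts.Where(
            cEx => (cEx.ParaId).ToString() == paraID).FirstOrDefault();
        if (cExMatch != null) {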

Finding out whether a Comment is “done” is fairly simple:

        this.txtListComments.Text += String.Format("Comment done: {0}{1}{1}", 
                BoolToString(cExMatch.Done), newLine);
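
BoolToString is a small helper from the sample that is not shown in this post. A minimal sketch of what such a helper might look like follows; the class name, the nullable bool parameter and the output strings are assumptions, not the code from the download.

```csharp
using System;

// Hypothetical helper class; the real sample may implement this differently.
public static class CommentFormatting
{
    // Converts an optional flag to display text; null means the attribute was absent.
    public static string BoolToString(bool? value)
    {
        if (!value.HasValue)
        {
            return "not specified";
        }
        return value.Value ? "Yes" : "No";
    }
}
```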

Determining whether it’s an Ancestor in a discussion is much more complex. If a Comment is an Ancestor, then there are entries in CommentsExPart that have the Comment’s paraId as the value of their paraIdParent. So we get a list of all replies in the CommentsExPart whose paraIdParent matches the paraId of the current Comment:

            var replies = cExts.Where(r => r.ParaIdParent == paraID);

If there are entries in this list, then the Comment is an Ancestor and the next step is to get the Comment Id value for each of these Replies. That sounds simple and straightforward, but it is not, because the Comment Id information is not stored in the CommentsExPart – it’s back in the CommentsPart. This means we have to get the paraId of each member of replies, look up that Paragraph in the CommentsPart, go “up” a level to the w:comment element in which it resides and get its Id attribute’s value.

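            if (replies.Count() > 0)
            { 
                string replyParaIds = String.Empty;
                foreach (CommentEx r in replies)
                { 
                    // Find the paragraph in the CommentsPart that has the reply's paraId...
                    Paragraph pCmt = comments.Descendants<Paragraph>().Where(
                        p => p.ParagraphId == r.ParaId.Value).FirstOrDefault();
                    if (pCmt == null) continue;
                    // ...then go up to the enclosing w:comment to read its Id.
                    string cmtId = pCmt.Ancestors<Comment>().First().Id;
                    replyParaIds += "Comment " + cmtId + ", ";
                }
            }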
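
The loop above searches all paragraphs of the CommentsPart once per reply. An alternative, sketched here on raw XML with LINQ to XML rather than with the SDK classes, is to build a paraId-to-Comment-Id lookup once and then resolve each reply from it. The class and method names (ReplyLookupDemo, MapParaIdToCommentId, DescribeReplies) are made up for this sketch, and it is only one possible approach, not the one used in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo class, not part of the sample download.
public static class ReplyLookupDemo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace W14 = "http://schemas.microsoft.com/office/word/2010/wordml";
    static readonly XNamespace W15 = "http://schemas.microsoft.com/office/word/2012/wordml";

    // Maps the paraId of every paragraph to the w:id of the Comment that contains it.
    public static Dictionary<string, string> MapParaIdToCommentId(XDocument comments)
    {
        var map = new Dictionary<string, string>();
        foreach (XElement comment in comments.Root.Elements(W + "comment"))
        {
            string id = (string)comment.Attribute(W + "id");
            foreach (XElement p in comment.Descendants(W + "p"))
            {
                string paraId = (string)p.Attribute(W14 + "paraId");
                if (paraId != null) map[paraId] = id;
            }
        }
        return map;
    }

    // Lists the Comments whose commentEx entry names parentParaId as its paraIdParent.
    public static string DescribeReplies(XDocument comments, XDocument commentsEx, string parentParaId)
    {
        if (parentParaId == null) return String.Empty;
        Dictionary<string, string> map = MapParaIdToCommentId(comments);
        IEnumerable<string> ids = commentsEx.Root.Elements(W15 + "commentEx")
            .Where(e => (string)e.Attribute(W15 + "paraIdParent") == parentParaId)
            .Select(e => (string)e.Attribute(W15 + "paraId"))
            .Where(p => p != null && map.ContainsKey(p))
            .Select(p => "Comment " + map[p]);
        // string.Join avoids the trailing ", " left by manual concatenation.
        return string.Join(", ", ids);
    }
}
```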

The last step is to determine whether the Comment being processed is a Reply. Only replies will have the paraIdParent attribute, so this is again straightforward.

            // Check for null first, then for an empty value.
            if (cExMatch.ParaIdParent != null && 
                cExMatch.ParaIdParent != String.Empty)
            {
                this.txtListComments.Text += string.Format(
                    "This comment is a reply to {0}{1}{1}",
                    cExMatch.ParaIdParent, newLine);
            }
        }

