A fairly common question in the world of Word development is how to merge multiple documents into one. (See, for example, this discussion in the MSDN Open XML SDK forum.) In the Word application itself, and when automating it through its APIs, we’d generally use Insert File. It’s also possible to use the Master Document/Subdocument functionality.

One issue with Insert File is that the individual headers and footers, margin settings, paper orientation and newspaper columns of the inserted document’s last section are lost. That’s because these are section properties, and the properties of a document’s last section are stored at the very end of the document (in Office Open XML, in the w:sectPr that closes the w:body). That information is not carried over when the document is inserted into another. It is therefore necessary to first add an extra section break at the end of the document being inserted, which preserves this information in the target document. (The Master Document functionality does this for you, but has other drawbacks.)
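The extra section break can be sketched at the XML level: copy the body-level w:sectPr into the properties of a new, empty paragraph placed just before it. This is a minimal sketch using Python’s standard-library ElementTree on a raw document.xml string, not the Open XML SDK code from the C# sample; the function name add_trailing_section_break is hypothetical. ElementTree also rewrites namespace prefixes and drops unused declarations, which can matter for real Word files (for example, prefixes listed in mc:Ignorable).

```python
import copy
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return the namespace-qualified (Clark notation) name for a w: tag."""
    return f"{{{W_NS}}}{tag}"


def add_trailing_section_break(document_xml: str) -> str:
    """Turn the body-level w:sectPr into an explicit section break.

    The w:sectPr that is a direct child of w:body describes the last
    section. A copy is placed in the w:pPr of a new, empty paragraph just
    before it, so the settings travel with the body content when that
    trailing w:sectPr is dropped during a merge.
    """
    root = ET.fromstring(document_xml)
    body = root.find(w("body"))
    if body is None:
        raise ValueError("no w:body element found")
    final_sect = body.find(w("sectPr"))  # direct child of w:body only
    if final_sect is None:
        return document_xml  # no section properties to preserve

    para = ET.Element(w("p"))
    ppr = ET.SubElement(para, w("pPr"))
    ppr.append(copy.deepcopy(final_sect))

    # The body-level w:sectPr must remain the last child of w:body.
    body.insert(list(body).index(final_sect), para)
    return ET.tostring(root, encoding="unicode")


sample = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Last page</w:t></w:r></w:p>"
    "<w:sectPr><w:titlePg/></w:sectPr>"
    "</w:body></w:document>"
)
print(add_trailing_section_break(sample))
```

The body-level w:sectPr is left in place, so the source document itself still opens normally. Only the copy inside the new paragraph needs to survive the merge.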

As mentioned in a previous blog article, the closest equivalent to Insert File in the Open XML SDK is the “altChunk” method. It embeds entire files in the Office Open XML package, and the Word application merges them into the main document when it next opens the file. This approach would seem ideal, as it would automatically incorporate not only the main document body but also all graphics, headers, footers, custom XML parts, etc. All you’d need to do is create the additional section break (by copying the one that will be lost) and insert the documents as “chunks”.
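At the package level, an altChunk consists of a w:altChunk element in the body, a relationship of type …/relationships/aFChunk pointing at the embedded file, and a content-type entry for that file. The following is a minimal sketch using Python’s standard library on raw XML strings, not the Open XML SDK. The function add_alt_chunk, the ID "rIdChunk1" and the target "chunk1.docx" are hypothetical. Writing the chunk file into the package and updating [Content_Types].xml are left out.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type for an "alternative format import" chunk
AF_CHUNK_TYPE = R_NS + "/aFChunk"
# Content type for a .docx chunk; it needs an <Override> (or <Default>)
# entry in [Content_Types].xml, which this sketch does not write.
DOCX_CHUNK_CT = (
    "application/vnd.openxmlformats-officedocument."
    "wordprocessingml.document.main+xml"
)

ET.register_namespace("w", W_NS)
ET.register_namespace("r", R_NS)
ET.register_namespace("", PKG_REL_NS)  # .rels files use a default namespace


def add_alt_chunk(document_xml: str, rels_xml: str, rel_id: str, target: str):
    """Add <w:altChunk r:id=.../> to the body plus the matching relationship.

    The altChunk goes before the body-level w:sectPr, which must stay last.
    Returns the updated (document_xml, rels_xml) pair.
    """
    doc = ET.fromstring(document_xml)
    body = doc.find(f"{{{W_NS}}}body")
    if body is None:
        raise ValueError("no w:body element found")
    chunk = ET.Element(f"{{{W_NS}}}altChunk", {f"{{{R_NS}}}id": rel_id})
    sect = body.find(f"{{{W_NS}}}sectPr")
    position = list(body).index(sect) if sect is not None else len(body)
    body.insert(position, chunk)

    rels = ET.fromstring(rels_xml)
    ET.SubElement(
        rels,
        f"{{{PKG_REL_NS}}}Relationship",
        {"Id": rel_id, "Type": AF_CHUNK_TYPE, "Target": target},
    )
    return (
        ET.tostring(doc, encoding="unicode"),
        ET.tostring(rels, encoding="unicode"),
    )
```

The relationship target is resolved relative to the main document part, so a target of "chunk1.docx" would refer to word/chunk1.docx in a typical package.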

Unfortunately, there’s a bug in the Word application when it integrates Word document “chunks” into the main document: it has the nasty habit of not retaining a number of section properties, among them the one that sets whether a section has a Different First Page (<w:titlePg/>) and the one that restarts page numbering in a section (<w:pgNumType w:start="1"/>). As long as your documents don’t need these kinds of header, footer and page-numbering settings, you can probably use the “altChunk” approach. You’ll find some sample code for it in the aforementioned MSDN forum post.
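Before choosing altChunk, you could scan each source document for the two settings named above. This is a minimal sketch in Python’s standard library working on a raw document.xml string. The function name at_risk_settings is hypothetical, and it assumes there are no tracked section-property changes (w:sectPrChange), which contain a nested w:sectPr of their own.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
OFF_VALUES = {"0", "false", "off"}  # explicit "off" values for on/off elements


def at_risk_settings(document_xml: str) -> list:
    """List (section number, setting) pairs that altChunk may not retain.

    Visits every w:sectPr in document order (paragraph-level section breaks
    first, then the body-level one for the last section) and reports
    w:titlePg (Different First Page) and w:pgNumType/@w:start.
    Assumes no w:sectPrChange elements (their nested w:sectPr would be counted).
    """
    findings = []
    root = ET.fromstring(document_xml)
    for number, sect in enumerate(root.iter(W + "sectPr"), start=1):
        title_pg = sect.find(W + "titlePg")
        # A bare <w:titlePg/> means "on"; only explicit off values disable it.
        if title_pg is not None and title_pg.get(W + "val", "1") not in OFF_VALUES:
            findings.append((number, "titlePg"))
        pg_num = sect.find(W + "pgNumType")
        if pg_num is not None and pg_num.get(W + "start") is not None:
            findings.append((number, "pgNumType start=" + pg_num.get(W + "start")))
    return findings
```

An empty result suggests that the altChunk route is probably safe for that document. Any finding points toward copying the document part by part instead.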

But if you do need to handle complex headers and footers, the only method currently available is to copy each document in its entirety, part by part. This is a non-trivial undertaking, as numerous types of parts can be associated not only with the main document body but also with each header and footer part.
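To see what “part by part” involves, you can walk the relationship files of a .docx package, starting at the main document and following each internal relationship into headers, footers, images and so on. This is a minimal sketch using Python’s zipfile and ElementTree, not the Open XML SDK. The function name walk_parts is hypothetical, and it assumes the main part is word/document.xml rather than reading that name from _rels/.rels.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from collections import deque

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def rels_path_for(part: str) -> str:
    """Map a part name to its relationships file, e.g.
    word/header1.xml -> word/_rels/header1.xml.rels."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def walk_parts(docx, start: str = "word/document.xml") -> dict:
    """Map each reachable part to the (relationship type, target part) pairs
    it references. `docx` may be a file path or a binary file-like object."""
    graph = {}
    with zipfile.ZipFile(docx) as pkg:
        names = set(pkg.namelist())
        queue = deque([start])
        while queue:
            part = queue.popleft()
            if part in graph:
                continue  # already visited
            graph[part] = []
            rels = rels_path_for(part)
            if rels not in names:
                continue  # this part has no outgoing relationships
            root = ET.fromstring(pkg.read(rels))
            for rel in root.iter(f"{{{PKG_REL_NS}}}Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # e.g. hyperlinks point outside the package
                target = rel.get("Target")
                if target.startswith("/"):
                    resolved = target.lstrip("/")  # absolute part name
                else:
                    resolved = posixpath.normpath(
                        posixpath.join(posixpath.dirname(part), target)
                    )
                rel_type = rel.get("Type").rsplit("/", 1)[-1]
                graph[part].append((rel_type, resolved))
                queue.append(resolved)
    return graph
```

Each edge in the result is a part that would have to be copied into the target package, with a new relationship ID in the target part that references it.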

I’ve created a sample in C# with Visual Studio 2010 that you can download at http://homepage.swissonline.ch/cindymeister/BLOG/BlogLink.htm. It’s a Console application. (If you use VS 2008, you can copy the two *.cs files into a new Console application and set references to the Open XML SDK’s DocumentFormat.OpenXml assembly and the .NET Framework’s WindowsBase assembly.)

It starts by creating a “plain” document with some sample text, then processes all files (which must be Word documents) in a specified folder. The sample code handles only two kinds of graphics (“blips” and “vml”); if the documents contain anything else, Word will not be able to open the result. Also, the process effectively destroys the files being inserted, so work with copies of the files you want to test rather than with the original documents.
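The two kinds of graphics reference their image parts differently. A DrawingML a:blip uses an r:embed attribute, while VML v:imagedata uses r:id. This is a minimal sketch in Python’s standard library that collects both kinds of IDs from a part’s XML. The function name image_rel_ids is hypothetical, and linked (r:link) images are not covered. Relationships outside these lists, such as those used by charts or embedded objects, are the kind of content the sample does not handle.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
V_NS = "urn:schemas-microsoft-com:vml"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def image_rel_ids(part_xml: str) -> dict:
    """Collect relationship IDs used by DrawingML blips and VML image data."""
    root = ET.fromstring(part_xml)
    blips = [
        b.get(f"{{{R_NS}}}embed")  # embedded picture in a:blip
        for b in root.iter(f"{{{A_NS}}}blip")
        if b.get(f"{{{R_NS}}}embed") is not None
    ]
    vml = [
        v.get(f"{{{R_NS}}}id")  # legacy VML picture in v:imagedata
        for v in root.iter(f"{{{V_NS}}}imagedata")
        if v.get(f"{{{R_NS}}}id") is not None
    ]
    return {"blip": blips, "vml": vml}
```

When copying a part, each of these IDs has to be rewritten to the ID of the copied image relationship in the target document.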

The code is heavily annotated with comments about what is being done, and why. If you have any questions, open a discussion in the Open XML SDK forum (and you can post a comment here, linking to the discussion).

The procedures that copy different kinds of Parts are set up in the same manner, with the same method name (so they’re overloaded). This should make it fairly simple to supplement the code to handle additional kinds of Parts.
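Python has no method overloading, but functools.singledispatch gives a similar “one name, one routine per part type” structure. This is a minimal sketch of that design only. Part, ImagePart, HeaderPart and copy_part are hypothetical stand-ins, not Open XML SDK classes, and the “copy” just records what would be copied.

```python
from dataclasses import dataclass, field
from functools import singledispatch


@dataclass
class Part:
    """Hypothetical base type for a package part."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class ImagePart(Part):
    """Hypothetical image part."""


@dataclass
class HeaderPart(Part):
    """Hypothetical header part that can own child parts (e.g. images)."""


@singledispatch
def copy_part(part: Part, log: list) -> None:
    # Fallback: fail loudly for part types that have no copy routine yet.
    raise NotImplementedError(f"no copy routine for {type(part).__name__}")


@copy_part.register
def _(part: ImagePart, log: list) -> None:
    log.append(f"image {part.name}")


@copy_part.register
def _(part: HeaderPart, log: list) -> None:
    log.append(f"header {part.name}")
    for child in part.children:
        copy_part(child, log)  # dispatches on each child's own type
```

Supporting a new kind of part means registering one more function. Unlike C# overloads, which the compiler resolves from the static type, singledispatch picks the routine at run time from the argument’s class.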

One frequently asked question about merging documents is not addressed in this article or the code sample: how to maintain the font and paragraph formatting of the individual documents. That requires making sure each document uses a unique set of styles for any formatting that should be retained. Meeting such a requirement would need a completely different approach, which would exceed the scope of a sample.