Office 2007 has a much nicer document format. The new format is XML based rather than some proprietary binary format, and, unlike the XML format introduced in Office 2003, it is the real deal, not a format that supports only a subset of the functionality.
The new format, DOCX in the case of Word, is actually a ZIP file containing a whole range of other documents. Most of these are XML files. Other file types, such as embedded images, can appear as well, but the XML files are what it is all about. ZIP plus XML means we should have an easy time opening and reading these files from our .NET programs, since both are standard formats with plenty of tools and libraries available. But life is even better than that: Microsoft has added native support for reading and writing these files to .NET, in the System.IO.Packaging namespace to be exact. Take a look at the following code for an example:
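Imports System.IO.Packaging ' lives in WindowsBase.dll
Imports System.Xml
Using doc As Package = Package.Open("C:\Documents and Settings\Maurice\My Documents\Microsoft\Demos\1. XML File format\test.docx", System.IO.FileMode.Open, System.IO.FileAccess.Read) ' read-only; Package.Open(path) alone uses OpenOrCreate/ReadWrite
    Dim part As PackagePart
    Console.WriteLine("===== Document parts. =====")
    For Each part In doc.GetParts()
        Console.WriteLine(part.Uri) ' part names such as /word/document.xml
    Next
    Console.WriteLine("Press any key to continue and show the contents.")
    Console.ReadKey()
    Console.WriteLine("===== /word/document.xml contents. =====")
    part = doc.GetPart(New Uri("/word/document.xml", UriKind.Relative))
    Using reader As XmlReader = XmlReader.Create(part.GetStream(System.IO.FileMode.Open, System.IO.FileAccess.Read))
        If reader.MoveToContent() = XmlNodeType.Element Then Console.WriteLine(reader.ReadOuterXml())
    End Using
End Using
Console.WriteLine("Press any key to end.")
Console.ReadKey()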
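Because /word/document.xml is plain WordprocessingML, getting at the actual text of a document needs nothing more than an XmlReader. The sketch below is my own illustration rather than part of the original demo: the module name DocxText and the function ExtractText are hypothetical, and it assumes the body text sits in <w:t> elements inside <w:p> paragraphs in the WordprocessingML main namespace. Tabs, line breaks, fields and formatting are simply ignored.

```vbnet
Imports System.IO
Imports System.Text
Imports System.Xml

' Hypothetical helper module, not part of System.IO.Packaging.
Module DocxText
    ' Namespace URI bound to the w: prefix in WordprocessingML parts.
    Private Const WordNs As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Concatenates the text of every <w:t> run in a document.xml stream and
    ' ends each <w:p> paragraph with a line break. Tabs, breaks, fields and
    ' formatting are ignored (an assumption of this sketch).
    Public Function ExtractText(ByVal documentXml As Stream) As String
        Dim result As New StringBuilder()
        Using reader As XmlReader = XmlReader.Create(documentXml)
            While Not reader.EOF
                If reader.NodeType = XmlNodeType.Element AndAlso _
                   reader.LocalName = "t" AndAlso reader.NamespaceURI = WordNs Then
                    ' Returns the run text and leaves the reader after </w:t>,
                    ' so no extra Read() is needed in this branch.
                    result.Append(reader.ReadElementContentAsString())
                Else
                    If reader.NodeType = XmlNodeType.EndElement AndAlso _
                       reader.LocalName = "p" AndAlso reader.NamespaceURI = WordNs Then
                        result.AppendLine()
                    End If
                    reader.Read()
                End If
            End While
        End Using
        Return result.ToString()
    End Function
End Module
```

With the package open and part pointing at /word/document.xml, something like Console.WriteLine(DocxText.ExtractText(part.GetStream())) would print the document one paragraph per line. Using a forward-only XmlReader instead of loading an XmlDocument keeps memory use low for large documents.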
The only slightly confusing part is importing the System.IO.Packaging namespace. You would expect to add a reference to a System.IO.Packaging.dll, either from the GAC or from the file system, but there is no such assembly. Instead you need to add a reference to WindowsBase.dll. It should be in the list of .NET assemblies, but if not, you can find it in the "C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0" folder.
Maurice de Beijer