Open XML SDK Intro

Let me start by saying I AM NOT an expert with Open XML. I dabbled with it a few years ago for a small project and then merrily went on my way, doing just fine without needing to touch it again. That changed this week when I had to do something the Office 365 API and the Office JavaScript API don’t support (as of the writing of this post, anyway): the seemingly simple task of determining the page count of a document. The Primary Interop Assembly supports this, but building a VSTO add-in didn’t fit the need. I needed something external that could inspect the document properties without actually opening the document in Word.

The answer finally came to me from the other side of the world by way of a co-worker, Andrew Coates (thank you!). He pointed out that I could pull out the page count through Open XML using the Open XML SDK, so I dove in and learned it’s really simple to use, which is not at all how I remember it! I’ll use this post as an introduction to the SDK to show how simple it is.

First step: go get the Open XML SDK 2.5 and the SDK Productivity Tool (check out this video to learn more about the tool). If you’re more of a documentation person, here are the docs. I won’t go into the details of the Open XML spec or format, but it’s worth saying that an Open XML document is a ZIP package made up of multiple parts. So to interact with the document in any way, we first need to figure out which part holds what we’re after. That’s where the Productivity Tool (or the docs) can help. Fire it up, open a document, and you can inspect its Open XML, find what you’re looking for, and then program against it.
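
To make the idea of one document holding many pieces concrete, here is a minimal sketch that lists what is inside a document without the SDK, relying only on the fact that a .docx file is a ZIP archive. The names PackageLister and ListPartNames are made up for illustration, "hi.docx" is a placeholder, and on .NET Framework this assumes references to System.IO.Compression and System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;

static class PackageLister
{
    // Hypothetical helper: returns the path of every entry in the archive.
    public static List<string> ListPartNames(string path)
    {
        var names = new List<string>();
        // A .docx file is a ZIP archive, so it can be opened read-only as one.
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                names.Add(entry.FullName);
            }
        }
        return names;
    }

    static void Main(string[] args)
    {
        // "hi.docx" is a placeholder; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "hi.docx";
        foreach (string name in ListPartNames(path))
        {
            Console.WriteLine(name);
        }
    }
}
```

For a document saved by Word, the list should include entries such as word/document.xml and docProps/app.xml, alongside plumbing like [Content_Types].xml and the relationship files.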

To find the page count, I needed to look at the Pages property located in the /docProps/app.xml part, under the Properties element. The screenshot here shows the Reflected Code tab open, which displays the value (1, in this case) along with the extended-properties namespace.
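
For comparison, here is roughly what reading that value by hand looks like, without the SDK. This is only a sketch under a few assumptions: the helper names (AppXmlReader, ParsePages, ReadPages) are hypothetical, the part is assumed to sit at the usual docProps/app.xml path rather than being resolved through the package relationships, and on .NET Framework it needs references to System.IO.Compression, System.IO.Compression.FileSystem and System.Xml.Linq.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

static class AppXmlReader
{
    // Namespace of the Properties element and its children in app.xml.
    static readonly XNamespace ExtendedProperties =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Hypothetical helper: returns the text of <Pages>, or null if it is absent.
    public static string ParsePages(XDocument appXml)
    {
        if (appXml.Root == null) return null;
        XElement pages = appXml.Root.Element(ExtendedProperties + "Pages");
        return pages == null ? null : pages.Value;
    }

    // Hypothetical helper: opens the .docx as a ZIP and reads docProps/app.xml.
    public static string ReadPages(string path)
    {
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = archive.GetEntry("docProps/app.xml");
            if (entry == null) return null; // no extended properties stored
            using (Stream stream = entry.Open())
            {
                return ParsePages(XDocument.Load(stream));
            }
        }
    }

    static void Main(string[] args)
    {
        // "hi.docx" is a placeholder; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "hi.docx";
        string pages = ReadPages(path);
        Console.WriteLine(pages == null ? "No page count stored." : "Pages: " + pages);
    }
}
```

It works, but it means carrying the namespace URI and the part path around yourself, which is exactly the bookkeeping the SDK takes off your hands.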

Knowing it’s in extended-properties, I can now jump over to Visual Studio and use the SDK to pull out the value for the document using WordprocessingDocument.ExtendedFilePropertiesPart.Properties.Pages. Simple: I don’t even have to mess with an XML object, which is nice.

using System;
using DocumentFormat.OpenXml.Packaging;

namespace LoadOOXMLDocument
{
  class Program
  {
    static void Main(string[] args)
    {
      const string filename = "hi.docx";
      // Open read-only (isEditable: false); we only inspect the properties.
      using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filename, false))
      {
        // The SDK exposes /docProps/app.xml as the ExtendedFilePropertiesPart.
        ExtendedFilePropertiesPart propPart = wordDoc.ExtendedFilePropertiesPart;
        Console.WriteLine("The document has {0} pages.", propPart.Properties.Pages.Text);
        Console.ReadLine();
      }
    }
  }
}
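
One thing the listing above glosses over: the extended properties part and the Pages element are both optional, and the number is simply whatever the application that last saved the file wrote there, so it can be missing or stale. Here is a slightly more defensive sketch. PageCountReader and TryGetPageCount are names I made up for illustration, not part of the SDK, and "hi.docx" is still a placeholder.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

static class PageCountReader
{
    // Hypothetical helper: returns the stored page count, or null when the
    // document carries no usable Pages value.
    public static int? TryGetPageCount(string path)
    {
        // Open read-only; nothing is modified.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            ExtendedFilePropertiesPart part = doc.ExtendedFilePropertiesPart;
            if (part == null || part.Properties == null || part.Properties.Pages == null)
            {
                return null;
            }

            int pages;
            return int.TryParse(part.Properties.Pages.Text, out pages) ? pages : (int?)null;
        }
    }

    static void Main(string[] args)
    {
        // "hi.docx" is a placeholder; pass a real path as the first argument.
        string path = args.Length > 0 ? args[0] : "hi.docx";
        int? count = TryGetPageCount(path);
        Console.WriteLine(count.HasValue
            ? "The document has " + count.Value + " pages."
            : "The document does not store a page count.");
    }
}
```

Returning a nullable int keeps "no value stored" distinct from a real count, so the caller decides how to handle documents that were generated by something other than Word.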

If you want to dive deeper, here are some other online resources:

The Wordmeister
Eric White Blog
OpenXMLDeveloper.org
GitHub Samples
