Content Controls and language formatting

Long-time Word users, especially those who work on professional documentation in multilingual environments, know that you have to be very careful that the language default in Word matches the default language setting in Windows. If it doesn’t, the Windows language will override the Word language as direct formatting in the document. Documents created in such an environment are difficult to get under control – it’s almost impossible to clean out the imposed Windows language without delving into the Open XML file format.

(More about synchronizing Windows and Word languages can be read at http://homepage.swissonline.ch/cindymeister/MyFavTip.htm#LangFormat.)

In my years of on-line support I’ve encountered a number of problems that traced back to unsynched language formatting. I never believed I’d seen everything, but I was still surprised by a case that came across my desk over the New Year holidays.

The text in content controls was behaving strangely. It would not adopt the surrounding style formatting, nor could it be formatted differently by applying other formatting directly. It insisted on displaying as Arial 10. As the investigation continued, it turned out that numerals could only be typed from right to left.

A closer look revealed that the numerals were formatted with the Hebrew language (he-IL), while text was formatted in English (en-US). When the document was edited on non-Hebrew Windows machines it was possible to get things under control by replacing the content controls with new ones, but this wasn’t working for the owner of the document. Also disturbing was the fact that the Windows language of the machine on which the document was being edited (de-CH) was being used for anything new typed into a content control – meaning yet a third language formatting was being introduced and overriding direct and style formatting.
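A rough way to see which languages are actually applied in a document is to count the language tags in its main XML part. The following is only a sketch, in Python with the standard library. The function name `language_tags` is my own invention for illustration, and it assumes you have already extracted the text of `word/document.xml` from the .docx package (which is a zip archive). It relies on the WordprocessingML `w:lang` element, whose `w:val`, `w:eastAsia` and `w:bidi` attributes carry the Latin, East Asian and complex script languages respectively.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def language_tags(document_xml):
    """Count every language tag found on w:lang elements in the XML.

    Keys are (attribute, language) pairs such as ("val", "en-US") or
    ("bidi", "he-IL"); values are the number of occurrences.
    """
    root = ET.fromstring(document_xml)
    counts = Counter()
    for lang in root.iter(f"{{{W}}}lang"):
        for attr in ("val", "eastAsia", "bidi"):
            value = lang.get(f"{{{W}}}{attr}")
            if value:
                counts[(attr, value)] += 1
    return counts
```

An unexpected third language in the result, such as the de-CH in this case, is a hint that a Windows language has been imposed as direct formatting.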

Pursuing the issue with the owner of the document revealed that the document had been created on a system where the Windows and Word default languages did not match, and that the Windows language was right-to-left, while Word’s default language was left-to-right.

Armed with this information, I inspected the document’s Word Open XML in an attempt to isolate the underlying cause(s) of the behavior and determine how they could be corrected. Otherwise, it would be necessary to copy-paste the content into a new document, created in a synchronized language environment, something the owner of the document did not want to do.

Hebrew is a “complex script”; Word may use a different typeface for anything formatted in a language marked as “complex” than it uses for “Latin” languages. The basic information about which typefaces to use for the various languages is stored in the document’s theme. This is where the Arial 10 originated. Further, this font was different from the default complex script typeface set for the document (in styles.xml), which was Times New Roman: yet another conflict. If the font for complex script formatting was not specified directly in a style definition, Word was picking up the typeface specified for complex script from the theme.
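To compare what the theme and the styles each say about complex script fonts, something along the following lines could be used. It is a minimal sketch in Python, not a finished tool. The function names `theme_fonts` and `styles_missing_cs_font` are hypothetical, and the code assumes the XML of the theme part and of styles.xml has already been read out of the .docx package. It only looks at explicitly named typefaces (the `w:ascii`, `w:hAnsi` and `w:cs` attributes of `w:rFonts`) and ignores theme references and style inheritance.

```python
import xml.etree.ElementTree as ET

# WordprocessingML and DrawingML main namespaces
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"


def _typeface(element):
    """Return the typeface attribute, or None if the element is absent."""
    return element.get("typeface") if element is not None else None


def theme_fonts(theme_xml):
    """Return the Latin, complex script and Hebrew-specific typefaces
    of the theme's major (heading) and minor (body) fonts."""
    root = ET.fromstring(theme_xml)
    result = {}
    for kind in ("major", "minor"):
        node = root.find(f".//{{{A}}}{kind}Font")
        if node is None:
            continue
        result[kind] = {
            "latin": _typeface(node.find(f"{{{A}}}latin")),
            "cs": _typeface(node.find(f"{{{A}}}cs")),
            "hebr": _typeface(node.find(f"{{{A}}}font[@script='Hebr']")),
        }
    return result


def styles_missing_cs_font(styles_xml):
    """Return the IDs of styles that name a Latin typeface explicitly
    but no complex script typeface."""
    root = ET.fromstring(styles_xml)
    missing = []
    for style in root.iter(f"{{{W}}}style"):
        rfonts = style.find(f"{{{W}}}rPr/{{{W}}}rFonts")
        if rfonts is None:
            continue
        has_latin = (rfonts.get(f"{{{W}}}ascii") is not None
                     or rfonts.get(f"{{{W}}}hAnsi") is not None)
        has_cs = rfonts.get(f"{{{W}}}cs") is not None
        if has_latin and not has_cs:
            missing.append(style.get(f"{{{W}}}styleId"))
    return missing
```

Styles reported by the second function are the candidates for the behavior described above: with no complex script typeface of their own, they leave the choice to the theme.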

Setting the same font for complex script as for Latin in the style definition, where that was possible, allowed most content controls to adopt the style formatting. There were still problems in the cases where the font used in the style had no corresponding typeface for complex script, and numbers were still being formatted as Hebrew. We needed to dig deeper.

The next question was why this was happening only in content controls. It turns out that Word takes the Windows language formatting into consideration when inserting a content control. As Hebrew is right-to-left, the run properties for the content control (not for the runs inside the content control, just for the content control itself) contain a right-to-left instruction, so the content control will default to right-to-left and, consequently, to complex script. Therefore, the style formatting could not override the complex script typeface.

When a content control was inserted on a machine where Hebrew was not the Windows language, the instruction was not added, which explains why everyone else could “solve” the problems, but not the owner of the document. Only after the Windows language was changed to match the Word language could content controls be inserted without the right-to-left instruction. At that point, the document could be fixed so that it could be used in production.
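Affected content controls can be located without opening every one by hand. The sketch below, again in Python with only the standard library, makes one assumption that should be stated loudly: that the right-to-left instruction is the `w:rtl` run property, which is the WordprocessingML element for right-to-left runs, sitting in the `w:rPr` directly under the content control's `w:sdtPr`. The function names `read_part` and `rtl_content_controls` are made up for this illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_part(docx_path, part="word/document.xml"):
    """Read one part out of a .docx package (a zip archive)."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read(part)


def rtl_content_controls(document_xml):
    """Return the tags of content controls whose own run properties
    (w:sdtPr/w:rPr) switch on right-to-left.

    ASSUMPTION: the instruction is w:rtl. Runs inside the content
    control are deliberately not inspected.
    """
    root = ET.fromstring(document_xml)
    found = []
    for sdt_pr in root.iter(f"{{{W}}}sdtPr"):
        rtl = sdt_pr.find(f"{{{W}}}rPr/{{{W}}}rtl")
        if rtl is None:
            continue
        # An explicit false value turns the property off.
        if rtl.get(f"{{{W}}}val") in ("0", "false", "off"):
            continue
        tag = sdt_pr.find(f"{{{W}}}tag")
        found.append(tag.get(f"{{{W}}}val") if tag is not None else "(untagged)")
    return found
```

Typical use would be `rtl_content_controls(read_part("report.docx"))`, where `report.docx` is a placeholder file name. Any content control it lists was most likely inserted while a right-to-left Windows language was active.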

Why numerical input was being formatted as Hebrew and non-numerical was not is still an unsolved mystery…


