Word Count Differences between Trados Studio and Trados 2007 (and Word)

A few weeks ago, I wrote about the analysis function in Studio and mentioned that analysis results often differ between the two Trados versions. In most cases, the differences are small and don’t really matter, but in some cases they can be substantial, and knowing the reasons for the discrepancy is very useful when trying to figure out the “correct” word count.

The SDL Knowledge Base doesn’t have much information on this; its article on the topic mentions only “number-only segments” and hyphens. However, several other factors also affect the word count, and the issue seems to baffle many Studio users (me included). I’m on vacation in Finland and it has been quite rainy this week, so I decided to spend some time looking for and documenting the reasons why the word counts can differ. The list below is not exhaustive and mainly covers Word-related issues, but it’s a good start, and I’m planning to update it as needed.

| Factor | Trados Studio 2009 | Trados 2007 | MS Word (2007) |
|---|---|---|---|
| Bulleted lists (automatic lists as well as “manual” bullets) | Bullet character not included | Bullet character not included | Included |
| Numbered lists | Automatically numbered lists: list numbers not included. Manually numbered lists: list numbers included in the word count and also as placeables (same as with “regular” numbers) | List numbers not included | Included |
| Hyperlinks | A link is included in the word count as one word (and placeable) or several words (and placeables), depending on the type of link | In DOC files, a link is not included, but in DOCX files a link can increase the word count by several words because the actual link text (address) is included in the word count | Links don’t increase the word count |
| Hidden text | Not included | Included (unless it’s Trados hidden text) | Not included |
| Number-only segments | Included | Not included | Included |
| Numbers in sentences | Included (also counted as placeables) | Included | Included |
| Number and % combinations (8% or 8 %) | Counted as one word (plus one placeable) | Counted as one word if part of a sentence; if by itself, not included in the word count | Counted as one word (8%) or two words (8 %) |
| Two words written together with a / (“he/she”) | Counted as two words | Counted as two words | Counted as one word |
| Chemical names, e.g. 3-(3,4-dichlorophenyl)-1,1-dimethylurea | Counted as six words and three placeables | Counted as five words | Counted as one word |
| Dashes: solid (—–) | Line counted as one word | Not included | Line counted as one word |
| Dashes: broken (- – – – -) | Each dash counted as one word | Not included | Each dash counted as one word |
| Hyphens in words (“up-to-date”) | Hyphenated word counted as one word | Hyphenated word counted as two or more words | Hyphenated word counted as one word |
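To illustrate why the same text can produce such different counts, here is a small Python sketch of two hypothetical counting rules. These are NOT the actual algorithms used by Studio, Trados 2007 or Word; the function names, the hyphen/slash splitting and the number-only pattern are my own assumptions, chosen only to mimic a few rows of the table above (slashes, hyphens and number-only segments).

```python
import re

# Hypothetical, simplified counting rules. These are NOT the real
# algorithms of Trados Studio, Trados 2007 or MS Word.

# Assumption: a "number-only" segment contains only digits, separators,
# percent signs and spaces (e.g. "42" or "8 %").
NUMBER_ONLY = re.compile(r"[\d.,%\s]+")


def count_whitespace(text: str) -> int:
    """Word-like rule: every run of non-whitespace characters is one word."""
    return len(text.split())


def count_split_hyphen_slash(text: str) -> int:
    """Alternative rule: also treat hyphens and slashes as word breaks."""
    tokens = re.split(r"[\s/\-]+", text)
    return sum(1 for token in tokens if token)  # ignore empty edge tokens


def count_segments(segments, tokenizer, skip_number_only=False):
    """Sum the word counts of all segments, optionally skipping number-only ones."""
    total = 0
    for segment in segments:
        if skip_number_only and NUMBER_ONLY.fullmatch(segment.strip()):
            continue  # mimics the "number-only segments not included" row
        total += tokenizer(segment)
    return total


if __name__ == "__main__":
    segments = ["Table 3", "42", "8 %", "Press the up-to-date button."]
    print(count_segments(segments, count_whitespace))
    print(count_segments(segments, count_split_hyphen_slash, skip_number_only=True))
```

With these toy rules, the four sample segments come out as 9 words under the whitespace rule and 8 words under the hyphen-splitting rule that skips number-only segments. The chemical name from the table happens to count as five words under the hyphen-splitting rule, the same as Trados 2007 in my tests, but that is a coincidence of the example rather than evidence of how Trados 2007 actually works.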

In summary, one could say that Studio probably gives, on average, a slightly higher and more truthful word count for typical technical texts because of the way it treats numbers. The other factors probably have a lesser effect in general but can make a substantial difference in specific cases, for example if broken dash lines have been used to separate sections of a document.

As I mentioned earlier, this is not a complete list, and the results are based only on my own limited testing. I realized that I could spend the rest of the summer testing various factors and perfecting the list… but luckily it’s not THAT rainy. Anyhow, rain or shine, if you know of any additional examples for the list, feel free to send them to me so that I can share them here.