A few weeks ago, I wrote about the analysis function in Studio and mentioned that analysis results often differ between Trados Studio 2009 and Trados 2007. In most cases the differences are small and don’t really matter, but in some cases they can be substantial, and knowing the reasons for the discrepancy is very useful when trying to figure out the “correct” word count.
The SDL Knowledge Base doesn’t have much information on this and mentions only “number-only segments” and hyphens (you can find the article here). However, several other factors also affect the word count, and the issue seems to baffle many Studio users (me included). I’m on vacation in Finland and it has been quite rainy this week, so I decided to spend some time finding and documenting the reasons why the word counts can differ. The list below is not exhaustive and covers mainly Word-related issues, but it’s a good start and I’m planning to update it as needed.
Factor | Trados Studio 2009 | Trados 2007 | MS Word (2007) |
Bulleted lists (in automatic lists as well as “manual” bullets) | Bullet character not included | Bullet character not included | Included |
Numbered lists | Automatically numbered lists: list numbers not included. Manually numbered lists: list numbers included in the word count and also as placeables (same as with “regular” numbers) | List numbers not included | Included |
Hyperlinks | A link is included in the word count as one word (and placeables) or several words (and placeables) depending on the type of the link | In DOC files, a link is not included but in DOCX files a link can increase the word count by several words because the actual link text (address) is included in the word count | Links don’t increase word count |
Hidden text | Not included | Included (unless it’s Trados hidden text) | Not included |
Number-only segments | Included | Not included | Included |
Numbers in sentences | Included (counted also as placeables) | Included | Included |
Number and % combinations (8% or 8 %) | Counted as one word (plus one placeable) | Counted as one word if part of a sentence; if by itself, then it’s not included in the word count | Counted as one (8%) or two (8 %) words |
Two words written together with a / (“he/she”) | Counted as two words | Counted as two words | Counted as one word |
Chemical names, for example: 3-(3,4-dichlorophenyl)-1,1-dimethylurea | Counted as six words and three placeables | Counted as five words | Counted as one word |
Dashes: solid (—–) | Line counted as one word | Not included | Line counted as one word |
Dashes: broken (- – – – -) | Each dash counted as one word | Not included | Each dash counted as one word |
Hyphens in words (“up-to-date”) | Hyphenated word counted as one word | Hyphenated word counted as two or more words | Hyphenated word counted as one word |
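To make the slash and hyphen rows above more concrete, here is a minimal Python sketch. It is only a toy model of those two rows, not the actual counting logic of Studio, Trados 2007, or Word, and it ignores placeables, numbers, and everything else in the table. The names `count_words` and `RULES` are made up for this illustration.

```python
import re

def count_words(text, split_on_slash, split_on_hyphen):
    """Count tokens separated by whitespace, optionally also by '/' and '-'."""
    separators = r"\s"
    if split_on_slash:
        separators += "/"
    if split_on_hyphen:
        separators += r"\-"
    # Split on runs of separator characters and drop empty tokens.
    tokens = [t for t in re.split(f"[{separators}]+", text) if t]
    return len(tokens)

# Assumed settings that mimic only the "he/she" and "up-to-date" rows of the table.
RULES = {
    "Trados Studio 2009": {"split_on_slash": True, "split_on_hyphen": False},
    "Trados 2007": {"split_on_slash": True, "split_on_hyphen": True},
    "MS Word (2007)": {"split_on_slash": False, "split_on_hyphen": False},
}

sample = "he/she keeps the up-to-date list"
for tool, rule in RULES.items():
    print(f"{tool}: {count_words(sample, **rule)} words")
```

With this sample sentence, the toy rules give 6 words for the Studio-like setting, 8 for the Trados 2007-like setting, and 5 for the Word-like setting, so a single short sentence already produces three different counts.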
In summary, one could say that Studio probably gives, on average, a slightly higher and more truthful word count for typical technical texts because of the way it treats numbers. The other factors probably have a smaller effect in general but can make a substantial difference in specific cases, for example if broken dash lines have been used to separate sections of a document.
As I mentioned earlier, this is not a complete list, and the results are based only on my own limited testing. I realized that I could spend the rest of the summer testing various factors and perfecting the list… but luckily it’s not THAT rainy. Anyhow, rain or shine, if you know of any additional examples for the list, feel free to send them to me so that I can share them here.