Purple Haze – Overdose of Tags

I think tags are one of the most disliked things in Studio. This is particularly true of users who previously translated with Workbench in Word, where tags were never visible. Tags can be annoying, but it really helps to understand how they function and how they can be handled in Studio. There’s a good blog article by Paul Filkin about handling tags here, and the Studio Help also has some good info on the topic.

What’s really annoying are files that contain a huge number of tags with no real meaning for the document. These are often tags that apply different formatting to the spaces between words or turn the same formatting on and off constantly. If there are only a few of them, it’s relatively easy to see that there’s no need to include them in the translation. However, a large quantity of this purple haze makes it difficult to perceive the actual text and slows down the translation process. It also makes it easier to miss the real tags, and the tag verification feature becomes practically useless when there are hundreds of unnecessary warning messages.

These types of tags are common in files converted or copied from PDF, but they can also easily be produced in Word by applying and changing formatting incorrectly, for example by leaving different formatting on the spaces between words. This is very easy to do without realizing it because you don’t see the tags in Word.
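A minimal Python sketch can show why cleanup tools can remove so many of these tags without changing how the document looks. It relies on a deliberately simplified, hypothetical model: a paragraph is a list of text runs, and each run carries a set of formatting properties. This is not how Word or Studio store formatting, and it is not the algorithm used by CodeZapper, TransTools or Studio.

The sketch applies two rules:
- Whitespace-only runs adopt the formatting of the preceding run.
- Neighboring runs with identical formatting are merged.

After that, only real formatting changes remain as tag pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A stretch of text with one set of formatting (hypothetical model)."""
    text: str
    fmt: frozenset  # e.g. frozenset({"color:blue", "spacing:1pt"})


def clean_runs(runs):
    """Drop redundant formatting boundaries from a list of runs.

    Simplification: formatting on whitespace is treated as invisible, which
    ignores properties such as underline that do show up on spaces.
    """
    normalized = []
    for run in runs:
        if run.text.strip() == "" and normalized:
            # A space formatted differently from its neighbor:
            # reuse the previous run's formatting instead.
            run = Run(run.text, normalized[-1].fmt)
        normalized.append(run)

    merged = []
    for run in normalized:
        if merged and merged[-1].fmt == run.fmt:
            # Same formatting "turned off and on again": join the runs.
            merged[-1] = Run(merged[-1].text + run.text, run.fmt)
        else:
            merged.append(run)
    return merged


def tag_pairs(runs):
    """Rough tag-pair count: one pair per run that carries any formatting."""
    return sum(1 for run in runs if run.fmt)


blue = frozenset({"color:blue"})
bold = frozenset({"bold"})
plain = frozenset()
sample = [
    Run("Anni", blue), Run(" ", plain), Run("tanni", blue), Run(" ", blue),
    Run("went", blue), Run(" ", plain), Run("to the", blue), Run(" ", plain),
    Run("cellar", bold),
]
print(tag_pairs(sample), "->", tag_pairs(clean_runs(sample)))
```

The text itself is unchanged. Only the formatting boundaries, and therefore the tags that would be shown, are reduced.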

A friend of mine recently asked me if there’s anything she could do to reduce the number of unnecessary tags in her files, so I decided to expand my original reply and share it here as well. I took one of her DOC files (about 1,200 words) and tested various ways to lower the tag count. When I opened the file directly in Studio, there were well over 1,000 formatting tags (see Figure 1). I think this was the worst file I’ve ever seen: in most segments there were two pairs of tags between every word! These were mostly font color and spacing tags that applied different formatting to spaces or turned the same formatting off and on, and they were obviously completely unnecessary.


Figure 1. The DOC file opened directly in Trados Studio without any prepping. (Note that the original French source text has been replaced with a Finnish children’s poem to protect the confidentiality of the original text. You didn’t miss anything. It was a really boring text, at least compared to Anni and her trip across the lawn to the cellar to fetch butter, milk and potatoes.)

I tried the following three methods:

1. Save the source file as DOCX and select the “Skip advanced font formatting” option in the File Types settings (Tools > Options > File Types > Microsoft Word 2007-2010 > Common). This option is not available for DOC or RTF files, so it works only with DOCX files (and PPTX and PDF files). When I opened the file in Studio, there were 118 formatting tags (<cf>) left. About half of them seemed unnecessary, but they were easy to see and skip.


Figure 2. The same file saved as a DOCX file and opened directly in Trados Studio.


2. Clean the file (DOCX, DOC or RTF) in Word using CodeZapper. CodeZapper is a Word add-in that includes several cleaning functions. When processing my test file, I used the PDFTidy, PDFFix and CZL functions in combination and did not test them separately or with any of the other functions. CodeZapper turned out to be clearly the most effective method for this file. There were only 62 formatting tags (<cs>) left in the file, and they all seemed to be necessary.


Figure 3. The DOC file opened directly in Trados Studio after it was prepped with CodeZapper. The process removed all tags from the sample sentences.


3. Clean the file (DOCX, DOC or RTF) in Word using TransTools Document Cleaner. TransTools is another Word add-in that includes a tag cleaning function. This left 156 formatting tags in the file, and most of them seemed unnecessary: as the previous example shows, only about 60 formatting tags are needed in this file.


Figure 4. The DOC file opened directly in Trados Studio after it was prepped with TransTools.

 
Of course, one hopes that clients would include a “tag-clearance” as part of their file prep procedure before sending files to translators. That would not only make translators’ lives easier and improve the quality of the translation and the resulting translation memory, but it would also increase fuzzy match leverage because the unnecessary tags wouldn’t be there screwing up the analysis results and fuzzy matching.

//


Project Settings vs. Tools > Options – What’s the Difference?

I think one of the most confusing aspects of Studio is the way the basic settings are selected. Even some of the more experienced users have difficulties with this, not to mention those who are just starting to use Studio. Every time I train new individual users or teach workshops, I think that I should write down the rules and exceptions and create a few screenshots. These would help me explain the differences and similarities between Project Settings and Tools > Options settings, and they would help the students see how these settings are connected. Believe it or not, there’s actually some “logic” in the settings, and understanding the logic will make it easier to find the right settings.

This ended up being a much longer article than I had planned, so if my explanations below are too much for you, at least read the summary at the end and look at the screenshots. That would be a good start.

Project settings

You can access Project settings in every Studio view via the Project menu (Project > Project Settings). In the Editor view you can also use the Project Settings button above the Translation Results window. Project settings apply only to the active (current) project. Whatever you change here won’t affect any future or other existing projects. (Note that “project” here refers to both “standard Studio projects” [= using the New Project command] and “single file projects” [= using the Open Document command].) That’s quite simple and straightforward. However, there are two additional things to remember about project settings:

1. File Types: Most of the File Type settings control the conversion and extraction of source files to the SDLXLIFF format. For example, they control whether comments, hidden text, worksheet names etc. are translated, how PDF files are converted, or how elements and attributes are handled in HTML files. In most cases the default settings work fine and you don’t need to worry about changing them. However, if any of these settings need to be changed, it has to be done before the source file is converted (i.e. opened in Studio for the first time); otherwise the changes don’t have any effect. If you are creating a “single file project”, you do this in the Open Document dialog box by selecting the Advanced button to access the File Type settings (see Figure 1 below). If you are creating a “standard Studio project”, you do this on the Project Files page of the New Project wizard by clicking the File Types button (see Figure 2 below).

Note that when you select the Advanced button to access the File Type settings in the above example, it takes you to the Project Template Settings dialog box (which looks almost exactly like the Project Settings dialog box). Everything you change here will actually stay as part of your default project template. It will affect all future projects unless you change the setting later in the Project Template Settings dialog box or in the Options dialog box. I won’t explain the Project Template concept here, but you can find more info about it here. Utilizing templates can help you streamline the process of creating projects, particularly if you frequently have similar projects.


Figure 1. Accessing File Type settings via the Open Document dialog box when creating a single file project.


Figure 2. Accessing File Type settings via the Project Files page when creating a standard Studio project.

So, what can you do if you have already created a project and then notice that you should have changed File Type settings? If you are working on a “standard Studio project”, do the following:
- Go to Project Settings (> File Types) and change the necessary settings.
- Add the file to the project again. You might need to delete the previous file first or rename your new file in order to add it to the project.

Adding files to an existing project is another less-than-intuitive process, but you can find the instructions here. If you are working on a “single file project”, you need to create the project again and remember to change the settings when you are in the Open Document dialog box (Figure 1).

2. Language Pairs: The other source of confusion in Project Settings is the various language pairs listed under Language Pairs (see Figure 3). You should see “All Language Pairs” there and at least one additional language pair. The settings under All Language Pairs are basically general settings that apply to all the language pairs of a project. If you want to change these settings independently for specific language pairs, you do it under the language pair in question. This way you could, for example, use different minimum match values or translation memory fields for different language pairs. However, since most of us work with a single language pair per project, it’s simpler to make all the changes directly to the settings under your language pair (i.e. not under All Language Pairs). That way you can be sure that they will take effect.

Note, however, that the settings on the Translation Memory and Automated Translation page (i.e. TMs and their usage, such as Lookup, Concordance, etc.) can only be changed under All Language Pairs. They are grayed out under the individual language pairs. In addition, as you can see in Figure 3:
- Termbase settings are selected under All Language Pairs (because termbases are multilingual).
- Auto-substitution and AutoSuggest Dictionaries settings are selected only under each specific language pair (because they are language-specific).

I could throw in a couple of other exceptions here but I’m going to spare you from that, and I’m not going to explain what happens if you have selected the Use different translation providers for this language pair option.

More info about Language Pair settings can be found here.


Figure 3. Comparison of settings under “All Language Pairs” and an individual language pair in Project Settings.
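The language pair logic described above can be sketched as a lookup with a fallback. This is only a hypothetical Python model of the behavior as described in this article, assuming a simple override rule. The setting names, pair labels and file names are made up.

The model follows three rules:
- TMs and termbases are read from All Language Pairs only.
- AutoSuggest dictionaries and Auto-substitution are read only from the individual pair.
- Everything else uses a pair-specific value when one has been set, and falls back to All Language Pairs otherwise.

Studio’s internal behavior may differ in details.

```python
ALL = "All Language Pairs"

# Hypothetical Project Settings tree; names and values are illustrative only.
settings = {
    ALL: {
        "minimum_match": 70,
        "translation_memories": ["client_fr-fi.sdltm"],
        "termbases": ["client.sdltb"],
    },
    "French > Finnish": {
        "minimum_match": 80,
        "autosuggest_dictionaries": ["fr-fi_autosuggest"],
    },
}

# Set only under All Language Pairs (TM usage, multilingual termbases).
ALL_ONLY = {"translation_memories", "termbases"}
# Set only per language pair (language-specific resources).
PAIR_ONLY = {"autosuggest_dictionaries", "auto_substitution"}


def effective_setting(pair, key):
    """Return the value a language pair ends up using (assumed override rule)."""
    pair_settings = settings.get(pair, {})
    if key in ALL_ONLY:
        return settings[ALL].get(key)
    if key in PAIR_ONLY:
        return pair_settings.get(key)
    # Other settings: a value set on the pair wins over All Language Pairs.
    return pair_settings.get(key, settings[ALL].get(key))


print(effective_setting("French > Finnish", "minimum_match"))
print(effective_setting("French > Swedish", "minimum_match"))
print(effective_setting("French > Swedish", "autosuggest_dictionaries"))
```

Under this model, a value set directly on your language pair is the one that gets used. That is why making changes there is the safer choice when a project has only one pair.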


Tools > Options settings

The Options dialog box looks very much like the Project Settings dialog box, which can be confusing, but there are some clear differences when you look at them more closely. The Options dialog box navigation tree includes 12 setting categories. Three of them (File Types, Verification and Language Pairs) are the same as in the Project Settings dialog box. These are marked in Figure 4 with a blue frame. However, if you change any of the settings under these three items in the Options dialog box, the changes affect only your future projects; they do not have any effect on your current project. If you want the changes to apply to your current project, you need to make them via the Project Settings dialog box, as explained earlier.

All the other nine items are NOT in the Project Settings dialog box (as you can see in Figure 4): Editor, AutoSuggest, Default Task Sequence, Translation Memories View, Colors, Keyboard Shortcuts, Automatic Updates, Home View, and Java Runtime Engine Startup. Changes to these take effect immediately when you click OK and stay in effect until you change them. In other words, they affect your active project as well as all future and past projects.


Figure 4. Comparison of Project Settings and Options settings. Identical parts are marked with a blue frame.


Summary

Confusing? Yes, but the short version of all this is actually quite simple:

1. Project Settings affect only your active project.
2. The three “shared” items in Tools > Options dialog box (File Types, Verification and Language Pairs) affect only future projects. These are your “global” settings for future projects. See Figure 4.
3. The other nine items in Tools > Options dialog box affect active, past and future projects. They are not really project settings but affect other, more general preferences, such as font sizes, color, spell checker, keyboard shortcuts, etc. that control how Studio functions, regardless of any project-specific settings.
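The three rules above can be expressed as a small toy model in Python. Everything in it is hypothetical and only illustrates the scoping described in this article, not how Studio actually stores its settings. That includes the class name, the dictionary layout and the example setting names. In the model, a project takes a copy of the three shared Options items when it is created, while the other Options items are read live by every project.

```python
import copy

# The three items that appear both in Tools > Options and in Project Settings.
SHARED_SECTIONS = {"File Types", "Verification", "Language Pairs"}


class SettingsModel:
    """Toy model of Studio's setting scopes (hypothetical, not Studio's API)."""

    def __init__(self):
        # Tools > Options, shared items: defaults copied into *new* projects.
        self.shared_defaults = {"Verification": {"check_numbers": True}}
        # Tools > Options, the other nine items: global preferences.
        self.global_prefs = {"Editor": {"font_size": 10}}
        self.projects = {}

    def create_project(self, name):
        # Each project snapshots the shared defaults at creation time.
        self.projects[name] = copy.deepcopy(self.shared_defaults)

    def set_option(self, section, key, value):
        """Change a value in Tools > Options."""
        if section in SHARED_SECTIONS:
            store = self.shared_defaults
        else:
            store = self.global_prefs
        store.setdefault(section, {})[key] = value

    def set_project_setting(self, project, section, key, value):
        """Change a value in Project Settings (shared sections only)."""
        if section not in SHARED_SECTIONS:
            raise ValueError(f"{section} is not a project setting")
        self.projects[project].setdefault(section, {})[key] = value

    def lookup(self, project, section, key):
        """The value a given project actually uses."""
        if section in SHARED_SECTIONS:
            return self.projects[project][section][key]
        return self.global_prefs[section][key]


model = SettingsModel()
model.create_project("existing")
model.set_option("Verification", "check_numbers", False)  # rule 2: future projects only
model.set_option("Editor", "font_size", 12)               # rule 3: all projects, immediately
model.create_project("next")
print(model.lookup("existing", "Verification", "check_numbers"))
print(model.lookup("next", "Verification", "check_numbers"))
print(model.lookup("existing", "Editor", "font_size"))
```

Rule 1 corresponds to `set_project_setting`: it changes only that project’s own copy. Neither the Options defaults nor other projects are affected.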

When you add the language pairs to the mixture, it gets a bit more confusing:

1. Memories and their usage (Lookup, Concordance, etc.) are selected under All Language Pairs.
2. If you are working with just one language pair, it’s simpler if you make all the other TM-related changes (i.e. Search, Penalties, Filters and Update) under your language pair (rather than under All Language Pairs).
3. Termbase settings are selected under All Language Pairs (because termbases are multilingual). See Figure 3.
4. Auto-substitution and AutoSuggest Dictionaries settings are selected under each specific language pair (because they are language-specific). See Figure 3.

And some of the File Type settings add their own dimension to this confusion as well:

1. Most of the File Type settings control how source files are extracted and converted to the SDLXLIFF format, so if you want to change the conversion you need to change the settings accordingly beforehand. It’s too late after the file has already been converted (i.e. opened in Studio).


Still unsure about all this?

1. Go to Tools > Options. Review all the settings and change them so that they are the most useful “global settings” for your work in general. For example:
- Editor > Spelling, Font Adaptation, Auto-propagation, Languages
- Verification > QA Checker 3.0
- Language Pairs > All Language Pairs > Translation Memory and Automated Translation, where you select the TMs that you usually want to use

All the Editor settings take effect immediately, and the Verification and Language Pairs settings take effect when you create your next project (single file or standard).

2. If you need to modify any of the Verification or Language Pairs settings for an individual project, do that via the Project Settings dialog box.

//

Translating Bilingual Trados Workbench (Word) Files in Studio 2011

I wanted to start my Studio 2011 articles with this topic for two reasons: (1) This is an important enhancement of Studio for those who are stuck with clients who still require this old Trados file format, and (2) The process can be a bit confusing. Hopefully, the following information can make it easier and clearer.

1. Files need to be pre-segmented

It’s important to know that if you want to save your translation as a bilingual Workbench file (aka “uncleaned” Word file) in Studio, the file needs to be pre-segmented in Trados Workbench before you open it in Studio. To do this, select Tools > Translate > Segment unknown sentences in Workbench. For further details, see this SDL blog article.

Also, it’s better not to change the text colors in Workbench, because that can create problems with target text formatting in Studio. For example, when you insert a bold tag into the target text, the tag can also include the color of the source text in addition to the actual bold formatting. In other words, your target text turns bold and blue if you had changed the source text color to blue in Workbench. To avoid this, select Options > Translated Text Colours > Unchanged for source and target in Workbench.

2. DOC or DOCX format

The pre-segmented file needs to be in either DOC or DOCX format. Studio doesn’t seem to accept RTF files. You get the “This file type is not supported” error message. If the original file is an RTF file, then just save the segmented file as DOC or DOCX.

3. Only monolingual files are supported – what’s up with that?

This might be the next stumbling block when you are trying to open the file for translation:

I find this error message very confusing. “This file cannot be processed because it was saved as a bilingual document in Word. Only monolingual files are supported”. Wait a minute… wasn’t the bilingual file support one of the new features of Studio 2011!? I think they could have easily made this error message more informative and less confusing.

Anyhow, if you get this error message when trying to open a segmented DOC or DOCX file, take a look at your file type settings for the file type in question (Tools > Options > File Types > Microsoft Word 2000-2003 / 2007-2010 > Common).

Make sure that the Process files with tw4winMark style option is NOT selected. By default it shouldn’t be checked, but sometimes you need to select it in order to open files that have tw4winMark styles in them (even if the styles are not used in the document). Anyhow, that’s another topic (and source of error messages) altogether, and we’ll get back to it some other time. Once you have cleared the check box, you should be able to open the file.

4. Getting rid of the source text in the target column

Since the file is pre-segmented, all the target fields are already filled in, either with translations or with the source segment content. Which one depends on the TM and the fuzzy match setting you used for the segmentation in Workbench. Having the target side filled with source text can be annoying, because fuzzy matches will not be automatically inserted into the already occupied target fields during interactive translation. You would need to use the Apply Translation (Ctrl+T) command for every segment.

You can avoid this extra hassle by emptying the target fields before starting translation. The best way to do this depends on the fuzzy level you used in Workbench when you segmented the file. I think this works best if you use the 100% or higher match value setting during the segmentation step in Workbench.

This way it will be easy to clear all the source language text from the target side by using the Clear Draft Segments command (Translation > Clear Draft Segments). It leaves your 100% match translations untouched but clears all the other segments quickly in one go.

If you want to clear the segments based on some other criteria:
- Use the Display filter to display those segments, as needed.
- Select the desired segments: click the number of the first one, hold down the Shift key and click the number of the last one so that all the desired segments are highlighted.
- Use the Clear Target Segment command to clear the content (right-click menu or Translation > Clear Target Segment).

If a large number of segments have been selected, this can take a while.
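Here is a rough Python sketch of what a pre-segmented file contains. It assumes the classic Workbench markup `{0>source<}NN{>target<0}`, where NN is the match value. It also treats the document as plain text; in a real file the markers are hidden text in the tw4winMark style.

The function empties every target below 100%. This mirrors the end result described above for Clear Draft Segments when the file was segmented with a 100% threshold. It is not a claim about how Studio itself decides which segments to clear.

```python
import re

# Assumed Workbench bilingual markup: {0>source<}NN{>target<0}
SEGMENT = re.compile(r"\{0>(.*?)<\}(\d+)\{>(.*?)<0\}", re.DOTALL)


def parse_segments(text):
    """Split pre-segmented text into source, match value and target."""
    return [
        {"source": source, "match": int(score), "target": target}
        for source, score, target in SEGMENT.findall(text)
    ]


def clear_below_exact(segments, threshold=100):
    """Empty targets below the threshold (fuzzies or source copies); return count."""
    cleared = 0
    for seg in segments:
        if seg["match"] < threshold and seg["target"]:
            seg["target"] = ""
            cleared += 1
    return cleared


doc = (
    "{0>Anni went to the cellar.<}100{>Anni meni kellariin.<0} "
    "{0>She fetched butter and milk.<}0{>She fetched butter and milk.<0}"
)
segments = parse_segments(doc)
print(clear_below_exact(segments), [seg["target"] for seg in segments])
```

The second segment had no TM match, so Workbench copied the source into the target. That is exactly the kind of text that blocks automatic insertion of fuzzy matches in Studio.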

5. Use of TMs

For the segmentation in Workbench, you can use any TM you want. Of course, if you have a client-provided or other project-specific Workbench TM, it’s probably the most practical one to use. You can also use the same TM during translation in Studio. It’s really easy in Studio 2011 to include Workbench (and TMX, TXT and MDB) memories in a project, because you don’t need to run the full TM upgrade process separately first. Studio 2011 allows you to run a Quick Upgrade as part of the TM selection process, which makes it almost as easy to use these non-Studio TMs as it is to use actual Studio TMs. You can add non-Studio TMs in the Open File-based Translation Memory dialog box exactly the same way as Studio TMs; just make sure you have the right file type selected (see below).

6. Miscellaneous

Note that the Preview function does not work with Workbench files but you can view the target translation in Word using the File > View In > Bilingual Word Document as Target command.

To add to the potential confusion, when you open a bilingual DOC file for translation, a DOCX file with the same name gets created in the same folder. Why? Good question.

 

What Does the New Studio 2011 Mean for Compatibility with Non-Studio Users?

One of the major areas of improvement in the new 2011 version is its improved compatibility with translators/editors/agencies that don’t use Trados Studio. One of my favorite topics during the past two years has been the methods to overcome the various (in)compatibility issues between Studio 2009 and Trados 2007. With the new 2011 version, everything will be much easier, and excuses like “I can’t use Studio because my clients don’t use it” or “I can’t use Studio because my clients need uncleaned Word files” don’t have any merit anymore.

So, how is it done? For background, you might want to take a look at my earlier articles about this topic in reference to Studio 2009. The first one is here and the second one here. All the options I mention in the first one are still valid, such as sending the monolingual translated file and a matching translation memory in TMX format, or using TagEditor files. The main difference is that if your client really needs an uncleaned Word file, you can provide it directly from Studio 2011.

With Studio 2009, you had two options, which I covered in the second article mentioned above:
1) Give up and translate the file in Word using Trados Workbench.
2) Translate the file first in Studio and then again with Workbench (utilizing the TM from Studio).

In Studio 2011, however, you can open a pre-segmented bilingual Word file directly in Studio and, after translating it, save it in the same format. The translated file looks exactly as if it had been done in Trados 2007. Note, however, that the file has to be completely pre-segmented in Trados Workbench before you open it in Studio.

As I mentioned before, the SDL blog has a good explanation of how bilingual Word files can be used in the new Studio 2011. You can find the article here. It also shows some other nifty new features, such as quick TM upgrade and new display filters.

I have been using Studio 2011 (beta and RC) for almost two months now and have translated a few bilingual Word files with it. Generally speaking, the process has worked very well: my client got their “old-fashioned” Word file, while I was able to use multiple TMs, AutoSuggest and all the other helpful Studio features. I have encountered some issues with formatting and tags, but hopefully those will be fixed in the final release. There are also some unrelated Studio settings that can interfere with the conversion process. Again, I will wait for the final release before commenting on those and showing how to avoid the problems.

Another thing that improves compatibility with non-Studio users is the SDL XLIFF Converter tool that became available last fall. It’s now part of the Studio 2011 package and includes some new settings. Unfortunately, it’s still a separate application and cannot be used from within Studio. However, what’s new is that it gets installed automatically during the Studio 2011 installation (together with some other OpenExchange apps), which saves you the installation hassle.

By the way, if you are interested in the other Studio 2011 improvements, you can find more info in the online Help and the SDL Trados Studio 2011 Release Notes. There’s also a Sneak Peek at Studio 2011 webinar on September 21.

FIT, PDF and Studio 2011

In my Trados Studio presentation at the FIT Congress a couple of weeks ago, I promised that I would post a link to the presentation summary here as well. It’s on my website at www.finntranslations.com/downloads. There you can also find a summary of my other FIT Congress presentation, about converting PDF files.

That brings us to the next topic, i.e. a webinar that I will be teaching this Thursday (8/25) as part of the ATA Webinar series. It’s titled “Working with PDF Files–Part 1: Using Adobe Reader/Acrobat”. For details and registration info, see the ATA webinar website. Part 2 will be on September 22, titled Working with PDF Files–Part 2: Converting and Translating PDF Files.

As part of my PDF presentation at the FIT Congress, I also talked about translating PDF files with Trados Studio. This is a topic that I have covered here in one of my earlier articles, Translating PDF files in Studio. As many of you probably know, the main problems with opening PDF files directly in Studio are incorrectly placed hard returns and the overabundance of tags, and since you can’t edit the source side, this can be very problematic. And that brings us to the third item, Trados Studio 2011…

During my PDF presentation, I showed a screen shot of the PDF file settings in Studio 2011 (beta). There’s a new setting called Skip advanced font formatting (tracking, kerning, etc.). With that setting selected, it looks like you can avoid all/most/many (?) of the unnecessary tags that in Studio 2009 could have made a file practically untranslatable. I still believe that we are better off using a good conversion tool for the PDF to Word conversion and then translating the resulting Word file in Studio (after verifying first in Word that there are no incorrectly placed hard returns). However, the new Studio definitely handles PDF files much better than the current version and might actually be a functional conversion tool for those who don’t have a better one. I wanted to bring this up now because it fits the PDF theme and will probably go unnoticed by most users when they get their hands on the new version – hopefully soon. It might be difficult to notice these smaller improvements when one gets so excited about all the big ticket enhancements Studio 2011 will introduce, such as compatibility with “old-style” bilingual (uncleaned) Word files, track changes function, Microsoft Word spell checker, “translate to fuzzy” function, etc.

I have been using the Studio 2011 beta for about three weeks now for all my translation work and will share some of my experiences here soon…

Word Count Differences between Trados Studio and Trados 2007 (and Word)

A few weeks ago, I wrote about the analysis function in Studio and mentioned that analysis results often differ between the two Trados versions. In most cases the differences are small and don’t really matter but in some cases they can be substantial, and knowing the reasons for this discrepancy would be very useful when trying to figure out the “correct” word count.

The SDL Knowledge Base doesn’t have much information on this and mentions only “number-only segments” and hyphens (you can find the article here). However, there are several other factors that affect the word count, and the issue seems to baffle many Studio users (me included). I’m on vacation in Finland and it has been quite rainy this week, so I decided to spend some time looking for and documenting the reasons why the word counts can differ. The list below is not exhaustive and covers mainly Word-related issues, but it’s a good start, and I’m planning to update it as needed.

Factor-by-factor comparison (Trados Studio 2009 / Trados 2007 / MS Word 2007):

Bulleted lists (automatic lists as well as “manual” bullets)
Trados Studio 2009: Bullet character not included
Trados 2007: Bullet character not included
MS Word (2007): Included

Numbered lists
Trados Studio 2009: Automatically numbered lists: list numbers not included. Manually numbered lists: list numbers included in the word count and also as placeables (same as with “regular” numbers).
Trados 2007: List numbers not included
MS Word (2007): Included

Hyperlinks
Trados Studio 2009: A link is included in the word count as one word (and placeables) or several words (and placeables), depending on the type of the link
Trados 2007: In DOC files, a link is not included, but in DOCX files a link can increase the word count by several words because the actual link text (address) is included in the word count
MS Word (2007): Links don’t increase the word count

Hidden text
Trados Studio 2009: Not included
Trados 2007: Included (unless it’s Trados hidden text)
MS Word (2007): Not included

Number-only segments
Trados Studio 2009: Included
Trados 2007: Not included
MS Word (2007): Included

Numbers in sentences
Trados Studio 2009: Included (also counted as placeables)
Trados 2007: Included
MS Word (2007): Included

Number and % combinations (8% or 8 %)
Trados Studio 2009: Counted as one word (plus one placeable)
Trados 2007: Counted as one word if part of a sentence; if by itself, not included in the word count
MS Word (2007): Counted as one word (8%) or two words (8 %)

Two words written together with a / (“he/she”)
Trados Studio 2009: Counted as two words
Trados 2007: Counted as two words
MS Word (2007): Counted as one word

Chemical names, for example 3-(3,4-dichlorophenyl)-1,1-dimethylurea
Trados Studio 2009: Counted as six words and three placeables
Trados 2007: Counted as five words
MS Word (2007): Counted as one word

Dashes: solid (—–)
Trados Studio 2009: Line counted as one word
Trados 2007: Not included
MS Word (2007): Line counted as one word

Dashes: broken (- – – – -)
Trados Studio 2009: Each dash counted as one word
Trados 2007: Not included
MS Word (2007): Each dash counted as one word

Hyphens in words (“up-to-date”)
Trados Studio 2009: Hyphenated word counted as one word
Trados 2007: Hyphenated word counted as two or more words
MS Word (2007): Hyphenated word counted as one word

In summary, one could say that Studio probably gives, on average, a slightly higher and more truthful word count for typical technical texts because of the way it treats numbers. The other factors probably have a lesser effect in general but can make a substantial difference in specific cases, for example if broken dash lines have been used to separate sections of a document.
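A small Python sketch shows how much the hyphen and slash rules alone can move a count. It models only those two rows of the list above (“up-to-date” and “he/she”) using simple whitespace splitting. Numbers, placeables, dash lines, hidden text and lists are all ignored, and none of the three programs is claimed to tokenize text this way.

```python
import re

# Only the hyphen and slash rows of the comparison are modeled here.
RULES = {
    "Trados Studio 2009": {"split_hyphens": False, "split_slashes": True},
    "Trados 2007": {"split_hyphens": True, "split_slashes": True},
    "MS Word (2007)": {"split_hyphens": False, "split_slashes": False},
}


def count_words(text, split_hyphens, split_slashes):
    """Count whitespace-separated tokens, optionally splitting on - and /."""
    total = 0
    for token in text.split():
        parts = [token]
        if split_slashes:
            parts = [p for part in parts for p in part.split("/")]
        if split_hyphens:
            parts = [p for part in parts for p in part.split("-")]
        # Only pieces containing a letter or digit count as words.
        total += sum(1 for p in parts if re.search(r"\w", p))
    return total


sample = "Each user decides whether he/she needs an up-to-date copy."
for tool, rules in RULES.items():
    print(f"{tool}: {count_words(sample, **rules)}")
```

On this one sentence the three rule sets already disagree: 10 words for Studio, 12 for Trados 2007 and 9 for Word. Differences like this add up over a long technical document.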

As I mentioned earlier, this is not a complete list, and the results are based only on my own limited testing. I realized that I could spend the rest of the summer testing various factors and perfecting the list… but luckily it’s not THAT rainy. Anyhow, rain or shine, if you know any additional examples for the list, feel free to send them to me so that I can share them here.

Translating Wordfast Files in Studio and Other Souvenirs from Boulder

I wanted to mention two things that came up while I was in Boulder last weekend, before I forget them. First, somebody asked me whether it’s possible to translate Wordfast TXML files in Studio. I had looked into this when I started using Studio over a year ago, and at that time nobody seemed to know. However, I vaguely remembered seeing something about this somewhere later on and managed to find it on the SDL blog. There’s a good article by Paul Filkin on how this can be done. Basically, you need to define file type settings for TXML files. Paul’s example shows how this can be done with some common tags (non-translatable content). The same method can be used for other tags as well, as needed. I haven’t tried it myself, since none of my clients use Wordfast, but it certainly looks doable. You can find the instructions here.

The other was the presentation by Riccardo Schiaffino and Margherita De Togni, “Trados 2007 and SDL Trados 2009: Warts and all”, about some of the shortcomings of Trados 2007 and whether those issues have been fixed in Studio. It’s good reading for those who still wonder whether they should upgrade. One thing that Riccardo seemed to dislike quite a lot is the poor fuzzy matching in Trados. I have to say that I agree, and if anything, it’s even worse in Studio. Some of my earlier postings have examples of this, and Riccardo has some examples on his blog. I actually take a screenshot every time I see a really funky and weird fuzzy match result. It would make a good photo exhibition by now.