Purple Haze – Overdose of Tags

I think tags are one of the most disliked things in Studio. This is particularly true among users who previously translated using Workbench in Word, because there you never saw tags. Tags can be annoying, but it really helps to understand how they work and how they can be handled in Studio. There’s a good blog article by Paul Filkin about handling tags here, and the Studio Help also has some good info on the topic.

What’s really annoying are files that have a huge number of tags that don’t have any real meaning for the document. These are often tags that apply different formatting to the spaces between words or turn the same formatting on and off constantly. If there are only a few of them, it’s relatively easy to see that there’s no need to include them in the translation. However, a large quantity of this purple haze makes it difficult to see the actual text, and it slows down the translation process. It also makes it easier to miss the real tags, and the tag verification feature becomes practically useless when there are hundreds of unnecessary warning messages.

These types of tags are common in files converted or copied from PDF, but they can also easily be produced in Word by applying and changing formatting incorrectly, for example by leaving different formatting on the spaces between words. This is very easy to do without realizing it, because you don’t see the tags in Word.
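To make the idea of “meaningless” tags more concrete, here is a toy model in Python. It treats a paragraph as a list of runs (a piece of text plus a formatting label) and applies two simplifications that roughly correspond to what cleaning add-ins aim for: a space inherits the formatting of the text before it, and neighboring runs with identical formatting are merged. The run representation and the merging rules are assumptions for illustration only; this is not how Word, Studio or any particular add-in actually works.

```python
from typing import List, Tuple

# A run is a stretch of text plus a label for its formatting, e.g. ("Anni", "blue").
Run = Tuple[str, str]


def merge_runs(runs: List[Run]) -> List[Run]:
    """Collapse formatting changes that carry no visible meaning (toy model)."""
    merged: List[Run] = []
    for text, fmt in runs:
        # A whitespace-only run takes the formatting of the run before it,
        # so a space with its own color or spacing no longer needs a tag pair.
        if merged and not text.strip():
            fmt = merged[-1][1]
        # Adjacent runs with identical formatting are joined into one run,
        # which removes "off and on again" tag pairs between words.
        if merged and merged[-1][1] == fmt:
            merged[-1] = (merged[-1][0] + text, fmt)
        else:
            merged.append((text, fmt))
    return merged


# Made-up example: spaces with stray formatting and the same color switched off and on.
before = [
    ("Anni", "blue"), (" ", "black"), ("tanni", "blue"), (" ", "blue"),
    ("went", "blue"), (" ", "blue-spaced"), ("home", "blue"),
]
after = merge_runs(before)
print(f"{len(before)} runs -> {len(after)} run(s)")  # 7 runs -> 1 run(s)
print(after)
```

In this model every run boundary that disappears is one less place where a tag pair would be needed, which is roughly why cleaned files open with far fewer tags.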

A friend of mine asked me recently if there’s anything she could do to reduce the number of unnecessary tags in her files, so I decided to expand my original reply and share it here as well. I took one of her DOC files (about 1,200 words) and tested various ways to lower the tag count. When I opened the file directly in Studio, there were well over 1,000 formatting tags (see Figure 1). I think this was the worst file I’ve ever seen – in most segments there were two pairs of tags between every word! These were mostly font color and spacing tags that applied different formatting to spaces or turned the same formatting off and on, and they were obviously completely unnecessary.


Figure 1. The DOC file opened directly in Trados Studio without any prepping. (Note that the original French source text has been replaced with a Finnish children’s poem to protect the confidentiality of the original text. You didn’t miss anything. It was a really boring text, at least compared to Anni and her trip across the lawn to the cellar to fetch butter, milk and potatoes.)

I tried the following three methods:

1. Save the source file as DOCX and select the “Skip advanced font formatting” option in the File Types settings (Tools > Options > File Types > Microsoft Word 2007-2010 > Common). The option is not available for DOC or RTF files; it exists only for DOCX files (as well as PPTX and PDF files). When I opened the file in Studio, there were 118 formatting tags (<cf>) left. About half of them seemed to be unnecessary, but they were easy to spot and skip.


Figure 2. The same file saved as a DOCX file and opened directly in Trados Studio.

2. Clean the file (DOCX, DOC or RTF) in Word using CodeZapper. CodeZapper is a Word add-in that includes several cleaning functions. When processing my test file, I used the PDFTidy, PDFFix and CZL functions in combination and did not test them separately or with any of the other functions. CodeZapper turned out to be clearly the most effective method for this file: only 62 formatting tags (<cs>) were left, and they all seemed to be necessary.


Figure 3. The DOC file opened directly in Trados Studio after it was prepped with CodeZapper. The process removed all tags from the sample sentences.

3. Clean the file (DOCX, DOC or RTF) in Word using TransTools Document Cleaner. TransTools is another Word add-in that includes a tag cleaning function. This left 156 formatting tags in the file, and most of them seemed to be unnecessary; as the previous example shows, only about 60 formatting tags are actually needed in this file.


Figure 4. The DOC file opened directly in Trados Studio after it was prepped with TransTools.
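If you want to compare cleaning methods like these before opening the files in Studio, one rough proxy is the number of text runs (w:r elements) per paragraph in a DOCX file, since fragmented formatting shows up as many short runs. The Python sketch below, using only the standard library, counts them in word/document.xml. It is an assumption that run density tracks Studio's tag count closely; Studio does not necessarily create a tag at every run boundary, so treat the numbers only as a before/after comparison. DOC and RTF files would first need to be saved as DOCX.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run_density(docx_file) -> dict:
    """Count paragraphs (w:p) and text runs (w:r) in the main body of a DOCX file.

    docx_file can be a path or a binary file object. Headers, footers and
    footnotes live in other parts of the package and are not counted.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = sum(1 for _ in root.iter(W + "p"))
    runs = sum(1 for _ in root.iter(W + "r"))
    return {
        "paragraphs": paragraphs,
        "runs": runs,
        "runs_per_paragraph": runs / paragraphs if paragraphs else 0.0,
    }


if __name__ == "__main__":
    # Usage: python run_density.py original.docx cleaned.docx  (script name is hypothetical)
    for path in sys.argv[1:]:
        print(path, run_density(path))
```

A cleaned file should show a noticeably lower runs_per_paragraph value than the original.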

Of course, one hopes that clients would include a “tag-clearance” as part of their file prep procedure before sending files to translators. That would not only make translators’ lives easier and improve the quality of the translation and the resulting translation memory, but it would also increase fuzzy match leverage because the unnecessary tags wouldn’t be there screwing up the analysis results and fuzzy matching.


Project Settings vs. Tools > Options – What’s the Difference?

I think one of the most confusing aspects of Studio is the way the basic settings are selected. Even some of the more experienced users have difficulties with this, not to mention those who are just starting to use Studio. Every time I train new individual users or teach workshops, I think that I should write down the rules and exceptions and create a few screenshots, not just to help me explain the differences and similarities between Project Settings and Tools > Options settings but also to help the students see how these settings are connected. Believe it or not, there’s actually some “logic” in the settings, and understanding the logic will make it easier to find the right settings.

This ended up being a much longer article than I had planned, so if my explanations below are too much for you, see at least the summary at the end and the screenshots. That would be a good start.

Project settings

You can access Project settings in every Studio view via the Project menu (Project > Project Settings) and also in the Editor view via the Project Settings button above the Translation Results window. Project settings apply only to the active (current) project. Whatever you change here won’t affect any future or other existing projects. (Note that “project” here refers to both “standard Studio projects” [= using the New Project command] and “single file projects” [= using the Open Document command].) That’s quite simple and straightforward. However, there are two additional things to remember about project settings:

1. File Types: Most of the File Type settings control the conversion and extraction of source files to the SDLXLIFF format, for example whether comments, hidden text, worksheet names, etc. are translated, how PDF files are converted, or how elements and attributes are handled in HTML files. In most cases the default settings work fine and you don’t need to worry about changing them. However, if any of these need to be changed, it has to be done before the source file is converted (i.e. opened in Studio for the first time); otherwise the changes don’t have any effect. If you are creating a “single file project”, you do this in the Open Document dialog box by selecting the Advanced button to access the File Type settings (see Figure 1 below). If you are creating a “standard Studio project”, you do this on the Project Files page of the New Project wizard by clicking the File Types button (see Figure 2 below).

Note that when you select the Advanced button to access the File Type settings in the above example, it will take you to the Project Template Settings dialog box (which looks almost exactly like the Project Settings dialog box). Everything you change here will actually stay as part of your default project template and will affect all future projects unless you change the setting later in the Project Template Settings dialog box or in the Options dialog box. I’m not explaining the Project Template concept here but you can find more info about it here. Utilizing the templates can help you to streamline the process of creating projects, particularly if you have similar projects frequently.


Figure 1. Accessing File Type settings via the Open Document dialog box when creating a single file project.


Figure 2. Accessing File Type settings via the Project Files page when creating a standard Studio project.

So, what can you do if you have already created a project and then notice that you should have changed File Type settings? If you are working on a “standard Studio project”, go to Project Settings (> File Types), change the needed settings, and add the file to the project again (you might need to delete the previous file first or rename your new file in order to be able to add it to the project). Adding files to an existing project is another less-than-intuitive process, but you can find the instructions here. If you are working on a “single file project”, you need to create the project again and remember to change the settings when you are in the Open Document dialog box (Figure 1).

2. Language Pairs: The other source of confusion in Project Settings is the various language pairs listed under Language Pairs (see Figure 3). There you should see “All Language Pairs” plus at least one individual language pair. The settings under All Language Pairs are basically general settings that apply to all the language pairs of a project. If you want to change these settings independently for specific language pairs, you do it under the language pair in question. This way you could, for example, use different minimum match values or translation memory fields for different language pairs. However, since most of us work with a single language pair per project, it’s simpler to make all the changes directly in the settings under your language pair (i.e. not under All Language Pairs). That way you can be sure that they will take effect. Note, however, that the settings on the Translation Memory and Automated Translation page (i.e. TMs and their usage, such as Lookup, Concordance, etc.) can only be changed under All Language Pairs (they are grayed out under the individual language pairs). In addition, as you can see in Figure 3, Termbase settings are selected under All Language Pairs (because termbases are multilingual), and Auto-substitution and AutoSuggest Dictionaries settings are selected only under each specific language pair (because they are language-specific).
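One way to picture the relationship between All Language Pairs and an individual language pair is as a layered lookup: a value set under the language pair wins, and anything not set there falls back to All Language Pairs. The Python sketch below models this with collections.ChainMap. The setting names and values are hypothetical, and this is a simplified mental model rather than a description of how Studio stores or resolves its settings.

```python
from collections import ChainMap

# Hypothetical setting names and values, for illustration only.
all_language_pairs = {"minimum_match_value": 70, "concordance_minimum": 70}
fr_fi_pair = {"minimum_match_value": 80}  # changed only under French -> Finnish

# Lookups try the language pair first and fall back to All Language Pairs.
effective = ChainMap(fr_fi_pair, all_language_pairs)

print(effective["minimum_match_value"])  # 80, set under the language pair
print(effective["concordance_minimum"])  # 70, inherited from All Language Pairs

# In this model, a later change under All Language Pairs does not reach
# a setting that the language pair defines itself.
all_language_pairs["minimum_match_value"] = 60
print(effective["minimum_match_value"])  # still 80
```

Under this model, a value set directly under your own language pair keeps applying regardless of what happens under All Language Pairs, which matches the advice to make changes at the language-pair level in single-pair projects.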

I could throw in a couple of other exceptions here but I’m going to spare you from that, and I’m not going to explain what happens if you have selected the Use different translation providers for this language pair option.

More info about Language Pair settings can be found here.


Figure 3. Comparison of settings under “All Language Pairs” and an individual language pair in Project Settings.

Tools > Options settings

The Options dialog box looks very much like the Project Settings dialog box, which can be confusing, but there are some clear differences when you look at them more closely. The Options dialog box navigation tree includes 12 setting categories. Three of them (File Types, Verification and Language Pairs) are the same as in the Project Settings dialog box. These are marked in Figure 4 with a blue frame. The difference is that if you change any of the settings under these three items in the Options dialog box, the changes affect only your future projects – they do not have any effect on your current project. If you want the changes to apply to your current project, you need to make them via the Project Settings dialog box, as explained earlier. All the other nine items (i.e. Editor, AutoSuggest, Default Task Sequence, Translation Memories View, Colors, Keyboard Shortcuts, Automatic Updates, Home View, and Java Runtime Engine Startup) are NOT in the Project Settings dialog box (as you can see in Figure 4); they take effect immediately when you click OK and stay in effect until you change them. In other words, they affect your active project as well as all future and past projects.


Figure 4. Comparison of Project Settings and Options settings. Identical parts are marked with a blue frame.


Confusing? Yes, but the short version of all this is actually quite simple:

1. Project Settings affect only your active project.
2. The three “shared” items in the Tools > Options dialog box (File Types, Verification and Language Pairs) affect only future projects. These are your “global” settings for future projects. See Figure 4.
3. The other nine items in the Tools > Options dialog box affect active, past and future projects. They are not really project settings but more general preferences, such as font sizes, colors, spell checker and keyboard shortcuts, that control how Studio functions regardless of any project-specific settings.
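The three scopes in this summary can also be modeled in a few lines of Python: the shared Tools > Options categories behave like a template that is copied into each project when it is created, Project Settings edit that project's private copy, and the remaining Options categories are one set of preferences read by every project. This is a hypothetical sketch of the behavior described above (all setting names are invented), not Studio's implementation.

```python
import copy

# Hypothetical names. The "shared" Tools > Options categories (File Types,
# Verification, Language Pairs) act as defaults for projects created later.
options_defaults = {"verification": {"report_double_spaces": True}}

# The other Tools > Options categories (Editor, Colors, Keyboard Shortcuts, ...)
# exist only once and are read live by every project.
options_preferences = {"editor_font_size": 10}

projects = {}


def create_project(name):
    # Each new project gets its own deep copy, so later edits on either side stay separate.
    projects[name] = copy.deepcopy(options_defaults)


create_project("old_project")

# Edits made in Tools > Options:
options_defaults["verification"]["report_double_spaces"] = False
options_preferences["editor_font_size"] = 12

create_project("new_project")

print(projects["old_project"]["verification"]["report_double_spaces"])  # True: existing project unaffected
print(projects["new_project"]["verification"]["report_double_spaces"])  # False: picked up by the next project
print(options_preferences["editor_font_size"])                         # 12: one live value for all projects

# Edit made in Project Settings of new_project: affects that project only.
projects["new_project"]["verification"]["report_double_spaces"] = True
print(options_defaults["verification"]["report_double_spaces"])        # False: defaults unchanged
```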

When you add the language pairs to the mixture, it gets a bit more confusing:

1. Translation memories and their usage (Lookup, Concordance, etc.) are selected under All Language Pairs.
2. If you are working with just one language pair, it’s simpler if you make all the other TM-related changes (i.e. Search, Penalties, Filters and Update) under your language pair (rather than under All Language Pairs).
3. Termbase settings are selected under All Language Pairs (because termbases are multilingual). See Figure 3.
4. Auto-substitution and AutoSuggest Dictionaries settings are selected under each specific language pair (because they are language-specific). See Figure 3.

And some of the File Type settings add their own dimension to this confusion as well:

1. Most of the File Type settings control how source files are extracted and converted to the SDLXLIFF format, so if you want to change the conversion you need to change the settings accordingly beforehand. It’s too late after the file has already been converted (i.e. opened in Studio).

Still unsure about all this?

1. Go to Tools > Options. Review all the settings and change them so that they work as the most useful “global settings” for your work in general. For example: Editor > Spelling, Font Adaptation, Auto-propagation, Languages; Verification > QA Checker 3.0; and Language Pairs > All Language Pairs > Translation Memory and Automated Translation, where you select the TMs that you usually want to use. All the Editor settings take effect immediately, and the Verification and Language Pairs settings take effect when you create the next project (single-file or standard).

2. If you need to modify any of the Verification or Language Pairs settings for an individual project, do that via the Project Settings dialog box.


CSV File Type – A Hidden Feature

To be perfectly honest, it’s not really hidden anywhere. I just never paid any attention to it, even though it has been there since version 2009. In my own defense, I have to say that I have not translated CSV (Comma Delimited/Separated Text) files for many years. Anyhow, I came across the CSV file type settings the other day when I was looking for good examples to demonstrate how to use file type settings in general in the next intermediate/advanced Trados Studio workshop here in San Francisco (Dec 1st).

Obviously the CSV settings are important for those of us who happen to translate CSV files, but what caught my attention was the possibility of using this file type for a couple of other purposes: translating partially translated Excel files and converting bilingual Excel files into a translation memory. This is possible because the settings also allow you to bring the existing content of the target-language column into the target side of your Studio file. You just need to specify which column is the source and which one is the target (see the screenshot below).


1. Translating partially translated Excel files

Sometimes I get partially translated Excel files for translation. The missing translations (= empty cells) are here and there throughout the target-language column. I usually sort the file so that all the empty cells are together and I can copy all the matching source cells at once to a new document which I then translate. After translating I copy the translations to the empty target cells and sort the file back to its original order.

One downside of the above method is that I won’t see any of the previously translated material in Studio and I need to keep the Excel file open as a reference. Now, if I saved the file as a CSV file and opened it in Studio, I could see all the existing translations and utilize Studio’s search functions and the Display filter which could be very useful. I can also lock all the existing translations so that I don’t accidentally change them (see the “Lock existing translations” setting in the screenshot). The translated CSV file can then be opened directly in Excel and saved as an Excel file. Note, however, that the conversion from Excel format to CSV is not always a good idea because you can lose some information, such as all the formatting.
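For comparison, the manual routine of finding the empty target cells and copying the translations back can also be scripted. The Python sketch below assumes a hypothetical layout: a CSV with a header row, the source text in column 0 and the target text in column 1. Adjust target_col and delimiter to match your file (for example a semicolon, which some Excel locales use, or delimiter="\t" for tab-delimited text). Because rows are addressed by their original index, the file never needs to be sorted and restored.

```python
import csv
import io


def needs_translation(row, target_col=1):
    """True if the row has no target cell or only whitespace in it."""
    return len(row) <= target_col or not row[target_col].strip()


def split_for_translation(csv_text, delimiter=",", target_col=1):
    """Parse a bilingual CSV and return (rows, indexes of rows still to translate).

    Assumes the first row is a header and column 0 holds the source text.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    todo = [i for i, row in enumerate(rows) if i > 0 and needs_translation(row, target_col)]
    return rows, todo


def merge_translations(rows, translations, delimiter=",", target_col=1):
    """Write new translations into the empty target cells and return the CSV text."""
    for i, text in translations.items():
        row = rows[i]
        while len(row) <= target_col:  # pad rows that have no target cell at all
            row.append("")
        row[target_col] = text
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up example data: English source, Finnish target, two cells still empty.
sample = "source,target\nYes,Kyllä\nNo,\nCancel,Peruuta\nSave,\n"
rows, todo = split_for_translation(sample)
print([rows[i][0] for i in todo])  # ['No', 'Save']
print(merge_translations(rows, {2: "Ei", 4: "Tallenna"}), end="")
```

The rows that already have a target are the ones you would lock with the “Lock existing translations” setting in Studio.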

2. Converting bilingual Excel files into translation memory

This makes it easy to convert bilingual Excel files into a Studio memory. Just save the file as a CSV file, select suitable CSV file type settings in Studio and open the file in Studio. While the file is open in the Studio Editor, you can run a spell-check, QA verification or anything else you want before saving it as an SDLXLIFF file, which you can then import into an existing Studio TM. Note again that all the formatting is lost when the Excel file is converted into CSV format.
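If you would rather skip the SDLXLIFF step, a bilingual CSV can also be turned into a TMX file, the standard exchange format that Studio translation memories can import. The Python sketch below is an alternative to the route described above, not part of it. It assumes a header row, the source in the first column and the target in the second, and it skips rows where either cell is empty. The creationtool name is made up, and the language codes must match your TM.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def csv_to_tmx(csv_text, src_lang, tgt_lang, delimiter=","):
    """Build a TMX 1.4 document from a two-column bilingual CSV (header row skipped)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "csv_to_tmx_sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "CSV",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    for row in reader:
        # Only rows with both a source and a target become translation units.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, row[0]), (tgt_lang, row[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Made-up example data: the "No" row has no translation and is skipped.
sample = "source,target\nYes,Kyllä\nNo,\nCancel,Peruuta\n"
print('<?xml version="1.0" encoding="UTF-8"?>')
print(csv_to_tmx(sample, "en-US", "fi-FI"))
```

As with the Studio route, you lose the chance to spell-check or QA the content on the way, so this suits clean, already reviewed tables best.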

All the above also applies to tab-delimited text files, and there’s an identical file type settings page for that file type.

Trados Studio Workshops in Los Angeles and San Francisco

Los Angeles: I’ll be teaching beginner and intermediate level Studio workshops at the CFI conference in Los Angeles on October 5th. For details, see http://www.calinterpreters.org/conference/schedule.

San Francisco: I’ll be teaching a beginner level workshop on November 10th and an intermediate/advanced level workshop on December 1st in San Francisco for the Northern California Translators Association (NCTA). For details, see http://www.ncta.org/displaycommon.cfm?an=7.

If you need any additional info about the workshops, let me know.

Tools for Translation Quality Assurance

You might be interested in this webinar that I will be teaching on Monday (Sep. 10) titled “Tools for Translation Quality Assurance – What Every CAT Tool User Should Know About Quality Assurance”. It will include an overview of QA functions in Trados Studio (and memoQ) and a little bit about regular expressions as well. In addition, I will show how some stand-alone translation QA tools, such as Verifika and ApSIC Xbench, function.

For more information or to register, visit: http://www.ecpdwebinars.co.uk/events_89171.html


AutoSuggest Case Insensitivity, Source Segment Editing and Other SP2 Improvements

One of the first things I did this week after getting back from my summer vacation was to install the new SP2 that came out a couple of weeks ago. I was very happy to notice these two improvements that I (and probably everyone else as well) have been wanting to see since version 2009:

1. The option to make AutoSuggest case-insensitive
With this improvement it doesn’t matter anymore whether your termbase entries start with an uppercase or lowercase letter. You just need to go to Tools > Options > AutoSuggest and clear the check box next to the Case sensitive option (see the screenshot below).

2. The ability to edit source segments
This feature needs to be turned on first in Project Settings (see the screenshot below). After that you can enable source segment editing for the segment where your cursor is by pressing Alt+F2 (or by right-clicking the segment and selecting Edit Source). I’m not really planning to start correcting the numerous typos I see in source texts, but I think the best use for this will be combining sentence fragments that are separated by erroneous hard returns (since these cannot be merged using the Merge Segments command). Note that you can’t actually delete the hard return character because it’s not visible in the source segment, but you can cut and paste the segment fragments together after enabling source segment editing.
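To illustrate what the new Case sensitive option in item 1 changes, here is a tiny Python sketch of prefix matching against termbase entries, with and without case sensitivity. It only illustrates the concept; it is an assumption that AutoSuggest works along these lines, and Studio's actual matching may differ. The terms are made-up Finnish entries.

```python
def suggest(typed, terms, case_sensitive=True):
    """Return the termbase entries that start with the characters typed so far."""
    if case_sensitive:
        return [term for term in terms if term.startswith(typed)]
    # casefold() is a stronger form of lower() intended for caseless comparison.
    key = typed.casefold()
    return [term for term in terms if term.casefold().startswith(key)]


# Made-up termbase entries (Finnish for cellar, butter, milk, potatoes).
terms = ["Kellari", "voi", "maito", "perunat"]
print(suggest("kel", terms))                        # []  ("Kellari" is missed)
print(suggest("kel", terms, case_sensitive=False))  # ['Kellari']
```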

SP2 also has plenty of other improvements. More information can be found in the Online Help, and even more details in the SP2 Release Notes.

Quality Assurance and Translation Memory Maintenance

You might be interested in this workshop that I will be teaching next week in San Francisco. It’s not a Trados workshop but will cover some Trados Studio QA and TM maintenance issues and many other topics (such as regular expressions) that we all should know.

The workshop will give an overview of QA and TM maintenance functions and tools, and illustrate how they can improve translation productivity and quality when used properly. The main topics covered are:

1. Translation QA

  • Built-in QA functions in CAT tools (such as Trados Studio, memoQ and Wordfast Pro): features, setup and use
  • Stand-alone QA tools, such as QA Distiller, ErrorSpy, CheckMate and Xbench

2. Translation memory maintenance and QA

  • Built-in functions in CAT tools (such as Trados Studio, memoQ and Wordfast Pro) for editing, searching, filtering, importing/exporting TMs

  • Stand-alone TM maintenance/QA tools, such as QA Distiller, ErrorSpy, CheckMate and Xbench
  • Editing translation memories in text editors, such as UltraEdit

3. Use of regular expressions in QA functions/tools

  • How to create your own regular expressions
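As a small taste of the regular-expression topic, the Python sketch below implements two common QA checks: numbers that appear in the source segment but are missing from the target, and double spaces. The patterns are deliberately simple assumptions; a real check would also need to handle locale-specific number formats (such as 1,200 vs. 1 200), which these patterns treat as different numbers.

```python
import re
from collections import Counter

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")  # matches e.g. 12, 3.5, 1,200
DOUBLE_SPACE = re.compile(r" {2,}")


def missing_numbers(source, target):
    """Numbers that occur in the source segment more often than in the target."""
    diff = Counter(NUMBER.findall(source)) - Counter(NUMBER.findall(target))
    return sorted(diff.elements())


def has_double_space(segment):
    """True if the segment contains two or more consecutive spaces."""
    return DOUBLE_SPACE.search(segment) is not None


# Made-up segment pair.
src = "Anni fetched 2 litres of milk and 3 potatoes."
tgt = "Anni haki 2 litraa maitoa ja  perunoita."
print(missing_numbers(src, tgt))  # ['3']
print(has_double_space(tgt))      # True
```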

NOTE: Even though Trados Studio, memoQ and Wordfast Pro are used for many of the examples and demonstrations during the workshop, most of the workshop content is not tool-specific and can be applied to any modern CAT tool.

For more information or to register, visit: http://www.ncta.org/displayconvention.cfm?conventionnbr=11323

Webinar on Converting/Translating PDF Files

PDF seems to be one of the most popular Studio-related search terms that bring readers to my blog, and all the PDF-related articles are among the most frequently visited ones on this blog. So, obviously there’s a lot of interest (confusion?) in this topic. As I have mentioned in the earlier articles, the best approach is to convert the file to a more suitable format (such as Word or Excel) using a real PDF conversion tool rather than opening it directly in Studio, even though Studio 2011 does the conversion much better than Studio 2009. It’s also important to remember that only text-based PDF files can be converted in Studio – it does not convert graphics-based PDF files, such as faxes.

If you want to know more about this topic, you might be interested in a webinar I will be teaching on December 7, titled Working with PDF Files – Part 2: Tools, Tips and Techniques for Converting and Translating PDF Files. It’s not a Trados Studio webinar but it will discuss the problems in translating PDF files in general and what types of tools and methods there are for converting PDF files. In addition, I will also show how to use LogiTerm AlignFactory to align PDF files for creating translation memories.

And in case you are wondering what happened to Part 1 of the webinar series, it will be on November 30th, and the title is Working with PDF Files – Part 1: Using Adobe Reader/Acrobat. It’s a good webinar if you want to know more about Adobe Reader/Acrobat, but it is not related to the topic of converting and translating PDF files.

ATA 2011 Conference Presentation: Working with non-Trados Studio Clients/Translators

I promised last week at the ATA conference in Boston that I would post a summary of my presentation here. You can download the slides with some additional notes by clicking the image on the left. The presentation will also be available through the ATA eConference.

However, here’s a brief summary for those in a hurry:

I reviewed various incompatibility scenarios from the translator’s (and LSP’s) point of view and offered solutions so that Studio users can take advantage of Studio even if their clients/translators still use Trados 2007. I have covered these methods and scenarios in various articles on this blog during the past year or so. The list below includes links to those articles for more details.

Five ways to be compatible with Trados 2007 project flow

1. Deliver translated file and matching TM

2. Translate as a TagEditor (TTX) file in Studio

3. Bilingual Word table with SDL XLIFF Converter

4. Translate as a bilingual “uncleaned” Trados Workbench file in Studio (possible in Studio 2011)

5. Translate first in Studio and then retranslate in Trados Workbench using the same TM

In addition, I also talked about how to translate documents in Studio when only parts of a document need to be translated, such as with DéjàVu export tables.

Translating Bilingual Trados Workbench (Word) Files in Studio 2011

I wanted to start my Studio 2011 articles with this topic for two reasons: (1) This is an important enhancement of Studio for those who are stuck with clients who still require this old Trados file format, and (2) The process can be a bit confusing. Hopefully, the following information can make it easier and clearer.

1. Files need to be pre-segmented

It’s important to know that if you want to save your translation as a bilingual Workbench file (aka “uncleaned” Word file) in Studio, the file needs to be pre-segmented in Trados Workbench before you open it in Studio. To do this, in Workbench select Tools > Translate > Segment unknown sentences. For further details, see this SDL blog article.

Also, it’s better not to change the text colors in Workbench, because that can cause problems with target text formatting in Studio. For example, when you insert a bold tag in the target text, the tag can also include the color of the source text in addition to the actual bold formatting, i.e. your target text turns bold and blue if you had changed the source text color to blue in Workbench. To avoid this, in Workbench select Options > Translated Text Colours > Unchanged for both source and target.

2. DOC or DOCX format

The pre-segmented file needs to be in either DOC or DOCX format. Studio doesn’t seem to accept RTF files. You get the “This file type is not supported” error message. If the original file is an RTF file, then just save the segmented file as DOC or DOCX.

3. Only monolingual files are supported – what’s up with that?

This might be the next stumbling block when you are trying to open the file for translation:

I find this error message very confusing. “This file cannot be processed because it was saved as a bilingual document in Word. Only monolingual files are supported”. Wait a minute… wasn’t the bilingual file support one of the new features of Studio 2011!? I think they could have easily made this error message more informative and less confusing.

Anyhow, if you get this error message when trying to open a segmented DOC or DOCX file, take a look at your file type settings for the file type in question (Tools > Options > File Types > Microsoft Word 2000-2003 / 2007-2010 > Common).

Make sure that the Process files with tw4winMark style option is NOT selected. By default, it shouldn’t be checked but sometimes you need to select it in order to be able to open files that have tw4winMark styles in them (even if the styles are not used in the document). Anyhow, that’s another topic (and source of error messages) altogether, and we’ll get back to that at some other time. So, now after unchecking the box, you should be able to open the file.

4. Getting rid of the source text in the target column

Since the file is pre-segmented, all the target fields are already filled in either with translations or with the source segment content, depending on the TM and the fuzzy match setting you used for the segmentation in Workbench. Having the target side filled in with source text can be annoying, because fuzzy matches will not be automatically inserted into the already occupied target fields during interactive translation, and you would need to use the Apply Translation (Ctrl+T) command for every segment.

You can avoid this extra hassle by emptying the target fields before starting translation. The best way to do this depends on the fuzzy level you used in Workbench when you segmented the file. I think this works best if you use the 100% or higher match value setting during the segmentation step in Workbench.

This way it will be easy to clear all the source language text from the target side by using the Clear Draft Segments command (Translation > Clear Draft Segments). It leaves your 100% match translations untouched but clears all the other segments quickly in one go. If you want to clear the segments based on some other criteria, you can use the Display filter to display those segments, select the desired segments (click the number of the first one, hold down the Shift key and click the number of the last one so that all the desired segments are highlighted) and use the Clear Target Segment command to clear the content (right-click menu or Translation > Clear Target Segment). If a large number of segments have been selected, this can take a while.
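The selection logic behind this step (keep the 100% matches, empty everything else) can be sketched at the text level in Python. The sketch assumes the usual bilingual Workbench marker layout {0>source<}NN{>target<0}, where NN is the match value, as it reads when the hidden tw4winMark text is visible. It works on plain text only and is not a way to edit the Word file itself; inside Studio, Clear Draft Segments does this for you.

```python
import re

# Assumed marker layout of a segmented bilingual Workbench file:
# {0>source<}NN{>target<0}, where NN is the match value recorded by Workbench.
SEGMENT = re.compile(
    r"\{0>(?P<source>.*?)<\}(?P<match>\d+)\{>(?P<target>.*?)<0\}", re.DOTALL
)


def clear_below(text, keep_from=100):
    """Empty the target of every segment whose match value is below keep_from."""
    def repl(m):
        if int(m.group("match")) >= keep_from:
            return m.group(0)  # keep 100% (and higher) matches untouched
        return "{0>" + m.group("source") + "<}" + m.group("match") + "{><0}"
    return SEGMENT.sub(repl, text)


def match_values(text):
    """List the match value of each segment, in document order."""
    return [int(m.group("match")) for m in SEGMENT.finditer(text)]


# Made-up example: a source copy in the first target, a 100% match in the second.
sample = ("{0>Anni went to the cellar.<}0{>Anni went to the cellar.<0} "
          "{0>Milk<}100{>Maito<0}")
print(match_values(sample))  # [0, 100]
print(clear_below(sample))
# {0>Anni went to the cellar.<}0{><0} {0>Milk<}100{>Maito<0}
```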

5. Use of TMs

For the segmentation in Workbench, you can use any TM you want to. Of course, if you have a client-provided or other project-specific Workbench TM, it’s probably the most practical one to use. You can also use the same TM during Studio translation. It’s really easy in Studio 2011 to include Workbench (and TMX, TXT and MDB) memories in a project because you don’t need to do the full TM upgrade process separately first. Studio 2011 allows you to run a Quick Upgrade as part of the TM selection process which makes it almost as easy to use these non-Studio TMs as it is to use actual Studio TMs. You can add non-Studio TMs in the Open File-based Translation Memory dialog box exactly the same way as Studio TMs, just make sure you have the right file type selected (see below).

6. Miscellaneous

Note that the Preview function does not work with Workbench files but you can view the target translation in Word using the File > View In > Bilingual Word Document as Target command.

And as an addition to the potential confusion, when you open a bilingual DOC file for translation, a DOCX file with the same name gets created in the same folder. Why? Good question.