OpenExchange Apps for TM Management

I organized all my OpenExchange apps (and some other utilities) neatly the other day in the Welcome view of Studio using the Menu Maker app (see Figure 1 below). While doing so, I realized that I don’t actually remember what some of them do, so I thought I would organize them in my mind as well (which is a much more difficult task). Anyhow, I collected all the TM management-related apps into a table and drew a diagram that shows at a glance what they do and how they are related to each other. I found that very helpful, so I thought I would share it here as well.

Figure 1. SDL OpenExchange apps and other programs organized in the Welcome screen of Trados Studio 2014 using the Menu Maker app.


Figure 2. TM management-related apps and their main conversion functions. For details, see the table below.

Here are some additional details about the apps mentioned in the above diagram:

App name: conversion function (notes listed below each app)

SDLTMExport: SDLTM > TMX

SDL Translation Memory Management Utility: SDLTM > TMW
  • Reverses language pairs
  • Removes duplicates

SDLTmConvert: SDLTM > SDLXLIFF / XML / TMX / CSV / monolingual source and target text files
  • Includes filtering options
  • Allows splitting output into several files
  • Allows hiding and setting user and system info
  • Allows manipulation of tagged content
  • Free limited version, unlimited paid version (35 euros)

SDLXliff2Tmx: SDLXLIFF > TMX / tab-delimited TXT
  • Includes filtering options
  • Allows removal of formatting tags

SDLTmReverseLangs: SDLTM <> SDLTM
  • Reverses languages

TM Merge: merges SDLTMs into one SDLTM
  • Also creates additional language pairs from the available languages in the input TMs
  • Cost: 48 euros

In addition to the TM conversion apps listed above, there are also several other TM management-related apps. I couldn’t come up with a pretty diagram for them, so I just list them here (all are free unless otherwise indicated):

TM Optimizer: Optimizes Trados Workbench TMs for use with Trados Studio by removing excessive formatting tags from the TM, thereby increasing TM leverage. Cost: £50-100.
SDLTmFindVars: Identifies potential variables (untranslated text) in translation memories and allows the user to add them as variables to a Studio TM.
SDLTM Repair: Fixes specific errors in damaged Studio TMs.
Variables Manager for SDL Trados Studio: Allows fast editing, copying, importing and exporting of variable lists in Studio TMs.
TMX Anonymizer: Anonymizes TMX files by resetting the Creation User and Change User name fields.
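For the curious, the core of what an anonymizer like this does can be sketched with Python’s standard ElementTree module. This is a minimal sketch, not TMX Anonymizer’s actual implementation: it assumes the user names are stored in the TMX 1.4 creationid and changeid attributes of the <tu> and <tuv> elements, and the replacement value "anonymous" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 attributes assumed to hold the creating and last-changing user.
USER_ATTRS = ("creationid", "changeid")

def anonymize_tmx(tmx_text: str, placeholder: str = "anonymous") -> str:
    """Return the TMX document with user-name attributes reset on <tu>/<tuv>."""
    root = ET.fromstring(tmx_text)
    for elem in root.iter():
        if elem.tag in ("tu", "tuv"):
            for attr in USER_ATTRS:
                if attr in elem.attrib:
                    elem.set(attr, placeholder)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<tmx version="1.4"><header srclang="en-US"/><body>'
    '<tu creationid="JOHN" changeid="MARY">'
    '<tuv xml:lang="en-US"><seg>file</seg></tuv>'
    '<tuv xml:lang="fi-FI"><seg>tiedosto</seg></tuv>'
    "</tu></body></tmx>"
)
print(anonymize_tmx(sample))
```

The segment text is left untouched; only the user fields change, which is what makes the file safe to share.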

These tables and graphs help me to keep track of the various apps and their functions. It’s good to know what kind of apps are available because you never know when you might need them.

Trados Studio Workshops and Presentations

Oakland (California): I will be teaching a beginner-level and an intermediate-level Trados Studio 2011/2014 workshop at the CFI (California Federation of Interpreters) Conference in Oakland on October 13th. For details, see http://www.calinterpreters.org/2013-trados/.

San Antonio (Texas): I will also be giving two Trados Studio-related presentations at the ATA Annual Conference in San Antonio on Friday, November 8th: (LT-5) Dealing with Tags and (LT-7) Six Things to Make You a Better Trados Studio User. I’m hoping to see many of you at the conference!

My Top 5 OpenExchange Apps

I downloaded the new, improved version of Glossary Converter the other day and took a closer look at the available OpenExchange apps just to make sure I hadn’t missed any other recent updates or additions. While updating my own list of useful or potentially useful apps, I thought I would share some of that info here as well. It’s good to know what kind of apps are available because you never know when you might need them.

1. Glossary Converter

This is my favorite app right now because the new version can also handle additional fields, such as client names and notes. Converting existing glossaries from Excel format takes only a few seconds. That’s incredible compared to the convoluted, multi-step process needed when the conversion is done with MultiTerm Convert. Or rather, it’s incredible that we put up with MultiTerm Convert for all those years. However, I have to admit that I’ll miss those little rotating sprocket wheels in MultiTerm Convert!

Note that if you include additional fields in the Excel glossary file, they need to be placed to the right of the language under which you want them to appear. For example, if you have two languages and one additional field in your glossary, organize them this way if you want the additional field to appear under Language 1:

Column A: Language 1
Column B: Additional field
Column C: Language 2

And this way if you want the additional field to appear under Language 2:

Column A: Language 1
Column B: Language 2
Column C: Additional field

2. AnyTM Translation Provider

One thing that I really dislike in Studio is the fact that you can’t mix resources that have different sublanguages, such as US English and UK English memories. This app is handy for situations like that because it allows you to use TMs as reference TMs regardless of their sublanguages (or main languages for that matter). Note that this is a paid app (£9.99).

3. SDLTmConvert

Trados Studio has a really powerful set of quality assurance (QA) functions. Unfortunately, they only work on translated SDLXLIFF files and cannot be used on translation memories. This app allows you to convert a TM to SDLXLIFF format so that you can run the QA checks on the TM content and then convert the edited SDLXLIFF file back to a TM. In addition, it can convert Studio TMs to many other formats, such as CSV, TXT and even monolingual source and target text files.

4. SDLXliff2Tmx

When working with non-Studio clients, it’s sometimes necessary to send them the new TM content as a TMX export file. You can easily do that in Studio but it does take several steps (create a new TM, import the translated SDLXLIFF files to the TM and then export the TM as a TMX file). This app gives you a faster method to accomplish the same thing, i.e. it exports SDLXLIFF files directly to a TMX file (or optionally to a tab-delimited text file).
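To show what a direct export produces, here is a short Python sketch that writes segment pairs into a minimal TMX 1.4 file. It skips the SDLXLIFF parsing entirely (the segment pairs are assumed to be already extracted as plain strings), and the header values such as creationtool are placeholders, not what SDLXliff2Tmx actually writes.

```python
import xml.etree.ElementTree as ET

# Qualified name ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def pairs_to_tmx(pairs, srclang, tgtlang):
    """Build a minimal TMX 1.4 document from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # These header attributes are required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        if not target.strip():
            continue  # skip untranslated segments
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(pairs_to_tmx([("Open the file.", "Avaa tiedosto."), ("Close", "")], "en-US", "fi-FI"))
```

Untranslated segments are skipped here because an empty target would only add noise to the client’s TM; a real tool may handle them differently.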

5. SDLXLIFF Compare

I don’t know about you but I’m less than happy when I get an edited Studio file back without tracked changes and then I have to go through the whole file and figure out what was changed. This app makes life much simpler in those cases. It displays the comparison results of the two SDLXLIFF file versions in an easy-to-read XML/HTML report. It can also be very useful for project management purposes.
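The basic idea behind comparing two file versions can be sketched with Python’s standard difflib module: walk through the segments in parallel and report word-level changes in the ones that differ. This is only an illustration, assuming both versions have already been reduced to lists of target segments in the same order; it is not how SDLXLIFF Compare builds its report.

```python
import difflib

def changed_segments(original, edited):
    """Yield (segment number, word-level changes) for segments that differ.

    Assumes both versions list the target segments in the same order;
    zip() stops at the shorter list, so added or removed segments are not reported.
    """
    for number, (before, after) in enumerate(zip(original, edited), start=1):
        if before == after:
            continue
        old_words, new_words = before.split(), after.split()
        matcher = difflib.SequenceMatcher(None, old_words, new_words)
        changes = [
            (op, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"
        ]
        yield number, changes


original = ["Avaa tiedosto.", "Tallenna kansio."]
edited = ["Avaa tiedosto.", "Tallenna hakemisto."]
print(list(changed_segments(original, edited)))
# [(2, [('replace', 'kansio.', 'hakemisto.')])]
```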

In addition to the above Top 5 apps, I also wanted to mention a few others that are worth remembering in case you need those functions one day.

TM management-related apps

SDLTmReverseLangs

  • for reversing the languages of a TM

SDL Trados 2007 Translation Memory Plug-in

  • direct access to file-based Translator’s Workbench translation memories (TMW) without having to convert them

SDL Translation Memory Management Utility

  • includes several TM management tasks, such as TM export, duplicate removal and reversing language pairs

SDLXLIFF file management-related apps

SDL XLIFF Split/Merge

  • splitting large SDLXLIFF files and merging split files into a single SDLXLIFF file

SDL Batch Find/Replace

  • for batch find and replace operations in multiple SDLXLIFF files

Miscellaneous apps

PackageReader

  • previewing Studio packages directly from an e-mail or Windows Explorer without opening Studio

TAUS Search

  • gives access to the terms and phrases in the TAUS translation corpora by allowing the corpora to be used as an external reference TM

Trados Studio Manual

  • Mats Linder’s highly rated Trados Studio manual is also available from here. I have been planning to review Mats’ manual but unfortunately haven’t had time to do it. Anyhow, while waiting for my review (it might be a long wait), you could take a look at what other users and reviewers have to say about it.

In addition, the OpenExchange selection also includes several AutoSuggest dictionaries, Multiterm termbases and various file type definitions (such as for Wordfast TXML files).

Where are SDL TTX It! and MS Office converter?

You might wonder why I didn’t mention these two very useful apps. The MS Office converter functionality is now built into Trados Studio 2011 (File > Export for External Review), so there’s really no need for the app anymore. And the TTX It! app that’s used for batch conversion of multiple source files into TTX format gets installed automatically and can be accessed via the All Programs > SDL > SDL Trados Studio 2011 > OpenExchange Apps folder. By the way, this is the place where many of the other apps get installed as well.

Simple Terminology Check

I was translating a large software project the other day and noticed at one point that I had mixed up the translations for words like file, folder and directory. Don’t ask me how that happened, but by the time I noticed it, the incorrect translations were all over the place. Locating them one by one would have been time-consuming, since these terms appeared in almost every other segment, so I decided to use the QA Checker to find the incorrect translations. This was easy to do with the Regular Expressions function, and the good news is that you don’t need to know or use any regular expressions to do this.

Go to Project Settings and select Verification > QA Checker > Regular Expressions. Select the Search regular expressions check box, if not already selected. Type a brief description or a name in the Description field. This is just for your own information. In this example we are trying to locate all segments where the word “file” is in the source but the Finnish translation does not include the matching term “tiedosto”, so as a description we can just use the word “File”. Type the source language word (“file”) in the RegEx source field and the target language word (“tiedosto”) in the RegEx target field. For the Condition, select Report if source matches but not the target from the pull-down menu. To save the search settings, click Action and select Add item. Create similar searches for other terms, as needed. That’s it and you can then close the dialog box by clicking OK.


Figure 1. Settings for a search for segments where the source text includes “file” but the target doesn’t include the matching translation “tiedosto”. Note the other similar searches for term pairs “database/tietokan” and “directory/hakemisto” below the “file/tiedosto” search.

When you run the Verification (F8), all segments where the source includes the term “file” but the target doesn’t include “tiedosto” will be flagged in the verification results. It worked beautifully in my case, and I fixed the problems in less than 5 minutes. Another nice thing about this method is that it works well even with Finnish, because you can just use the Finnish word stem without having to worry about the various endings the word might have in the text.
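The logic behind the Report if source matches but not the target condition can be modeled in a few lines of Python, using the three term pairs from Figure 1. This is a simplified sketch, not Studio’s implementation: it assumes segments are plain (source, target) string pairs and that matching ignores case, which may not match how your QA Checker is configured.

```python
import re

# (description, source pattern, target pattern): the term pairs from Figure 1.
TERM_CHECKS = [
    ("File", "file", "tiedosto"),
    ("Database", "database", "tietokan"),
    ("Directory", "directory", "hakemisto"),
]

def check_terms(segments):
    """Return (segment number, description) wherever the source matches but the target doesn't."""
    issues = []
    for number, (source, target) in enumerate(segments, start=1):
        for description, source_re, target_re in TERM_CHECKS:
            source_hit = re.search(source_re, source, re.IGNORECASE)
            target_hit = re.search(target_re, target, re.IGNORECASE)
            if source_hit and not target_hit:
                issues.append((number, description))
    return issues


segments = [
    ("Open the file.", "Avaa tiedosto."),
    ("Save the file to the directory.", "Tallenna tiedosto kansioon."),
    ("Edit the profile.", "Muokkaa profiilia."),
]
print(check_terms(segments))
# [(2, 'Directory'), (3, 'File')]; segment 3 is a "profile" false positive
```

Because the target pattern is only a stem, inflected forms such as “tiedostoon” or “tietokannassa” still count as a match.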

There were a few false positives caused by words like profile (the matching translation would be profiili). These were easy to skip while going through the verification results since there weren’t many of them. However, it’s also possible to fine-tune the search with the help of “real” regular expressions to look for exact matches only, if needed. You can also run the check in the opposite direction for extra security by using the Report if target matches but not the source option.
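One way to do that fine-tuning is to anchor the source pattern at word boundaries with \b, so that “profile” no longer matches while “file” and “files” still do. The Python sketch below compares the plain and the anchored pattern; treat it as an illustration and test the pattern on your own text, since the regex flavor used by Studio may behave differently in edge cases.

```python
import re

# Plain word, as in the simple check: also matches "profile" and "filename".
LOOSE = re.compile(r"file", re.IGNORECASE)
# Word boundaries on both sides: only "file" or "files" as whole words.
STRICT = re.compile(r"\bfiles?\b", re.IGNORECASE)

def compare(texts):
    """Map each text to (loose match, strict match)."""
    return {text: (bool(LOOSE.search(text)), bool(STRICT.search(text))) for text in texts}


results = compare(["Open the file.", "Copy all files.", "Edit the profile.", "Use a filename."])
for text, result in results.items():
    print(text, result)
```

Only the source side needs the boundaries; the Finnish target pattern can stay a bare stem so that inflected forms still match.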

Purple Haze – Overdose of Tags

I think tags are one of the most disliked things in Studio. This is particularly true for users who previously translated using Workbench in Word, because they never saw tags there. Tags can be annoying, but it really helps to understand how they function and how they can be handled in Studio. There’s a good blog article by Paul Filkin about handling tags here, and the Studio Help also has some good info on the topic.

What’s really annoying are files that have a huge number of tags that don’t have any real meaning for the document. These are often tags that apply a different formatting to spaces between words or turn the same formatting on and off constantly. If there are only a few of them, it’s relatively easy to see that there’s no need to include them in the translation. However, dealing with a large quantity of this purple haze makes it difficult to perceive the actual text and it slows down the translation process. It’s also easier to miss the real tags and the tag verification feature becomes practically useless when there are hundreds of unnecessary warning messages.

These types of tags are common in files converted or copied from PDF format but they can also be easily produced in Word by applying and changing formatting incorrectly, for example by leaving a different formatting in spaces between words. This is very easy to do without realizing it because you don’t see the tags in Word.

A friend of mine asked me recently if there’s anything she could do to reduce the number of unnecessary tags in her files, so I thought I would expand my original reply and share it here as well. I took one of her DOC files (about 1,200 words) and tested various ways to lower the tag count. When I opened the file directly in Studio, there were well over 1,000 formatting tags (see Figure 1). I think this was the worst file I’ve ever seen: in most segments there were two pairs of tags between every word! These were mostly font color and spacing tags that applied a different formatting to spaces or turned the same formatting off and on, and they were obviously completely unnecessary.


Figure 1. The DOC file opened directly in Trados Studio without any prepping. (Note that the original French source text has been replaced with a Finnish children’s poem to protect the confidentiality of the original text. You didn’t miss anything. It was a really boring text, at least compared to Anni and her trip across the lawn to the cellar to fetch butter, milk and potatoes.)

I tried the following three methods:

1. Save the source file as DOCX and select the “Skip advanced font formatting” option in the File Types settings (Tools > Options > Microsoft Word 2007-2010 > Common). This option is not available for DOC or RTF files, so this works only with DOCX files (and PPTX and PDF files). When I opened the file in Studio, there were 118 formatting tags (<cf>) left. About half of them seemed to be unnecessary but they were easy to see and skip.


Figure 2. The same file saved as a DOCX file and opened directly in Trados Studio.


2. Clean the file (DOCX, DOC or RTF) in Word using CodeZapper. CodeZapper is a Word add-in that includes several cleaning functions. When processing my test file, I used the PDFTidy, PDFFix and CZL functions in combination and did not test them separately or with any of the other functions. CodeZapper turned out to be clearly the most effective method for this file. There were only 62 formatting tags (<cs>) left in the file, and they all seemed to be necessary.


Figure 3. The DOC file opened directly in Trados Studio after it was prepped with CodeZapper. The process removed all tags from the sample sentences.


3. Clean the file (DOCX, DOC or RTF) in Word using TransTools Document Cleaner. TransTools is another Word add-in that includes a tag cleaning function. This left 156 formatting tags in the file, and most of them seemed to be unnecessary: as the previous example shows, only about 60 formatting tags are needed in this file.


Figure 4. The DOC file opened directly in Trados Studio after it was prepped with TransTools.

 
Of course, one hopes that clients would include a “tag-clearance” as part of their file prep procedure before sending files to translators. That would not only make translators’ lives easier and improve the quality of the translation and the resulting translation memory, but it would also increase fuzzy match leverage because the unnecessary tags wouldn’t be there screwing up the analysis results and fuzzy matching.

//

Project Settings vs. Tools > Options – What’s the Difference?

I think one of the most confusing aspects in Studio is the way the basic settings are selected. Even some of the more experienced users have difficulties with this, not to mention those who are just starting to use Studio. Every time I train new individual users or teach workshops, I think that I should write down the rules and exceptions and create a few screenshots, not just to help me explain the differences and similarities between Project Settings and Tools > Options settings but also to help the students see how these settings are connected. Believe it or not, there’s actually some “logic” in the settings, and understanding the logic will make it easier to find the right settings.

This ended up being a much longer article than I had planned, so if my explanations below are too much for you, see at least the summary at the end and the screenshots. That would be a good start.

Project settings

You can access Project settings in every Studio view via the Project menu (Project > Project Settings) and also in the Editor view via the Project Settings button, which is above the Translation Results window. Project settings apply only to the active (current) project. Whatever you change here won’t affect any future or other existing projects. (Note that “project” here refers to both “standard Studio projects” [= using the New Project command] and “single file projects” [= using the Open Document command].) That’s quite simple and straightforward. However, there are two additional things to remember about project settings:

1. File Types: Most of the File Type settings control the conversion and extraction of source files to the SDLXLIFF format, for example whether comments, hidden text, worksheet names etc. are translated, how PDF files are converted, or how elements and attributes are handled in HTML files. In most cases the default settings work fine and you don’t need to worry about changing these settings. However, if any of these need to be changed, it has to be done before the source file is converted (i.e. opened in Studio for the first time); otherwise the changes don’t have any effect. If you are creating a “single file project”, you do this in the Open Document dialog box by selecting the Advanced button to access the File Type settings (see Figure 1 below). If you are creating a “standard Studio project”, you do this on the Project Files page of the New Project wizard by clicking the File Types button (see Figure 2 below).

Note that when you select the Advanced button to access the File Type settings in the above example, it will take you to the Project Template Settings dialog box (which looks almost exactly like the Project Settings dialog box). Everything you change here will actually stay as part of your default project template and will affect all future projects unless you change the setting later in the Project Template Settings dialog box or in the Options dialog box. I’m not explaining the Project Template concept here but you can find more info about it here. Utilizing the templates can help you to streamline the process of creating projects, particularly if you have similar projects frequently.


Figure 1. Accessing File Type settings via the Open Document dialog box when creating a single file project.


Figure 2. Accessing File Type settings via the Project Files page when creating a standard Studio project.

So, what can you do if you have already created a project and then notice that you should have changed File Type settings? If you are working on a “standard Studio project” and need to change the settings, go to the Project Settings (> File Types), change the needed settings, and add the file again to the project (you might need to delete the previous file first or rename your new file in order to be able to add it to the project). Adding files to an existing project is another less than intuitive process but you can find the instructions here.  If you are working on a “single file project” and need to change the settings, you would need to create the project again and remember to change the settings when you are in the Open Document dialog box (Figure 1).

2. Language Pairs: The other source of confusion in Project Settings is the various language pairs listed under Language Pairs (see Figure 3). There you should see “All Language Pairs” followed by at least one specific language pair. The settings under All Language Pairs are basically general settings that apply to all the language pairs of a project. If you want to change these settings independently for specific language pairs, you would do it under the language pair in question. This way you could, for example, use different minimum match values or translation memory fields for different language pairs. However, since most of us work with a single language pair per project, it’s simpler to make all the changes directly to the settings under your language pair (i.e. not under the All Language Pairs option). That way you can be sure that they will take effect. Note, however, that the settings on the Translation Memory and Automated Translation page (i.e. TMs and their usage, such as Lookup, Concordance, etc.) can only be changed under All Language Pairs (they are grayed out under the individual language pairs). In addition, as you can see in Figure 3, Termbase settings are selected under All Language Pairs (because termbases are multilingual), and Auto-substitution and AutoSuggest Dictionaries settings are selected under each specific language pair only (because they are language-specific).

I could throw in a couple of other exceptions here but I’m going to spare you from that, and I’m not going to explain what happens if you have selected the Use different translation providers for this language pair option.

More info about Language Pair settings can be found here.


Figure 3. Comparison of settings under “All Language Pairs” and an individual language pair in Project Settings.


Tools > Options settings

The Options dialog box looks very much like the Project Settings dialog box, which can be confusing, but there are some clear differences when you look at them more closely. The Options dialog box navigation tree includes 12 setting categories. Three of them (File Types, Verification and Language Pairs) are the same as in the Project Settings dialog box. These are marked in Figure 4 with a blue frame. The difference, however, is that if you change any of the settings under these three items in the Options dialog box, the changes will affect only your future projects; they do not have any effect on your current project. If you want the changes to apply to your current project, you need to make them via the Project Settings dialog box, as explained earlier. The other nine items (i.e. Editor, AutoSuggest, Default Task Sequence, Translation Memories View, Colors, Keyboard Shortcuts, Automatic Updates, Home View, and Java Runtime Engine Startup) are NOT in the Project Settings dialog box (as you can see in Figure 4). They take effect immediately when you click OK and stay in effect until you change them. In other words, they affect your active project as well as all future and past projects.


Figure 4. Comparison of Project Settings and Options settings. Identical parts are marked with a blue frame.


Summary

Confusing? Yes, but the short version of all this is actually quite simple:

1. Project Settings affect only your active project.
2. The three “shared” items in the Tools > Options dialog box (File Types, Verification and Language Pairs) affect only future projects. These are your “global” settings for future projects. See Figure 4.
3. The other nine items in the Tools > Options dialog box affect active, past and future projects. They are not really project settings but more general preferences, such as font sizes, colors, spell checker and keyboard shortcuts, that control how Studio functions regardless of any project-specific settings.
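If a programmer’s analogy helps, the three rules behave like a snapshot versus a live lookup. The Python toy model below is purely illustrative: the classes, keys and values are hypothetical and say nothing about how Studio stores settings internally. It only captures the idea that a new project copies the shared Options items once, at creation, while the other Options items are read live.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Options:
    """Hypothetical stand-in for Tools > Options."""
    # "Shared" items (File Types, Verification, Language Pairs): copied into new projects.
    shared: dict = field(default_factory=lambda: {"verification": "default"})
    # Other items (Editor, Colors, Keyboard Shortcuts, ...): read live by every project.
    general: dict = field(default_factory=lambda: {"font_size": 10})

@dataclass
class Project:
    """Hypothetical stand-in for a project and its Project Settings."""
    options: Options
    settings: dict

    @classmethod
    def create(cls, options):
        # Rule 2: the shared items are copied once, when the project is created.
        return cls(options, copy.deepcopy(options.shared))

    def font_size(self):
        # Rule 3: general preferences are looked up live, so changes apply everywhere.
        return self.options.general["font_size"]


options = Options()
old = Project.create(options)
options.shared["verification"] = "strict"  # changed in Tools > Options
options.general["font_size"] = 12
new = Project.create(options)
old.settings["verification"] = "custom"    # Rule 1: changed in Project Settings

print(old.settings, new.settings, old.font_size(), new.font_size())
```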

When you add the language pairs to the mixture, it gets a bit more confusing:

1. Translation memories and their usage (Lookup, Concordance, etc.) are selected under All Language Pairs.
2. If you are working with just one language pair, it’s simpler if you make all the other TM-related changes (i.e. Search, Penalties, Filters and Update) under your language pair (rather than under All Language Pairs).
3. Termbase settings are selected under All Language Pairs (because termbases are multilingual). See Figure 3.
4. Auto-substitution and AutoSuggest Dictionaries settings are selected under each specific language pair (because they are language-specific). See Figure 3.
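One way to picture the relationship between All Language Pairs and an individual language pair is a fallback lookup: a value set under the specific pair wins, and otherwise the All Language Pairs value applies. The Python sketch below is a hypothetical model of that idea only (the setting names and dictionary layout are invented), and it ignores exceptions such as the Use different translation providers for this language pair option.

```python
ALL_PAIRS = "All Language Pairs"

def effective_setting(settings, pair, name):
    """Return a setting for one language pair, falling back to All Language Pairs."""
    specific = settings.get(pair, {})
    if name in specific:
        return specific[name]
    return settings[ALL_PAIRS][name]


# Hypothetical values: a stricter minimum match only for the Finnish pair.
settings = {
    ALL_PAIRS: {"minimum_match": 70},
    "English (US) > Finnish": {"minimum_match": 80},
    "English (US) > Swedish": {},
}
print(effective_setting(settings, "English (US) > Finnish", "minimum_match"))  # 80
print(effective_setting(settings, "English (US) > Swedish", "minimum_match"))  # 70
```

This is also why changing a value only under your own language pair is the safer habit when you work with one pair: it is the value that wins in the lookup.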

And some of the File Type settings add their own dimension to this confusion as well:

1. Most of the File Type settings control how source files are extracted and converted to the SDLXLIFF format, so if you want to change the conversion you need to change the settings accordingly beforehand. It’s too late after the file has already been converted (i.e. opened in Studio).


Still unsure about all this?

1. Go to Tools > Options. Review all the settings and change them so that they are the most useful “global settings” for your work in general. For example: Editor > Spelling, Font Adaptation, Auto-propagation, Languages; Verification > QA Checker 3.0; and Language Pairs > All Language Pairs > Translation Memory and Automated Translation, where you select the TMs that you usually want to use. All the Editor settings take effect immediately, and the Verification and Language Pairs settings take effect when you create the next project (single-file or standard).

2. If you need to modify any of the Verification or Language Pairs settings for an individual project, do that via the Project Settings dialog box.

//

CSV File Type – A Hidden Feature

To be perfectly honest, it’s not really hidden anywhere. I just never paid any attention to it even though it has been there since version 2009. In my own defense, I have to say that I have not translated CSV (Comma Delimited/Separated Text) files for many years. Anyhow, I came across the CSV file type settings the other day when I was looking for good examples to demonstrate how to use the file type settings in general for the next intermediate/advanced Trados Studio workshop here in San Francisco (Dec 1st).

Obviously the CSV settings are important for those of us who happen to translate CSV files, but what caught my attention was the possibility of using this file type for a couple of other purposes: translating partially translated Excel files and converting bilingual Excel files into a translation memory. This is possible because the settings also allow you to bring the existing content of the CSV file’s target-language column into the target-language column of your Studio file. You just need to specify which column is the source and which one is the target (see the screenshot below).

Screenshot

1.  Translating partially translated Excel files

Sometimes I get partially translated Excel files for translation. The missing translations (= empty cells) are here and there throughout the target-language column. I usually sort the file so that all the empty cells are together and I can copy all the matching source cells at once to a new document which I then translate. After translating I copy the translations to the empty target cells and sort the file back to its original order.
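The same result as this sort-and-restore routine can be expressed as a short Python sketch that never actually sorts: it remembers the position of every row with an empty target cell, sends those source cells off together for translation, and writes each translation back to its original row. The row layout (a list of [source, target] pairs) and the translate callback are hypothetical stand-ins for the Excel file and the actual translation work.

```python
def fill_missing(rows, translate):
    """Fill empty target cells in place, keeping the original row order.

    rows: list of [source, target] lists; target is "" when untranslated.
    translate: hypothetical callback that takes a list of source strings
    and returns their translations in the same order.
    """
    # Collect the untranslated rows together, remembering where each came from.
    missing = [index for index, (_, target) in enumerate(rows) if not target.strip()]
    translations = translate([rows[index][0] for index in missing])
    # Write each translation back to its original row.
    for index, translation in zip(missing, translations):
        rows[index][1] = translation
    return rows


glossary = {"folder": "kansio", "directory": "hakemisto"}
rows = [["file", "tiedosto"], ["folder", ""], ["database", "tietokanta"], ["directory", ""]]
print(fill_missing(rows, lambda sources: [glossary[s] for s in sources]))
```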

One downside of the above method is that I won’t see any of the previously translated material in Studio and I need to keep the Excel file open as a reference. Now, if I saved the file as a CSV file and opened it in Studio, I could see all the existing translations and use Studio’s search functions and the Display filter, which could be very useful. I could also lock all the existing translations so that I don’t accidentally change them (see the “Lock existing translations” setting in the screenshot). The translated CSV file can then be opened directly in Excel and saved as an Excel file. Note, however, that converting from Excel format to CSV is not always a good idea because you can lose some information, such as all the formatting.
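What these CSV settings do with the columns can be sketched with Python’s csv module: pick one column as the source and another as the target, and mark rows that already have a translation as locked. The column indexes, the has_header and lock_existing flags and the dictionary layout are assumptions for illustration; they mirror the settings described above, not Studio’s internal format.

```python
import csv
import io

def read_bilingual_csv(text, source_col=0, target_col=1, has_header=True,
                       lock_existing=True, delimiter=","):
    """Turn CSV rows into segment dicts; rows with an existing target can be locked."""
    segments = []
    for row_number, row in enumerate(csv.reader(io.StringIO(text), delimiter=delimiter)):
        if has_header and row_number == 0:
            continue  # skip the column titles
        source = row[source_col] if len(row) > source_col else ""
        target = row[target_col] if len(row) > target_col else ""
        if not source.strip():
            continue  # nothing to translate on this row
        segments.append({
            "source": source,
            "target": target,
            "locked": lock_existing and bool(target.strip()),
        })
    return segments


data = "ID,Source,Target\n1,Open the file,Avaa tiedosto\n2,Close the folder,\n"
for segment in read_bilingual_csv(data, source_col=1, target_col=2):
    print(segment)
```

Passing delimiter="\t" would handle a tab-delimited export the same way.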

2.  Converting bilingual Excel files into translation memory

The CSV file type also makes it easy to convert bilingual Excel files to a Studio memory. Just save the file as a CSV file, select suitable CSV file type settings in Studio and open the file in Studio. While the file is open in the Studio Editor, you can run a spell-check, QA verification or anything else you want before saving it as an SDLXLIFF file, which you can then import into an existing Studio TM. Note again that all the formatting is lost when the Excel file is converted into CSV format.

All the above also applies to tab-delimited text files, and there’s an identical settings page for that file type.

Trados Studio Workshops in Los Angeles and San Francisco

Los Angeles: I’ll be teaching beginner and intermediate level Studio workshops at the CFI conference in Los Angeles on October 5th. For details, see http://www.calinterpreters.org/conference/schedule.

San Francisco: I’ll be teaching a beginner level workshop on November 10th and an intermediate/advanced level workshop on December 1st in San Francisco for the Northern California Translators Association (NCTA). For details, see http://www.ncta.org/displaycommon.cfm?an=7.

If you need any additional info about the workshops, let me know.

Tools for Translation Quality Assurance

You might be interested in this webinar that I will be teaching on Monday (Sep. 10) titled “Tools for Translation Quality Assurance – What Every CAT Tool User Should Know About Quality Assurance”. It will include an overview of QA functions in Trados Studio (and memoQ) and a little bit about regular expressions as well. In addition, I will show how some stand-alone translation QA tools, such as Verifika and ApSIC Xbench, function.

For more information or to register, visit: http://www.ecpdwebinars.co.uk/events_89171.html

//

AutoSuggest Case Insensitivity, Source Segment Editing and Other SP2 Improvements

One of the first things I did this week after getting back from my summer vacation was to install the new SP2 that came out a couple of weeks ago. I was very happy to notice these two improvements that I (and probably everyone else as well) have been wanting to see since version 2009:

1. The option to make AutoSuggest case-insensitive
With this improvement it doesn’t matter anymore whether your termbase entries start with an uppercase or lowercase letter. You just need to go to Tools > Options > AutoSuggest and clear the check box next to the Case sensitive option (see the screenshot below).

2. The ability to edit source segments
This feature needs to be turned on first in Project Settings (see the screenshot below). After that you can enable source segment editing for the segment where your cursor is by pressing Alt+F2 (or right-click on the segment and select Edit Source). I’m not really planning to start correcting the numerous typos I see in source texts, but I think the best use for this will be combining sentence fragments that are separated by erroneous hard returns (since these cannot be merged using the Merge Segments command). Note that you can’t actually delete the hard return character because it’s not visible in the source segment, but you can cut and paste the segment fragments together after enabling source segment editing.
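The case-insensitive AutoSuggest option in item 1 above boils down to comparing the typed text and the termbase entry after folding case. The Python sketch below uses a simple “entry starts with what was typed” rule, which is an assumption for illustration and not how AutoSuggest actually ranks or matches entries.

```python
def suggestions(typed, entries, case_sensitive=False):
    """Return termbase entries that start with the typed text."""
    if case_sensitive:
        return [entry for entry in entries if entry.startswith(typed)]
    folded = typed.casefold()
    return [entry for entry in entries if entry.casefold().startswith(folded)]


entries = ["Tiedosto", "tietokanta", "Hakemisto"]
print(suggestions("tie", entries))                       # ['Tiedosto', 'tietokanta']
print(suggestions("tie", entries, case_sensitive=True))  # ['tietokanta']
```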

The new SP2 also has plenty of other improvements. More information can be found in the Online Help, and even more details in the SP2 Release Notes.