alexpgp | Once around the block...

Wordfast has the capability to look things up in up to three glossaries, and provides a mechanism for morphing the terms so that an entry for, say, "converter" will be triggered by the appearance of "converters" in the source text. This kind of arrangement really favors short glossary entries (which ought to be short anyway), as the chances of matching a term that contains more than a couple of words fall rapidly as the length of the string increases.

I am also giving serious thought to combining my various translation memory files into one master file. Wordfast can tag individual "translation units" as being associated with a certain client and a certain subject, and can "penalize" the match score depending on the source and subject, which helps make sure that the most relevant matches float to the top of the heap.
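To make the penalty idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `Unit` class, the `penalized_score` and `rank` functions, and the penalty values are made up, and this is not Wordfast's actual scoring formula.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A hypothetical translation unit tagged with client and subject."""
    source: str
    target: str
    client: str
    subject: str


def penalized_score(raw_score, unit, client, subject,
                    client_penalty=5, subject_penalty=3):
    """Subtract illustrative penalties when the unit's tags differ from the job's."""
    score = raw_score
    if unit.client != client:
        score -= client_penalty
    if unit.subject != subject:
        score -= subject_penalty
    return max(score, 0)


def rank(candidates, client, subject):
    """Sort (raw_score, unit) pairs so the best penalized score comes first."""
    return sorted(
        candidates,
        key=lambda pair: penalized_score(pair[0], pair[1], client, subject),
        reverse=True,
    )
```

With numbers like these, a 98% match tagged for the current client and subject would outrank a 100% match that came from a different client and a different subject.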

Of course, before combining files, I should probably process them to eliminate duplicate entries (the software that handles the databases no longer offers this as an explicit feature). I'd use a Perl script to do this, but the TM files are stored in "Unicode" format (presumably UTF-16, and definitely not UTF-8), and I spent enough time on my VAIO last night to convince myself that Something Special™ has to happen in order to handle Unicode strings properly in Perl (I suspect a module is required to enable Unicode handling).

Then again, I could simply convert the files to ordinary text, process 'em in Perl, and then convert them back to Unicode. (It'd be loads simpler.) Hmmm.
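For what it's worth, the decode, dedupe, re-encode round trip might look something like the rough sketch below. It is written in Python rather than Perl, and it rests on some assumptions: that the "Unicode" files are UTF-16 with a byte-order mark, that each translation unit sits on its own line, and that a duplicate means an identical line. A real cleanup would probably compare only the source and target fields. The function names are made up for the example.

```python
import io


def dedupe_lines(lines):
    """Keep the first copy of each line, preserving the original order."""
    seen = set()
    unique = []
    for line in lines:
        key = line.rstrip("\r\n")  # compare content, ignoring the line ending
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


def dedupe_utf16_bytes(data):
    """Decode UTF-16 bytes, drop duplicate lines, and re-encode as UTF-16."""
    text = data.decode("utf-16")  # the byte-order mark selects the byte order
    # newline="" keeps the original line endings untouched while iterating
    lines = list(io.StringIO(text, newline=""))
    return "".join(dedupe_lines(lines)).encode("utf-16")  # writes a new BOM


def dedupe_utf16_file(src_path, dst_path):
    """Read src_path as raw bytes and write the deduplicated copy to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(dedupe_utf16_bytes(data))
```

Writing to a separate output path leaves the original TM untouched in case the result needs checking.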

I need to go jump on the daily report from the Russian control room.

Cheers...