alexpgp: (St. Jerome w/ computer)
Ever since college, when it became clear to me that actually doing the homework was an important step toward getting an A or a B in a course like fluid mechanics, I've developed an informal set of tactics to turn stuff I'd rather not do into a game.

I do this now with some translations, and it generally stands me in good stead. One technique that I use is to track my progress through a document. The diagram below shows the number of words I had left to translate in a document against local time.


The gap between 10:40 am and 11:20 am occurred as I tried to deal with a small hurricane of tags in my segmented text.

To explain, Word files can contain tons of hidden formatting information embedded in the text (this most often happens when the file was created from a PDF). Since it's hidden, nobody generally gives a rat's tail about its existence. But when such a file is opened in a translation memory program such as memoQ, the result is pretty ugly:


All of those little gray doohickeys represent some kind of instruction embedded in the text: a change in font, or font size, or something along those lines. Translation memory programs that use such doohickeys (the technical term for which is "tags") pretty much require them to appear in the translation (else the translated text runs a high risk of not looking right), and you'll pardon me if I don't bore you with the million and one ways satisfying that requirement can go wrong when there are this many tags in a segment.

How does one get rid of tags? Well, there are a number of methods out there, and none that I've found are perfect. The one I like the best is a set of Word macros marketed by a fellow named Dave Turner under the name CodeZapper (a copy of which was bought and paid for by yours truly some while ago). After running the basic tag-zapping macro, the text in the above illustration turned into this:


You'll notice there are a lot fewer tags in the cleaned-up text, and while I could probably have used it as is, other segments still retained a liberal quantity of tags. So I ran the heavy-duty zapping macro and got this:


Now, this is what I'm talking about!

The end result was mostly free of tags, and was a pleasure to translate.
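
For the curious, much of what a tag-zapping macro does boils down to stripping character-level formatting noise left behind by PDF conversion. The sketch below is my own crude illustration of the approach (it is emphatically not CodeZapper's code, and the macro name is invented):

Sub ZapStrayFormatting()
    ' Reset character-level oddities that turn into tags in a CAT tool.
    ' A blunt, indiscriminate sketch; a real cleanup tool is far more selective.
    Dim rng As Range
    Set rng = ActiveDocument.Content
    rng.Font.Kerning = 0      ' drop kerning changes
    rng.Font.Scaling = 100    ' reset horizontal character scaling
    rng.Font.Spacing = 0      ' reset expanded/condensed spacing
    rng.Font.Position = 0     ' reset raised/lowered text
End Sub

Run against a whole document, something this blunt can clobber formatting you actually want to keep, which is exactly why the real macros come in "basic" and "heavy-duty" flavors.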

P.S. For those impatient to know what kind of fascinatin' stuff it is I translate, here's the English translation:
The unit has a two-cylinder, four-stroke Briggs & Stratton engine, rated at 18 hp. The average fuel consumption (using unleaded gasoline) is 5.5 liters/hr.


alexpgp: (Default)
The Wordfast translation memory package lives inside a Word template, so essentially the only thing running when you use the product is Word. This lends a lot of inertia to the idea of processing a document linearly, from the beginning all the way to the end.

That's not much of a problem, except that more often than not, the structure of a document results in better progress in some places (and worse in others), depending on what's under your cursor.

For example, tables of numbers require very little effort to translate (generally, there are column headings and, in my practice, a need to change the decimal delimiter from a comma to a period). Likewise, pretranslated segments require, ceteris paribus, a modest effort compared to new text.
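
As an aside, that delimiter change is easy to automate with a wildcard replace. Here's a quick-and-dirty sketch of my own devising (it assumes the cursor is sitting in the table and that the numbers carry no thousands separators):

Sub FixDecimalDelimiters()
    ' Replace the decimal comma with a period, but only inside
    ' the table the cursor is currently in.
    Dim rng As Range
    Set rng = Selection.Tables(1).Range
    With rng.Find
        .ClearFormatting
        .MatchWildcards = True
        .Text = "([0-9]),([0-9])"     ' digit-comma-digit
        .Replacement.Text = "\1.\2"
        .Execute Replace:=wdReplaceAll
    End With
End Sub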

I mentioned the other day how the (inadvertently discovered) ability to address only the pretranslated segments is a plus: edit them first, and when you then import a "clean" file (one with no pretranslated segments), the software fills the edited segments in for you.

I noted yesterday how sorting from shortest to longest string allowed me to determine that more than half of the segments memoQ generated from the clean document are short items (numbers, section numbers, equation numbers, short mathematical expressions, and the like). That gave me a modicum of peace of mind, but it did not address my principal concern at this point, i.e., making sure I make enough progress early on to assure on-time delivery.

You see, given the nature of the beast, progress can't be measured in terms of translating x segments per day, because of the way the segments are mixed up. As an extreme example, if I decide to address 700 segments per day and run into a section containing a table with 700 numbers in it, that's not a day's work. The issue is never that clear-cut, but I digress...

The solution to my problem is to sort the segments in the other direction, from longest to shortest, and then start translating from the top of the list. This should result in some difficult days up front, where I'm only translating new text, but it will apply my efforts to best effect.

And now, to work!

Cheers...
alexpgp: (Default)
Increasingly, agencies run new documents past their databases of document "segments" (typically, sentences) that have been translated before. Heck, I do this all the time myself, with databases of segments I've translated previously, as that is the entire point of the "translation memory" concept.

The problem arises when they embed found segments in documents sent out for translation and insist on paying a reduced rate for "editing" said translations.

Traditionally, editing a translation pays less than writing the translation to begin with. And that's because of the unspoken assumption that the translation has been competently written.

What is a competent translation?

Well, back when I worked in-house in Houston, and had a voice in defining "competence" in this context, we decided that a translation containing more than one major error per 250 words was poor, and one with more than two major errors per 250 words was unacceptably poor.

If one assumes that an "average" sentence consists of 25 words (an assumption that might be called "conservative," in the engineering sense), that works out to one major error per 10 sentences for a poor translation, and one major error per 5 sentences for an unacceptably poor translation.

So consider that a "pretranslated" segment that is less than a 100% match to the corresponding bit of source text is guaranteed to have a major error in it (after all, it's not a 100% match!), and that typically, fewer than 1 in 5 pretranslated segments are a 100% match. That works out to a major error in more than four segments out of five, which is to say the end result is like editing an utterly and completely incompetent translation, for which one is offered the rate typically offered for editing a competent one.

So in my current assignment, I'm finding that not only the less-than-100% matches require attention, but the 100% matches as well. To wit (in the Wordfast markup below, the source segment appears between {0> and <}, the number is the match percentage, and the pretranslated target appears between {> and <0}):

{0>
Такая укрупненная частица (флок) осаждается более быстро.
<}100{>
Sedimentation of a large particle, or floc, is faster.
<0}
My version:
Such an enlarged particle (floc) settles more quickly.

{0>
Это происходит за счет совокупного действия нескольких процессов.
<}100{>
The effect can be explained by the following processes.
<0}
My version:
This is due to the combined action of several processes.

{0>
Параметризация турбулентности основана на сформулированном в (Озмидов, 1986) подходе.
<}100{>
The turbulent parameterization is based on the Ozmidov’s approach (1986).
<0}
My version:
Parameterization of turbulence is based on the approach formulated by Ozmidov (1986).

{0>
Турбулентность может быть представлена на основе вложенных вихревых структур разного масштаба, определяемого океаническими процессами.
<}100{>
The turbulence may be presented as the enclosed eddies of a different scale depending on oceanic processes.
<0}
My version:
Turbulence may be depicted on the basis of embedded vortex structures of various scale, determined by oceanic processes.

In addition to being annoying as hell, this kind of stuff slows me down significantly (I had, in fact, expected higher quality and factored it into my turnaround estimate).

Ah, well, there's nothing to be done about it now. I shall have to be more careful in the future, though, about accepting work from this client.

Cheers...

P.S. I repeat myself, I know, but it's therapeutic, okay?
alexpgp: (Default)
I was simply too tired last night to explain why the "- 1" was important in the macro line
myRange.End = myRangeEnd - 1
but failing to decrement that value ate a lot of my time, so hopefully, this post will reinforce the knowledge gained. (Can you tell I don't want to settle down and translate?)

By the time we get to this point in the code, we've selected the contents of the table cell we're working with and assigned them to a range called myRange. We've also saved the value of the end point in a variable called myRangeEnd.

If the pattern we're looking for is found, the next step after copying the pattern is to shorten the range to exclude the found pattern. To do this, I wrote code to move the start of the range over by the length of the pattern and then set the end of the range to the original end value.

Schematically, the initial range looks like:

it's time for {123}all{124} good men and women to party
S------------------------------------------------------E


After the first pattern is found, the range is:

it's time for {123}all{124} good men and women to party
              S----E


Then we move the start point "over" the end point:

it's time for {123}all{124} good men and women to party
                   S


And we restore the original end point:

it's time for {123}all{124} good men and women to party
                   S-----------------------------------E


Only after instrumenting the code to show the selection did I realize what restoring the original end point did. If you work much with Word tables, have you ever noticed what happens when you select text in a cell and go one character past the end? The entire contents of the cell become selected!

That's what was killing me yesterday: every time the code was executed, the range being examined was being reset to the original range, and I couldn't figure out why.

Setting the end point to one character shy of "the end," i.e., like this:

it's time for {123}all{124} good men and women to party
                   S----------------------------------E


solved the problem. The next time the pattern is searched for, "{124}" will be found (even if it's at the end of the text).
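
Put together, the working loop looks roughly like the sketch below. This is a reconstruction of the idea rather than my actual macro; apart from myRange and myRangeEnd, the names and the wildcard pattern are invented for illustration.

Sub WalkTagsInCell()
    Dim myRange As Range
    Dim myRangeEnd As Long

    ' Grab the contents of the cell the cursor is in and remember
    ' where those contents end.
    Set myRange = Selection.Cells(1).Range
    myRangeEnd = myRange.End

    With myRange.Find
        .ClearFormatting
        .MatchWildcards = True
        .Text = "\{[0-9]@\}"          ' matches {123}, {124}, and so on
        .Wrap = wdFindStop
        Do While .Execute
            ' On a hit, myRange collapses to the found text; copy it
            ' so it can be pasted into the target segment.
            myRange.Copy
            If myRange.End >= myRangeEnd - 1 Then Exit Do
            myRange.Start = myRange.End    ' step past the found pattern
            myRange.End = myRangeEnd - 1   ' restore the end, one shy of the cell marker
        Loop
    End With
End Sub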

Apropos of which, the macro works like a charm and actually makes it much easier to deal with the source text. It was a good investment of time.

Cheers...

UPDATE: Wordfast has this concept of a "placeable," which is basically any source text string that's not translatable, such as document designations (e.g., the '50578' in "SSP 50578"). It turns out (I found out by accident) that strings such as "{123}" are treated as placeables as well, and since Wordfast has hotkeys for navigating to and copying placeables, my macro sort of reinvented the wheel. Still, I don't regret the time spent developing the code.
