Aug. 4th, 2011

alexpgp
The 40K Job turns out to have a source document with something like 51,500 words in it, not the advertised 39,000.

Now, normally, I would not mess with a source document when I've been handed a document containing pretranslated segments, but when I imported the file with pretranslated segments into memoQ, I found that only the pretranslated segments had been imported.

This has a good side and a bad side.

The good side is that I can take care of editing all of the pretranslated segments in "one swell foop," and once that's done, I really don't have to worry about them any more (they're stored in the "translation memory" file) and I can concentrate on translating the new material.

The bad side is that I still have to figure out a way to import the rest of the document into the application. That's why I started working with the original, pre-insertion-of-pretranslated-segments document.

Which was a big mistake.

It turns out the document is riddled with what may be described as "typewriter style" tables, in which the table is rendered in monospaced Courier, so that each line of type contains information from each column, filled in with spaces to maintain alignment of the graphics characters (vertical and horizontal bars, corners, intersections of various kinds).

Thus, the information in a table "cell" may be spread out among several lines, with no way—short of a laborious cut-and-paste session—of extracting what is needed.

Well, it turns out the client already did all that laborious grunt work so that I would have usable tables in the version of the file that contains pretranslated segments. That means I need to use that file, even if memoQ doesn't seem to want to cooperate.

I'm sure there's an elegant way to do it, but then it hit me: I can easily "restore" the pretranslated file so that it contains no markup or translations, and then feed the "restored" file into memoQ and... bingo! We are cooking with gas!

You see, the edited pretranslated segments are in the database and will be filled in when the associated source text is encountered. But a couple of items still bother me:

(1) There are actually 45,000 words in the job, not 39,000. Both good (bigger payday) and bad (more work over the same time).
(2) The memoQ software shows the new file to have about 50% more segments than the file I fed it yesterday. Assuming I have 14 more working days after today to work on this (which gives me two days of grace for the drive to Colorado), that means I have to process an average of 650 segments per day, minus what I've done so far.

I need to think about that second point and what it means. Meanwhile, break's over. Back to work!

Cheers...

P.S. Recalled a trick that LJ friend (and fellow memoQ user) velvet_granat taught me. I sorted the segments by length and looked at the shortest segments. There's a powerful lot of them. In fact, more than half of the 9300 segments are super-short. Again, good and bad to that, but explanations will have to wait. I have to get some work done!
