Dec. 1st, 2001

alexpgp: (Default)
I finished the first assignment and am now doing a slow burn with the second.

The document I was sent mixes UTF-8 characters with non-UTF-8 characters. Furthermore, I need to grok the contents of some glossary files that were sent to me, in a variety of formats.

The .PDF glossaries gave up their secrets with little protest. So far so good. Then I ran into the Unicode monster, again.

There are apps out there that understand Unicode (UTF-8) and those that don't. There was a time when you could use some smart tricks with Microsoft "productivity" apps to convert Russian text expressed in Unicode into non-Unicode text.

No more, apparently.

My client sent me an - urk! - Excel file with 2362 terms in it. The fact that it is alphabetized for English terms is easily fixed. The part that's not easily fixed is the fact that Excel does a marginally passable job only as a spreadsheet, not as something you want to search through for terms, or print from.

Ye gods. No matter what I do - and I am far from a tyro in such matters - Excel wants to print the English terms out on one set of pages, and the Russian terms out on another set of pages. Apparently, both columns won't print out on the same page (and there are actually four columns, but who needs abbreviations?)

Wait! I've got it! If I reduce the font size to 6 point, I can get two columns to print on a page. Progress! Yes, by Jove, that's PROGRESS! (Now, if only I could remember where I stowed my magnifying glass...)

Grr. This is rapidly ceasing to be amusing.

So here I am, trying to export the text as text, but Excel exports UTF-8 symbols uniformly as '?', and there's not even an option to save the entries in an RTF file (not that it would make sense, but still).
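For the curious, here is a minimal sketch of why those question marks show up. It assumes (and this is only an assumption, since nothing here documents what Excel does internally) that the text export converts everything to a legacy code page that has no Cyrillic letters and substitutes '?' for whatever won't fit. The function name `lossy_export` and the sample string are made up for illustration.

```python
def lossy_export(text: str, codepage: str) -> str:
    """Round-trip text through a legacy code page.

    Characters the code page cannot represent are replaced with '?',
    which mimics (by assumption) what a non-Unicode text export does.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


sample = "load нагрузка"  # hypothetical glossary entry

# cp1252 (Western European) has no Cyrillic, so the Russian is lost.
print(lossy_export(sample, "cp1252"))

# cp1251 (Cyrillic) can hold both halves, so the text survives intact.
print(lossy_export(sample, "cp1251"))
```

The point is that the damage happens at export time: once the letters have become '?', no downstream tool can bring them back.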

And all this is in addition to the "Out of Memory" occurrences that are becoming about as common on my VAIO as snowflakes on Pagosa Peak. It seems as if suddenly (within the past couple of weeks), Windows Me really wants me to use just one application at a time. There must be one hell of a memory hog among the apps that I use regularly... I'll put my money (sadly) on Mozilla.

Anyway, it's back to the face of the salt mine (though it's probably time to go pick up Galina from the store).

Cheers...
alexpgp: (Default)
I broke out an old copy of Vern Buerg's LIST utility and began to analyze the structure of the Excel file that contained the terminology I need.

I finally doped out how (most of) the entries are stored, and it's really pretty straightforward to write a program to extract them (I did, using Turbo C++), but I'm obviously missing something, because every once in a while, the text goes blooey! and I end up with garbage for some number of entries until things sync back up.

Embedding a glossary in Excel is stupid! It's like keeping track of your checkbook using Word.

And I am still on page 4 of 25, due tomorrow evening. Criminy.

Cheers...
alexpgp: (Default)
Okay. I have some minor-league crow to eat. One can copy the offending text from Excel and paste it into Word.

Been there, done that.

What I hadn't noticed before just a few minutes ago is the ability to save the Word file as encoded text.

So now I have this 130K text file that needs a little reformatting and voilà! I'm in business!

It has been a long several days. And tomorrow holds out promise for nothing but more of the same. Ah, well. That's the price one pays, I suppose, to live up here.

Cheers...

Oh-tay!

Dec. 1st, 2001 09:53 pm
alexpgp: (Default)
So there we were, dead in the water, ten miles off Shanghai...

Whoops. Wrong war story.

While Word did export the four-column table pasted from Excel using the correct character encoding, for some reason, it insisted on randomly missing newline characters for some entries where there were no English or Russian abbreviations.

So, between having the files open in TextPad and running a home-rolled Perl script, I finally got over that little obstacle one missing newline character at a time. Fooey.
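The hunt for the bad lines can be sketched roughly as follows. This is not the Perl script I used, just a Python illustration, and it assumes the exported file is tab-separated with four fields per entry (the separator and the name `find_suspect_lines` are assumptions for the sake of the example). A line where a newline went missing ends up with too many fields, so counting fields is enough to flag it for a manual fix.

```python
def find_suspect_lines(text, expected_fields=4, sep="\t"):
    """Return (line_number, field_count) for every non-blank line whose
    field count differs from the expected number of columns."""
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        count = len(line.split(sep))
        if count != expected_fields:
            suspects.append((number, count))
    return suspects


# Hypothetical data: the second line is two entries fused together
# because the newline after an empty abbreviation column was dropped.
export = "a\tb\tc\td\ne\tf\t\tg\th\t\t\ni\tj\tk\tl"
print(find_suspect_lines(export))
```

The sketch only reports where the trouble is; deciding where the missing newline belongs is still a judgment call made one entry at a time.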

Next, I took a look at the sparse SDLX docs to understand how their TermBase terminology database is put together. The explanation they provide in their documentation assumes you already know what the heck they're talking about, and there is no actual example to work with.

So I basically started to push and shove my way through the software. It took me three tries to set up a database that would do the trick. Once you understand what the heck the designers kept dancing around in their "documentation," it's actually pretty neat. (What I don't understand is how I came to understand it, but that must be the whisky talking.)

In the end, you can engage TermBase while in the main SDLX editor, and when Russian words are in the nominative case, pressing Ctrl-F8 will display all the terms in the source text, along with their translations. It is truly an awesome sight to see. The whole shebang is slicker than owl snot on a doorknob (however slick that may be).

However, that nominative case thing is a slight problem, since the Russian language is not optimized to use words in the nominative case in sentences unless there is a darned good reason for doing so.

It's not that SDLX has some crush on the nominative case per se; it's just that its matching engine is designed to do an exact match between one's source text and a term in the database, and in Russian, unless one is a weirdo, one's terminology database will contain terms in... the nominative case.
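To make the problem concrete, here is a small sketch of exact matching against a nominative-case glossary. It is emphatically not SDLX's matching engine (I have no idea how that is implemented); the function `exact_matches` and the one-entry glossary are hypothetical, and the sentence is taken from the job itself.

```python
# One-entry glossary with the term stored in the nominative case.
glossary = {"опорное основание": "substructure"}

sentence = (
    "Нагрузки окружающей среды, действующие на опорное основание, "
    "включаются в модель опорного основания и указаны в разделе 3.2 "
    "данной Книги."
)


def exact_matches(text, terms):
    """Count literal, case-insensitive occurrences of each term in text."""
    lowered = text.lower()
    return {term: lowered.count(term.lower()) for term in terms}


print(exact_matches(sentence, glossary))
```

The term occurs twice in the sentence, but an exact match finds only the first occurrence, where the accusative happens to look like the nominative. The genitive "опорного основания" slips right past.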

I'm babbling, aren't I?

In any event, the text of this job is actually not all that difficult. It's just put together sloppily (whoever wrote it really had their mind on TGIF, or had a bet going as to how many grammatical errors one could put in a specification and not have anyone really notice).

Then again, most of it is specification-ese, as in:

Нагрузки окружающей среды, действующие на опорное основание, включаются в модель опорного основания и указаны в разделе 3.2 данной Книги.

Environmental loads acting on the substructure are included in the substructure model and are noted in section 3.2 of this Book.

Yikes-a-rama.

It's definitely time to go to sleep. I know I can wrap this beast tomorrow.

Cheers...
