Apr. 3rd, 2006

alexpgp: (Default)
Ever since I downloaded and installed a trial version of a PDF reader/writer, the Windows Installer has been cropping up, trying to reinstall that application every time I start Word. This morning, I decided to see if there was anything Out There™ that might help eliminate this annoyance.

You know, of course, that dealing with annoyances is almost an industry unto itself? At Annoyances.org, I found an article that addressed my problem, albeit for Office 2000.

Being the adventurous soul that I am, I applied the solution to Office 2003, and it worked. (As far as I can tell, the solution causes Windows Installer to deal with the missing registry value up front, pre-empting its attempt to reinstall the already installed application.)

In other news, I need to apply shoulder to wheel fastest, as not only do I have a pile of work, but Drew wants a day off before I go on my assignment to the Edge of the World™ and Galina goes to Houston.

Cheers...

UPDATE: Hmmm. This seems to take care of Word, but the farblegargling installer went to work again when I started PowerPoint, and once more when I "cut" a file (using Ctrl+X) in preparation for moving it to another directory. This is getting old.
alexpgp: (St Jerome a)
The recently installed PDF package seemed to arrive at just the right time, as I had received a fairly large PDF (~50 pp.) that could not be read by FineReader. What's more, the new package claimed to allow the user to save the PDF in a variety of formats, including Word .doc files.

This sort of hinted at the ability to do OCR (although at this point, there's nothing "optical" in the technology, but I digress...). That ability is there, sort of.

The "clean" pages look pretty good, although both FineReader and the new program (ScanSoft PDF Professional) have a nasty habit of trying to create formatting that isn't there and that can't be edited with any ease, which basically renders such formatted information useless.

But in doing a spell check of the Russian, I started to notice that words such as "специального" would be highlighted as incorrect, with "специального" offered as a replacement.

No, your eyes do not deceive you; the words appear identical. This kind of behavior is a dead giveaway that one or more of the homoglyphs in the word (look-alike letters such as "a", "c", "e", and some others) was encoded in the "other" alphabet, Latin instead of Cyrillic or vice versa.
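Spotting such words doesn't require eyeballing them. Here is a minimal Python sketch (not a Word macro, and not anything built into FineReader or ScanSoft) that splits text into words, classifies each letter by its Unicode character name, and flags words that mix Latin and Cyrillic letters, pointing at the letters of whichever script is in the minority. The function names are made up for illustration.

```python
import re
import unicodedata

# Runs of letters only; digits, underscores, spaces and punctuation split words.
WORD_RE = re.compile(r"[^\W\d_]+")


def script_of(ch):
    """Classify a letter as 'CYRILLIC' or 'LATIN' from its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "CYRILLIC"
    if name.startswith("LATIN"):
        return "LATIN"
    return None


def mixed_script_words(text):
    """Return (word, offsets) for each word that mixes Latin and Cyrillic letters.

    The offsets point at the letters of the word's minority script, i.e. the
    ones most likely to be look-alikes from the "other" alphabet
    (on a tie, the Latin letters are flagged)."""
    hits = []
    for m in WORD_RE.finditer(text):
        word = m.group()
        scripts = [script_of(c) for c in word]
        n_cyr, n_lat = scripts.count("CYRILLIC"), scripts.count("LATIN")
        if n_cyr and n_lat:
            minority = "LATIN" if n_cyr >= n_lat else "CYRILLIC"
            offsets = [m.start() + i for i, s in enumerate(scripts) if s == minority]
            hits.append((word, offsets))
    return hits
```

For example, `mixed_script_words("x \u0434o\u043c")` (Cyrillic "д" and "м" around a Latin "o") returns `[("\u0434o\u043c", [3])]`, flagging the Latin "o" at offset 3.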

I quickly cobbled together a macro to highlight all such letters and was not pleased to see the result. Somewhere in the recognition algorithm, there needs to be a step whose simplified pseudocode might read: "If the immediately preceding character is in language X, and the immediately following character is in language X, the character being processed is in language X."
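The rule quoted above is easy to try out as a post-processing pass. Below is a minimal Python sketch of it, assuming a small hand-picked table of Latin/Cyrillic look-alike pairs (written as code points, since the pairs are visually identical). The table and the function names are illustrative, not taken from any OCR package.

```python
import unicodedata

# Hypothetical look-alike table (Latin -> Cyrillic), spelled out as code points.
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041d", "K": "\u041a", "M": "\u041c", "O": "\u041e",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
}
CYRILLIC_TO_LATIN = {cyr: lat for lat, cyr in LATIN_TO_CYRILLIC.items()}


def script_of(ch):
    """Classify a character as 'CYRILLIC', 'LATIN', or None (digits, spaces, ...)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "CYRILLIC"
    if name.startswith("LATIN"):
        return "LATIN"
    return None


def apply_neighbor_rule(text):
    """If both neighbors of a character are in script X, replace the character
    with its script-X look-alike (if the table has one).

    Neighbors are read from the original text, not from already-corrected
    output, so the result does not depend on scan order."""
    out = list(text)
    for i in range(1, len(text) - 1):
        before, after = script_of(text[i - 1]), script_of(text[i + 1])
        if before is None or before != after:
            continue
        table = LATIN_TO_CYRILLIC if before == "CYRILLIC" else CYRILLIC_TO_LATIN
        out[i] = table.get(text[i], text[i])
    return "".join(out)
```

Taken literally, the rule never touches the first or last letter of a word (one neighbor is a space or punctuation mark), and two adjacent wrong letters shield each other, so it catches only part of the damage.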

This robotics text is turning out to be a real bear, OCR notwithstanding.

Cheers...

UPDATE: The damage is worse than I thought. Almost all instances of the Russian "В" ("Veh") were rendered as "B" (as in Bravo), and several Russian letters "З" ("Zeh") were rendered as "3" (three). Of course, I suppose having this kind of result is better than no result at all.
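For damage like this, a word-level vote may work better than looking only at immediate neighbors: if most letters in a word are Cyrillic, convert any Latin look-alikes (including "B") to Cyrillic, and treat a "3" inside such a word as a misread "З". This is a minimal Python sketch under those assumptions; the look-alike table, the simple-majority threshold, and the capital-at-word-start rule for "З"/"з" are assumptions for illustration, not anything the OCR software does.

```python
import re
import unicodedata

# Hypothetical Latin -> Cyrillic look-alike table, spelled out as code points.
# "B" -> "\u0412" covers the "Veh" case.
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041d", "K": "\u041a", "M": "\u041c", "O": "\u041e",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
}

# Runs of letters and digits, so a "3" inside a word stays attached to it.
TOKEN_RE = re.compile(r"[^\W_]+")


def count_scripts(token):
    """Return (number of Cyrillic letters, number of Latin letters) in token."""
    names = [unicodedata.name(c, "") for c in token]
    cyr = sum(n.startswith("CYRILLIC") for n in names)
    lat = sum(n.startswith("LATIN") for n in names)
    return cyr, lat


def fix_token(token):
    cyr, lat = count_scripts(token)
    if cyr <= lat:
        return token  # not predominantly Cyrillic (or no letters): leave as-is
    out = []
    for i, ch in enumerate(token):
        if ch == "3":
            # Assumed heuristic: capital Ze at the start of a word, lowercase elsewhere.
            out.append("\u0417" if i == 0 else "\u0437")
        else:
            out.append(LATIN_TO_CYRILLIC.get(ch, ch))
    return "".join(out)


def fix_mostly_cyrillic_words(text):
    """Apply fix_token to every letter/digit run in text."""
    return TOKEN_RE.sub(lambda m: fix_token(m.group()), text)
```

A standalone "3" or "B" is left alone, since there is nothing in the word itself to vote with; telling a lone preposition "В" from a genuine Latin "B" in a robotics text would take context from the neighboring words.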
