Aug. 1st, 2003

alexpgp: (Aura)
A couple of recent news items make it appropriate, I think, to revisit a post from nearly two years ago, on the nature of machine translation.

The first news item was a press release from the University of Southern California, titled Romancing the Rosetta Stone. The piece briefly describes a technique for machine translation that does not rely on the traditional method that, to oversimplify its description, consists of decomposing the source text, analyzing the parts, finding the analogous parts in the target language, and reassembling a target text. Instead, the new technique analyzes the way words stand in relation to one another in existing parallel source and target texts. (I can't help but think of this as a form of neural networking applied to text.)

For me one of the money quotes from the press release is the following:
Och is a standout exponent of [this new] method of using computers to translate one language into another that has become more successful in recent years as the ability of computers to handle large bodies of information has grown, and the volume of text and matched translations in digital form has exploded, on (for example) multilingual newspaper or government web sites.
[Emphasis mine]
The other news story, in the New York Times [Registration required] (From Uzbek to Klingon, the Machine Cracks the Code) also mentions USC and says pretty much the same things, including the following:
Although in one sense it was more economical, [statistical] machine translation was also much more complex, requiring powerful computers and software that did not exist for most of the 90's.
[Emphasis mine]
Why did I emphasize the text in italics? Let me back up a bit...

Almost universally speaking, translators don't think very highly of machine translation, and for good reason: almost universally speaking, machine translation is not just bad, but spectacularly so. Nearly every translator I know has a personal anecdote to relate in this regard.

My own FMTH (Favorite Machine Translation Howler) occurred when I encountered the following line in a Russian source document, describing the last line of an address in the U.S. (city changed to protect the innocent):
BEVERLY HILLS ПРИБЛИЗИТЕЛЬНО 90210
The appearance of ПРИБЛИЗИТЕЛЬНО (which means "approximately") threw me. An address line like that should have said CA, or at worst CALIFORNIA. I was puzzled for a moment, and then it hit me: if you squint real hard, CA could be interpreted as the (capitalized) abbreviated form of 'circa' (that wonderful word that, for example, tour guides in European art museums use when describing dates that are approximate).

Ye gods.

Somehow, a machine-translated version of the address line had been inserted into the source document, along with some other bloopers, and the end product obviously was never checked afterward.

While disdain for machine translation is nearly universal among working translators, many allow that emotion to extend to the future of machine translation as well. In my opinion, such confidence in the ultimate failure of any attempt to achieve quality machine translation is ill-placed.

Strange as it may seem, I believe it is ill-placed on the basis of what happened to the game of chess.

Back in the late 1940s, the first description of how a computer might be programmed to play chess was published by Claude Shannon. Nobody took the idea seriously, especially since the problem appeared to require resources far in excess of what was commonly available to researchers. In subsequent decades, prototype chess-playing programs were written purely for research purposes (which is to say they all played horrible chess).

In the early 70s, however, some students at MIT entered a program of theirs, named MacHack IV, in an amateur tournament sponsored by the United States Chess Federation. The program scored one draw, after having lost its first five games in a tournament format that, generally speaking, pits players against others with the same score on a round-by-round basis. (In other words, in the sixth round, the program faced a player who likewise had lost the five previous games.)

The program's provisional rating was somewhere in the low 1200s, on a scale that ranges (practically speaking) from about 800 for people who barely know the moves, to 2800 or more for the very strongest grandmasters. To give you an idea of what a rating difference means, consider that a 400-point advantage over an opponent means you can expect to score roughly 90% against that opponent, all things being equal.
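For the numerically inclined, here is a minimal sketch of that rating arithmetic in Python. It assumes the textbook logistic form of the Elo expected-score formula; the USCF's actual system has more moving parts (provisional ratings, K-factors and so on), and the function name is my own invention. Under that assumption, a 400-point edge works out to an expected score of about 0.91, counting a draw as half a point.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the
    textbook logistic Elo model (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical opponent rated 1200, roughly where the program stood.
    base = 1200
    for gap in (0, 100, 200, 400):
        score = expected_score(base + gap, base)
        print(f"{gap:>3}-point edge: expected score {score:.2f}")
```

Note that this is an expected score rather than a pure win percentage, since draws count for half a point each.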

MacHack IV may have been best-of-breed in the early 70s, but it was far from being in a position to threaten human dominance of the game. Suggestions to the effect that someday, computers might play at a master level were generally greeted with the kind of polite skepticism one generally reserves for people who believe the earth is flat. And the idea of a computer World Champion made flat earth theories sound plausible by comparison.

A couple of years ago, however, a program running on a massively parallel special-purpose computer did beat the World Champion in match play, albeit not for any official title. Moreover, anyone can go down to their local CompUSA and pick up a software package that will consistently beat all but the top few percent of players in the world. The whole process, from the initial tentative description of how it might be done to the triumph of etched silicon and electrical pulses over human wetware, took just about 50 years.

But, you say, translation is not chess?

I agree, but the fact of the matter is this: as computer systems become more sophisticated, their ability to undertake tasks of previously mind-boggling complexity increases to the point where the complexity can be tamed. When you combine the taming of complexity with the fact that humans are very adept at achieving goals widely believed to be impossible, the result is dynamite. Or a chess-playing computer. Or an airplane.

In this centennial year of aviation, we should keep in mind that many of the best minds of just over a century ago considered heavier-than-air flight to be flat-out impossible. Moreover, the leading lights of the following generation believed the sound barrier to be impregnable, and some of their intellectual heirs were convinced space travel was impossible because the vacuum of space offered nothing to "push against."

In the end, the history of science tells us that the smart money doesn't often win by betting on something being impossible.

So does this mean I believe the systems touted in these articles are the Real McCoy, the killer translation application that will relegate professional translators to the unemployment line? No, not at all. If you read the articles carefully, no claims along those lines are being made, at least not explicitly. The closest anyone came to talking about the quality of their output was the following, from the Times article:
Dr. Knight and others said the progress and accuracy of statistical machine translation had recently surpassed that of the traditional machine translation programs used by Web sites like Yahoo and BabelFish.
That being the case, translators should have no immediate worries, since Web translations tend to be painfully bad, and nearly useless even for "gisting" (i.e., translations that are admittedly far from perfect, but good enough to give the reader a rough idea, or the gist, of the text's content).

Statistical translation is something to keep one's eye upon. We live in interesting times.

Cheers...
alexpgp: (Default)
The secret to getting a solid block of sleep in the middle of the day (more or less) is apparently to get good and tired. (Duh!)

I went to sleep shortly before 3 pm yesterday and although I woke up twice, I had no trouble falling back asleep nearly immediately. So when I finally did get up, I had about 6 hours of shuteye under my belt.

Having said that, I've got to mention that this magical dawn-ish period is still a yawn-fest. And having the thermostats here in the MCC set to somewhere just this side of freezing doesn't help much, it seems, although I'm told the hardware around here really digs the frigid ambient temperatures.

Time to get some coffee, methinks.

Cheers...
alexpgp: (Default)
I have begun to notice a disturbing trend over the past week or ten days.

The amount of spam showing up across the board - in my personal account, my work account, and my web-based e-mail - is bloody well going exponential, to the point where I actually have to make an effort to make sure I don't accidentally delete something that is really mail.

My relief just showed up... time to make like a tree.

Cheers...

UPDATE: For the heck of it, I counted the spam received since 10 pm last night (it's 2:15 pm right now, so that's about 16 hours): 77 spams in my personal e-mail alone. Blyech.
alexpgp: (Corfu!)
My anger at Citibank found a release today, when I decided to prematurely cash in one of my IRAs, which had been sitting there not doing all that much (I pulled out of the stock market long ago, except for some positions that had folded and I hadn't gotten around to disposing of, so the cash was earning something like 2% in a money market fund).

So despite the tax penalty, I figure that I'm better off eating the penalty and taxes so I can pay off the scrofulous cretins at Citibank, for two reasons. First, once I'm paid off, their fangs will disappear from my neck, thus stemming the long-term bloodletting that has been so profitable for them in the past. This goose is through laying eggs for them. Second, I can use the money that would've gone to pay their ever-escalating demands for more and more money (paid as interest and not as principal) to reduce my other debt that much faster.

May the goddess of banking fortune smile on Citibank's competitors.

Cheers...
