Halfbakery: Babelshark

Babelshark (+11)
Advanced Babelfish-type translation

We all know the limitations of Babelfish. And for someone who knows nothing about the target language, I'm afraid this proposal is unlikely to help. But if you have some working knowledge of the target language, and would like help in converting the clumsy first pass result into something more accurate and idiomatically natural, I propose the following:

Solicit from publishers the electronic texts of a well-chosen selection of books that have recently been published in multiple languages. These translations will have been done by real human experts, and the translations will be as accurate and idiomatic as you hope your ultimate translation will be.

Now when you do the automated translation of your text, you start with the conventional Babelfish-style approach for the first pass. For the second, advanced pass, you highlight areas that were idiomatic in the original and are thus likely to be wrong in the translation. Since you know something about the target language, you can also highlight sections of the first-pass translation that seem suspect to you, even if you don't know what's right. Once you have highlighted the suspect sections, the software searches the text of the book translations for passages that correspond to the selection in your untranslated text, and then proposes the corresponding phrase from the book translation. You'd be offered the option of seeing the book passage containing the phrase, to judge its appropriateness in context. Again, knowing enough about the target language, you could in most cases determine whether the suggested changes would improve the translation, and either accept the proposed change or ask for another proposal from the translation engine.
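A rough sketch of that second pass, in Python, might look like the following. Everything here is an assumption made for illustration: the names AlignedPassage, suggest and next_proposal are invented, the three-sentence corpus is toy data, and the books are assumed to be already aligned sentence by sentence. It only finds whole passages containing the highlighted phrase; picking out the exact corresponding phrase inside the translated sentence is the hard part and is not attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPassage:
    source: str  # sentence in the original language
    target: str  # the human translator's rendering of that sentence
    book: str    # title, so the user can ask to see the context


# Hypothetical toy corpus; a real one would come from the publishers.
CORPUS = [
    AlignedPassage("It was raining cats and dogs all night.",
                   "Il a plu des cordes toute la nuit.", "Book A"),
    AlignedPassage("She said it was raining cats and dogs.",
                   "Elle a dit qu'il tombait des cordes.", "Book B"),
    AlignedPassage("The dogs slept by the fire.",
                   "Les chiens dormaient près du feu.", "Book A"),
]


def suggest(phrase, corpus):
    """Yield passages whose source side contains the highlighted phrase."""
    needle = phrase.casefold()
    for passage in corpus:
        if needle in passage.source.casefold():
            yield passage


def next_proposal(proposals):
    """Return the next candidate, or None once the corpus is exhausted."""
    return next(proposals, None)


if __name__ == "__main__":
    proposals = suggest("raining cats and dogs", CORPUS)
    candidate = next_proposal(proposals)
    while candidate is not None:
        # The user would accept this one or ask for another proposal.
        print(f"{candidate.book}: {candidate.target}")
        candidate = next_proposal(proposals)
```

Using a generator means "ask for another proposal" is just one more call to next_proposal, and nothing is searched until the user actually asks.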

The company offering the automated translation service would have to assure the publishers that the book texts would never be made available as entire books, or even in blocks of more than, say, twenty words at a time. And they'd probably have to offer some small fee for the book files, or offer advertisements for the books on the translation site. But since the translations are already done, are worth nothing more to publishers once the book is published, and are already in electronic form, it should cost very little to build a fairly comprehensive database of reliable idiomatic and somewhat context-sensitive translations.
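The twenty-word limit could be enforced at the point where a book passage is shown to the user. The Python below is only a sketch under assumptions: the function names norm and excerpt are invented, words are split on whitespace, and the window is roughly centred on the matched phrase.

```python
def norm(word):
    """Lower-case a word and drop surrounding punctuation for matching."""
    return word.strip('.,;:!?"\'').casefold()


def excerpt(text, phrase, max_words=20):
    """Return at most max_words words of text around phrase, or None."""
    words = text.split()
    wanted = [norm(w) for w in phrase.split()]
    n = len(wanted)
    if n == 0:
        return None
    normed = [norm(w) for w in words]
    for i in range(len(words) - n + 1):
        if normed[i:i + n] == wanted:
            break
    else:
        return None  # phrase does not occur in this passage

    # Share the remaining word budget between both sides of the phrase.
    spare = max(max_words - n, 0)
    start = max(i - spare // 2, 0)
    end = min(start + max_words, len(words))
    start = max(end - max_words, 0)  # slide back if we hit the end of the text
    return " ".join(words[start:end])
```

A real service would also have to stop a user from stitching a book back together out of overlapping excerpts, which this sketch does nothing about.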

For even better results, perhaps the user could be given the choice to filter the types of books searched (e.g., business texts rather than romance novels, books written in the last two years rather than in the last twenty years, etc.).
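That filter could be as simple as the Python sketch below. The Book record, its fields and the filter_books function are all hypothetical; it assumes each book in the database carries a genre label and a publication year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    genre: str  # e.g. "business" or "romance" (assumed labels)
    year: int   # year of publication


def filter_books(books, genres=None, since=None):
    """Keep books in the chosen genres published in or after `since`.

    Passing None for either argument switches that filter off.
    """
    return [
        book for book in books
        if (genres is None or book.genre in genres)
        and (since is None or book.year >= since)
    ]


if __name__ == "__main__":
    # Made-up titles, purely for illustration.
    shelf = [
        Book("Ledger Lines", "business", 2001),
        Book("Moonlit Hearts", "romance", 2002),
        Book("Old Accounts", "business", 1985),
    ]
    for book in filter_books(shelf, genres={"business"}, since=2000):
        print(book.title)
```

The search for suggestions would then run only over passages from the books that survive the filter.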

I'm not really sure how effective this would be, but surely it would offer some improvement. I am fairly sure it would still be less than ideal, but then, I didn't name it Babelporpoise, either.
-- beauxeault, Aug 16 2002

Google Translate http://translate.google.com/translate_t
"Suggest a better translation" button appearing after translations [id3as, Aug 13 2007]

Wikipedia: Statistical machine translation http://en.wikipedia...machine_translation
What this idea is normally called. [jutta, Aug 16 2007]

And the porpoise of this idea is....

I don't know if it's better to be a big shark in a little ocean, or a little fish in a big ocean. But either way, it gets lost in the translation.
-- polartomato, Aug 16 2002

Why keep rescanning the books? Just index them properly for different translations of the same patch of text, and allow the reader to choose the one that makes the most sense in the context. Etre ou pas être, and all that.

[Obvious flaw: with a really well translated piece, you will probably not be able to automatically map a sentence in one language to a sentence in the other.]
-- DrCurry, Aug 16 2002
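One way to picture the indexing suggestion is the Python sketch below. It is a guess at what "index them properly" might mean, not anything the thread specifies: the names tokenize, build_index and lookup are invented, the two sentence pairs are toy data, and the index key is every run of up to max_n words from the source side. It sidesteps the flaw noted above by assuming the texts are already aligned sentence by sentence.

```python
from collections import defaultdict


def tokenize(text):
    """Split on whitespace, strip punctuation, lower-case, drop empties."""
    stripped = (w.strip('.,;:!?"\'').casefold() for w in text.split())
    return [w for w in stripped if w]


def build_index(pairs, max_n=4):
    """Map each source word run (1..max_n words) to the translations seen."""
    index = defaultdict(list)
    for source, target in pairs:
        words = tokenize(source)
        seen = set()  # avoid listing one translation twice for a repeated run
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                key = tuple(words[i:i + n])
                if key not in seen:
                    seen.add(key)
                    index[key].append(target)
    return index


def lookup(index, phrase):
    """Return every translated sentence indexed under this phrase."""
    return index.get(tuple(tokenize(phrase)), [])


if __name__ == "__main__":
    # Hypothetical aligned sentences standing in for the book texts.
    pairs = [
        ("To be or not to be.", "Être ou ne pas être."),
        ("Not to be trusted.", "Indigne de confiance."),
    ]
    index = build_index(pairs)
    for option in lookup(index, "not to be"):
        print(option)  # the reader picks whichever fits the context
```

Because every word run up to max_n gets its own entry, the index grows much faster than the texts themselves, and phrases longer than max_n words are simply not found.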

In Babel Fish, if one translates "Babel Fish" from English to Italian to English, the result is: "De Fish Confusion"
-- thumbwax, Aug 16 2002

Dr. Curry, I thought about indexing; maybe that is a better way. This is part of the area where I'm not so sure which problems outweigh others. But the flaw you mention is part of the reason I didn't mention indexing. What granularity is most appropriate? Surely it varies from one idiom to the next, and as you suggest, even from one translation to the next. And I'm not so sure that idiomatic phrases are uniform enough from one person to another, or even from one context to another, to build a useful index. Maybe a "smart index," which the software builds from the selections made by users?
-- beauxeault, Aug 20 2002

I just tried the Google Translate link, and it rendered "My underpants are on fire" in Italian as "I miei underpants sono su fuoco". I just don't believe that the Italian for "underpants" is "underpants", yet the promised "Suggest a better translation" button did not materialize.
-- MaxwellBuchanan, Aug 13 2007

I love these programs. I just entered the sentence "Insert the stainless steel screw into the third hole from the left, and tighten until firm"

then translate from English -> French -> German -> English, and I get

"The screw of rustproof steel in the third hole the left one insert and press, to society"

...and hilarity ensues. I think this is how assembly instructions for toys are written.
-- Custardguts, Aug 16 2007

As a professional technical author, low-grade polyglot and occasional user of translation services both automated and human, I really appreciate this suggestion. It has really serious merit and should be used (and paid handsomely for) by the writers of translation-assist software ASAP. I'd give it at least a dozen croissants if I was allowed to.

The precise method of implementation (to index or not to index, that is the question) I'd leave to the software experts - but I suspect they'd index. The index would probably be an order of magnitude (or two) bigger than the original texts, but with modern hardware, who cares?
-- Cosh i Pi, Aug 16 2007
