Evolving Translation Machine

A sort of WikiBabelfish
Babelfish scores very high in the usefulness ratings, but pretty poorly for accuracy.

Wikipedia scores highly on both usefulness and accuracy - the accuracy relying on hundreds of thousands of humans checking the database against their real-world knowledge.

So let's put the two technologies together.

When I read a Babelfished page, I'm basically doing half of the translation myself anyway. Babelfish has made a good start on it by translating most of the words, generally in the correct tense and sometimes with a nod to the correct grammar. I then complete the translation in my head by using:

a) a knowledge of English, as she should be spoke
b) a knowledge of the context - for instance "fil" would translate from French as "string" in the context of an article about knots, but as "wire" in an article about electronics
c) some basic knowledge of French / Italian / Whatever
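
To make point (b) concrete, here is a minimal sketch of how a context tag could pick between senses of a word. The glossary layout, the tag names and the `translate_word` helper are all made up for illustration; only the "fil" example comes from the text above, and a real engine would do nothing this simple.

```python
# Hypothetical context-tagged glossary. Only the "fil" senses come from the
# example above; the structure and tag names are assumptions.
GLOSSARY = {
    "fil": {"knots": "string", "electronics": "wire"},
}

def translate_word(word, context):
    """Return the sense recorded for this context tag, else the word unchanged."""
    senses = GLOSSARY.get(word, {})
    # No sense recorded for this context: leave the word alone rather than guess.
    return senses.get(context, word)

print(translate_word("fil", "knots"))        # string
print(translate_word("fil", "electronics"))  # wire
```

The tag the reader attaches to a page would play the role of `context` here.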

The Babelfished version should open in an editable window which will allow me to rewrite some or all of it and also tag it to give the machine an idea of context.

The edits will then be submitted back to the translation site, which will run the original text again through several mutated variants of the original translation engine. The mutation that produces the closest result to my submission will then be used for all future translations.
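
The mutate-and-select step could be sketched roughly as follows. This is a toy under heavy assumptions: the "engine" is just a word-for-word lexicon, a "mutation" swaps one word's translation for another listed sense, and closeness is measured with Python's `difflib.SequenceMatcher`. The names `ALTERNATIVES`, `mutate` and `evolve` are invented for the example, and a real translation engine is far more complex than this.

```python
import random
from difflib import SequenceMatcher

# Toy stand-in for a translation engine: a word-for-word lexicon.
# ALTERNATIVES lists the senses a mutation may switch between (all hypothetical).
ALTERNATIVES = {
    "coupez": ["cut"],
    "le": ["the"],
    "fil": ["wire", "string", "thread"],
}

def translate(engine, text):
    """Translate word by word, leaving unknown words untouched."""
    return " ".join(engine.get(word, word) for word in text.split())

def mutate(engine, rng):
    """Copy the engine and change the translation of one randomly chosen word."""
    child = dict(engine)
    word = rng.choice(sorted(ALTERNATIVES))
    child[word] = rng.choice(ALTERNATIVES[word])
    return child

def similarity(a, b):
    """Score two strings between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, a, b).ratio()

def evolve(engine, source, human_edit, offspring=8, generations=5, seed=0):
    """Breed mutants, keep whichever engine lands closest to the human edit."""
    rng = random.Random(seed)
    for _ in range(generations):
        # The parent competes too, so the surviving engine never scores worse.
        candidates = [engine] + [mutate(engine, rng) for _ in range(offspring)]
        engine = max(
            candidates,
            key=lambda e: similarity(translate(e, source), human_edit),
        )
    return engine

start = {"coupez": "cut", "le": "the", "fil": "wire"}
best = evolve(start, "coupez le fil", "cut the string")
print(translate(best, "coupez le fil"))
```

Keeping the parent in the candidate pool is a design choice of this sketch, not part of the idea as stated: it means a generation of bad mutations cannot make the engine worse on the text it was scored against, although it says nothing about other texts.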

The crucial point here is that although the machine is not learning from a professional translator, it is learning from people with a good idea of what the finished translation should be, based on the rough translation, subject knowledge and knowledge of the output language. And hopefully it will be learning from thousands of corrections a day.

This idea makes a number of assumptions that seem to hold true for Wikipedia, e.g. that people will voluntarily donate their time to the project and that the majority of people will not deliberately feed in false data. It also makes some assumptions about the ability of translation engines (which are horrifically complex) to mutate without breaking.

wagster, Nov 29 2009

       I like.   

       Only one concern, though. Is there an engine powerful enough to learn from your feedback? Presumably, the existing translation software is quite complex, because it does make a faint stab at context.   

       So, how does the software extract information from your translation which it can then generalize to use in other translations?   

       But [+].
MaxwellBuchanan, Nov 29 2009

       It doesn't really need to extract information as such, it just has to compare one piece of text with several others and score them on similarity. The machine isn't really learning anyway, it's just breeding a number of mutant offspring, getting them to make the translation, and killing off all but the one that scores highest on similarity to the human-edited text. It's a blind watchmaker.   

       My main concern is that, code being different from living organisms, mutations might always regress.
wagster, Nov 30 2009

       Actually, couldn't you do some kind of mashup of Wikipedia? When I'm stuck for a word, I frequently look it up there, then go to the corresponding entry for the language I'm trying to use. Were that automated, it might work quite well, and the updating would occur without anyone needing to do it specifically for the translator.
nineteenthly, Nov 30 2009

       Is that much more useful than a dictionary, which all translators have already? I might try it next time I'm stuck and see if it's better.
wagster, Nov 30 2009

       Ian, what's the connection between your link to Wikipedia's explanation of the "Google bomb" concept and this idea?
jutta, Nov 30 2009

       I think Ian is suggesting that such a machine could be hijacked by a coordinated group of pranksters. Spanish-->English: "me gusta enanos con queso" Google: "george bush is a big fat idiot"
swimswim, Nov 30 2009

       Thanks [swimswim] - looks like the Google approach is pretty similar: "we feed the computer with billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model"
wagster, Nov 30 2009

       jutta, - My nipples explode with delight!
Ian Tindale, Nov 30 2009

       [Wagster], one advantage is that rather than just having a single word, you have context, so for example you don't end up translating the Danish "frø" to mean "seed" when you should be using "frog" or vice versa (there is a gender difference there though). I've found it very useful.
nineteenthly, Nov 30 2009

       Right then, from now on I shall be using Google Translate instæd. Or Ian's Hungarian phrasebook...
wagster, Nov 30 2009

       How close is my Wiki Distributed Translation idea to this one? It seems very close to me.
bungston, Nov 30 2009

       Having read all of it, it's fairly similar [bung], except for the Darwinian aspect. I must confess I only read the first half of your idea before posting.
wagster, Nov 30 2009

       Sounds good, except for the fact that a large group of people could screw with the system by submitting false translations back to the engine.
William II, Jun 29 2010


