halfbakery
It's the thought that counts.



Idiom and Concept Markup Language
  (+8, -7)

HTML tags are (theoretically) rendered in the appropriate way by the type of user agent on which they are displayed. Thus, the <EM> tag might render the subsequent text as bold on a text-based agent, as 'loud' on a speaking agent, and as 'glowing' on an 'attention-seeking graphics' agent. Similarly, <HR> does not actually *insert* a rule, it indicates to the user agent that whatever that agent uses to *represent* a rule should be inserted.
ICML is an extension of this notion to idioms and concepts. Each idiom that is included in the specification is coded into its ICML entity by the ICML generator, and decoded by the user agent as appropriate.
This is more than just AutoCorrect or a translator. When I type 'bonnet' or 'boot' (of a car) into my UK-configured ICML generator, it produces an entity representing that concept. A US-configured ICML browser would render that entity as 'hood' or 'trunk', while a UK-configured browser would render it as 'bonnet' or 'boot'. Furthermore, typing 'shed' (the place in the home where Australians store their collection of whatever they collect, hang out drinking beer, and generally laze around) into an Australian generator inserts an entity which renders in a US agent as 'den' and in a UK agent as 'basement'. User options would be available in the browser for those whose vocabulary is cross-cultural.
angel, May 24 2001
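A minimal Python sketch of the generator/user-agent round trip described above. ICML has no real vocabulary, so the concept IDs (such as `home.leisure-room`), the locale codes and the table itself are hypothetical. Encoding is a naive word lookup, which is exactly the weakness raised later in the thread: it cannot tell a car's boot from footwear.

```python
# Hypothetical ICML concept table: each entity ID maps a locale code to the
# word that locale uses for the concept. IDs and codes are illustrative only.
CONCEPTS = {
    "car.front-engine-cover": {"en-GB": "bonnet", "en-US": "hood"},
    "car.rear-luggage-space": {"en-GB": "boot", "en-US": "trunk"},
    "home.leisure-room": {"en-AU": "shed", "en-US": "den", "en-GB": "basement"},
}


def encode(word: str, locale: str) -> str:
    """Generator side: replace a locale-specific word with an entity reference.

    This is a naive reverse lookup; it cannot disambiguate homonyms."""
    for concept_id, renderings in CONCEPTS.items():
        if renderings.get(locale) == word:
            return f"&{concept_id};"
    return word  # words with no known concept pass through unchanged


def decode(token: str, locale: str) -> str:
    """User-agent side: render an entity reference in the reader's locale."""
    if token.startswith("&") and token.endswith(";"):
        renderings = CONCEPTS.get(token[1:-1], {})
        return renderings.get(locale, token)  # no rendering: show the raw entity
    return token  # plain words are displayed as-is
```

For example, `decode(encode("shed", "en-AU"), "en-GB")` returns `"basement"`, and the same entity decoded for `"en-US"` returns `"den"`.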

Universal Chinese http://www.halfbake...Universal_20Chinese
I had broader ideals for technical communication (and therefore further to fall). [angel], I'd be interested to see whether anything could be gained by cross-pollinating these two ideas. [st3f, May 24 2001, last modified Oct 05 2004]

xml ai spec http://www.halfbake...dea/xml_20ai_20spec
Not quite the same. [egnor, May 24 2001, last modified Oct 05 2004]

Universal Networking Language - UNL http://www.ias.unu....twork_language.html
A project of the United Nations University. Apparently defunct, or else they don't update their Web pages very often. [egnor, May 24 2001, last modified Oct 21 2004]

Main UNL site. http://www.unl.ias.unu.edu/
But can they convert MS WORD to plain text? [jutta, May 24 2001]

auto conversion tool_3a_20auto_20conversion
similar idea, but only conversion of numbers, only at the Halfbakery [rrr, Feb 23 2005]


       I like the intent of this one...I'll leave the troubleshooting to egnor.
iuvare, May 24 2001

       Sounds like an old academic WIBNI, an automatic translation system with complete understanding of both the input text and the output language.
jutta, May 24 2001

       I feel this is of limited use, but you still get my vote. I believe that if you replace an idiom with a similar idiom from another culture, you lose much of the meaning.

       Take for example the US idiom "broad" [Slang. A woman or girl: “I use ‘broad’ as a moniker of respect for a woman who [knows] how to throw a mean right” (James Wolcott). - Source: The American Heritage® Dictionary of the English Language, Third Edition]. How would you translate that to a UK idiom keeping the nuances and cultural references? I believe you can't.   

       So: bonnet/hood, yes; basement/den/shed, no.

       <caveat>I would only ever support this change if it were fully in the control of the author. To have the words I chose automatically replaced makes my skin crawl.</caveat>

       So where would I use this, and why aren't I fishboning it? The answer lies in technical authoring: the production of a user guide, manual or other technical guide. When writing, the author tries to avoid idioms as much as possible, but it would be useful to be able to tag words as appearing differently depending on the locality in which the document is to be rendered.

       This is baked for hard copy (FrameMaker has done it for years) and only a little toasty for the web (you could create a cookie for locality and either run your changes client-side with JavaScript or server-side with the language of your choice). There's no common standard for doing this, and an XML solution would (I think) certainly be bakeable.
st3f, May 24 2001
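The server-side route can be sketched in a few lines of Python using the standard library's `http.cookies`. The cookie name `locale`, the `{{placeholder}}` syntax, the default locale and the term tables are all assumptions made for illustration, not an existing standard (which is the post's point: there isn't one).

```python
import re
from http.cookies import SimpleCookie

# Hypothetical per-locale term tables; keys are author-chosen placeholders.
TERMS = {
    "en-GB": {"luggage": "boot", "engine_cover": "bonnet"},
    "en-US": {"luggage": "trunk", "engine_cover": "hood"},
}
DEFAULT_LOCALE = "en-GB"  # assumption: fall back to the author's own locale

# Assumed placeholder syntax: {{name}} marks a locale-dependent term.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def locale_from_cookie(cookie_header: str) -> str:
    """Read the hypothetical 'locale' cookie; fall back to the default."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("locale")
    if morsel is not None and morsel.value in TERMS:
        return morsel.value
    return DEFAULT_LOCALE


def render(template: str, locale: str) -> str:
    """Replace each {{placeholder}} with the locale's term; unknown ones stay."""
    terms = TERMS[locale]
    return PLACEHOLDER.sub(lambda m: terms.get(m.group(1), m.group(0)), template)
```

For example, `render("Open the {{luggage}}.", locale_from_cookie("session=abc123; locale=en-US"))` gives `"Open the trunk."`; a missing or unrecognised cookie falls back to the author's locale.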

       [st3f]: Your post does indeed have similarities in intent.
I'm not sure now whether I intended this to be a product or a philosophical discussion. (Either way, please excuse any trivial elements, as I am an expert in neither field, but if I'm talking rubbish, I'd like to know.) It was partly inspired by an ongoing debate years ago about translating programs written in Pascal into BASIC. You don't substitute each Pascal command with its BASIC equivalent (partly because, as noted in the C++ to Java dictionary idea, there often isn't a one-to-one mapping); you figure out what the program is trying to do, and do it in BASIC. In other words, you translate the intent, not the words. There are also associations with sign language, although I don't know enough about it to elaborate.
Idiom translation would require that the idiom exists in both cultures, so 'bonnet'/'hood' is fine and 'state school'/'public school' or 'public school'/'private school' might be, but I don't think there is an equivalent of 'broad' in British culture. (Note that I use 'culture' rather than 'language'.)
I suppose the 'concept' part is more a philosophical thing than a technical one, in that words are just the things we have to use to express concepts. If I use the concept 'Skoda', I mean to imply 'slow, ugly, horrid, unreliable, naff car' (in popular belief, if not in actuality), which would 'translate' into American as 'Chevy Nova' or some such. Similarly, if I mean to refer not to a *particular* girls' school but to the general notion of one, I would type 'Benenden' and a US reader would see 'Bryn Mawr'.
[jutta]: If this is getting too extensive/irrelevant, please throw me out.
angel, May 24 2001

       UB, waugs: Although angel doesn't say so explicitly, I think you would have to tag this explicitly.

       So, 'boot' stays as 'boot', whereas <rearluggagespaceofcar/> becomes 'boot' or 'trunk'.

       The same misconception was applied to niall's 'Missing attachment reminder' and to my 'Flexicaps'. I, for one, felt grievously wronged (not to mention highly fished).
st3f, May 24 2001
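A sketch of the explicit-tagging approach using Python's `xml.etree.ElementTree`: only the tagged concept element is rendered per locale, so a literal 'boot' in the surrounding text is never touched. The element names (apart from `<rearluggagespaceofcar/>` from the post), the `<doc>` wrapper and the locale codes are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical ICML vocabulary: each empty element names a concept, and each
# locale supplies the word used to display it.
RENDERINGS = {
    "rearluggagespaceofcar": {"en-GB": "boot", "en-US": "trunk"},
    "frontenginecoverofcar": {"en-GB": "bonnet", "en-US": "hood"},
}


def _render_children(parent: ET.Element, locale: str) -> None:
    """Replace known concept elements under `parent` with text, in place.

    Raises KeyError if a concept has no rendering for `locale`."""
    previous = None  # last child kept so far; its tail receives replacement text
    for child in list(parent):  # iterate over a copy, since children are removed
        if child.tag in RENDERINGS:
            text = RENDERINGS[child.tag][locale] + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + text
            else:
                previous.tail = (previous.tail or "") + text
            parent.remove(child)
        else:
            _render_children(child, locale)  # ordinary markup: recurse into it
            previous = child


def render_icml(fragment: str, locale: str) -> str:
    """Render an ICML fragment to plain text for one locale."""
    root = ET.fromstring(f"<doc>{fragment}</doc>")  # wrap so fragments parse
    _render_children(root, locale)
    return "".join(root.itertext())
```

For example, `render_icml("My boot is in the <rearluggagespaceofcar/>.", "en-US")` returns `"My boot is in the trunk."`: the untagged 'boot' (footwear) survives.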

       Translating individual words or phrases would only work for the most basic, simple translation tasks; it's well that you chose two dialects of English for your examples. Any "real" translation task (English to Spanish, say, let alone English to Mandarin) requires a much deeper semantic understanding of the material being translated.   

       As others have noted, the ambiguity in natural language means the human user will have to enter these tags manually. No program could hope to mark up a document containing words like "bonnet" and "hood" properly without the aforementioned deep semantic understanding, so the user would have to do it.   

       That means that the user will have to somehow know, of the hundreds of thousands of words and idioms in their lexicon, which particular ones are different in other dialects and therefore worth representing as ICML tags.   

       This seems to be reducing to the sort of thing [st3f] is talking about -- a simple markup language that lets professional writers indicate where particular locale-specific substitutions should take place in their document. You might as well give full control over to the author; rather than using <rearluggagespaceofcar/>, you might as well just type <american>trunk</american> <british>boot</british> and just tell the system to turn on all the "american" or "british" tags. This is just #ifdef for text -- and a far cry from the grand scope of your vision.   

       (Pedantic note: The trunk of a car can easily be in the front, if the engine is in the back. Is the same true of the boot?)
egnor, May 24 2001
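The "#ifdef for text" idea reduces to a filter that keeps the active dialect's variant and drops the others. A minimal regex-based sketch, assuming the `<american>`/`<british>` tags from the post and variants written back to back (no whitespace between them) so no stray spaces are left behind:

```python
import re

# Dialect tags recognised by this sketch (an assumption; extend as needed).
DIALECTS = ("american", "british")

# Matches <dialect>...</dialect>; non-greedy so adjacent variants stay separate.
VARIANT = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(DIALECTS), re.DOTALL)


def select_dialect(text: str, active: str) -> str:
    """Keep the contents of the active dialect's tags and drop all other
    variants, like #ifdef blocks for prose."""
    if active not in DIALECTS:
        raise ValueError(f"unknown dialect: {active!r}")
    return VARIANT.sub(lambda m: m.group(2) if m.group(1) == active else "", text)
```

For example, `select_dialect("Put it in the <american>trunk</american><british>boot</british>.", "british")` returns `"Put it in the boot."`. A regex is adequate only because these tags are never nested; richer markup would call for a real XML parser.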

       The difference between <rearluggagespaceofcar/> and <american>trunk</american> is that with the former, you don't have to analyze the surrounding text to tell whether the writer might be talking about elephants, trees, or swimwear.   

       Angel is trying to build a completely homonym-free language to express precise meanings of words.   

       In the extreme, that's almost impossible; the representation of precise meanings would be very large, and next to useless.   

       Each tag we'd get back from angel's ICML generator would be very large, because it would have to contain the "language environment" the word is in: the other words that sound like it; the historical context in which it developed; the social situations where it would be used; famous occasions it was used at; which people use it. Practically a whole AI expert system for each word!

       But even if we had these big descriptions, we couldn't fully express them in another language - strictly speaking, even people speaking in the same language never fully understand each other, something always gets lost, because we don't have the same backgrounds. We can only communicate because we can handle these losses.   

       So translating things is all about making compromises and losing information - figuring out which details are important in their context, and leaving out those that aren't. Having precise meanings helps clarify a writer's intention, but that still doesn't help you find a match where no right match exists.
jutta, May 24 2001

       Best of luck to them, but I'm dubious. Perhaps it would work well enough to make documents which could be successfully interpreted with difficulty after translation. Without full-on AI, I'm hard pressed to imagine that it would actually produce anything resembling normal text.
egnor, May 26 2001

       Ravenswood, got any links/details to back you up?   

       I know that the UN has a computer-assisted translation effort, and I know that computer-assisted translation systems tend to allow for human mark-up that clarifies confusing cases, but going abstract all the way would surprise me.   

       [Looking at UNL] Well, whaddayaknow...
This is pretty much exactly what [angel] is describing, run in earnest by a few hundred linguists. Check out the powerpoint presentations on the main site for a flavor of what the language is trying to do.
jutta, May 26 2001

       See link.
egnor, May 26 2001

       Ich verstehe Bahnhof (German for "it's all Greek to me"). So, with everything that's been said, what's a 'Fish-ism' (in ten words or fewer) to the citizens of Nizhny Novgorod, Russia? Methinks it parallels the UN's Universal Networking Language: if we could do it, we would; since we can't, we won't, but you don't get any of your funding back.
Grog, Sep 17 2002

       Let's restrict it to the interpretation of numbers (see link). An author writes about dollars, but a European reader reads the amount in euros. Or feet and metres. Only if the author tags it as such. In quite a few cases you don't want the amount converted; the author is the best judge.
rrr, Feb 23 2005
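The numbers-only version, sketched in Python under stated assumptions: the `<qty unit="...">` markup is hypothetical, the length factors are the exact definitions (1 ft = 0.3048 m, 1 mi = 1.609344 km), and the dollar-to-euro rate is passed in by the caller because it changes daily. Untagged numbers are left alone, so the author stays the judge.

```python
import re

# Hypothetical author markup: <qty unit="ft">12</qty>. Untagged numbers are
# never touched, so the author decides what may be converted.
QTY = re.compile(r'<qty unit="(\w+)">(\d+(?:\.\d+)?)</qty>')

# Exact definitional factors: source unit -> (target unit, multiplier).
LENGTH = {"ft": ("m", 0.3048), "mi": ("km", 1.609344)}


def localize_numbers(text: str, usd_to_eur: float) -> str:
    """Convert tagged quantities for a metric, euro-using reader.

    The exchange rate is supplied by the caller because it changes daily."""

    def convert(match):
        unit, value = match.group(1), float(match.group(2))
        if unit in LENGTH:
            target, factor = LENGTH[unit]
            return f"{value * factor:.2f} {target}"
        if unit == "USD":
            return f"€{value * usd_to_eur:.2f}"
        return f"{match.group(2)} {unit}"  # unsupported unit: show it unconverted

    return QTY.sub(convert, text)
```

With an illustrative rate of 0.5, `localize_numbers('<qty unit="ft">10</qty> long, <qty unit="USD">200</qty>; 10 ft high', 0.5)` returns `'3.05 m long, €100.00; 10 ft high'`: the untagged "10 ft" is left exactly as written.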

