h a l f b a k e r y
Right twice a day.
||This already exists in every spellcheck application. Quite basic to the concept, in fact.
||The examples given will auto-correct on my iOS device (except for "coesn't").
||Well mine soesn't. It just underscores the word, but that gives me an idea...
||There are a few different ways of measuring the
"distance" between one word and another. Common ones
include the Hamming distance, which is probably the
most basic, and the Levenshtein distance, which takes it
a stage further. I don't know the name of the thing
you're talking about, which might be something like a
weighted Levenshtein, where edits are weighted according
to the physical keyboard distance between keys, but there
are people out there in google-land who are talking about
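
To make the three measures mentioned above concrete, here is a rough Python sketch. The US QWERTY grid, the helper names (`hamming`, `levenshtein`, `key_cost`) and the scaling of the key-to-key cost are all assumptions made for illustration, not an established metric or anyone's actual spellchecker.

```python
from math import hypot

# Assumption: a simplified US QWERTY grid; the stagger between rows is ignored.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c)
           for r, row in enumerate(QWERTY_ROWS)
           for c, ch in enumerate(row)}


def hamming(a, b):
    """Count positions that differ; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def key_cost(x, y):
    """Hypothetical substitution cost: 0 for the same key, small for
    neighbouring keys, capped at 1 for distant or unknown keys."""
    if x == y:
        return 0.0
    if x not in KEY_POS or y not in KEY_POS:
        return 1.0
    (r1, c1), (r2, c2) = KEY_POS[x], KEY_POS[y]
    return min(1.0, hypot(r1 - r2, c1 - c2) / 3.0)


def levenshtein(a, b, sub_cost=lambda x, y: float(x != y)):
    """Edit distance with unit insert/delete cost and a pluggable
    substitution cost (plain Levenshtein by default)."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1.0,                 # delete x
                           cur[j - 1] + 1.0,              # insert y
                           prev[j - 1] + sub_cost(x, y))) # substitute
        prev = cur
    return prev[-1]
```

With this weighting, `levenshtein("doesn't", "coesn't", key_cost)` comes out smaller than `levenshtein("doesn't", "poesn't", key_cost)`, because "c" sits next to "d" while "p" is on the far side of the board. A corrector could use that to rank fat-finger slips above other one-letter changes.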
||I'm interested in somehow codifying any string into a
single number (or, if that's not feasible, perhaps a
geometric coordinate tuple) that provides an absolute
measure, when compared against another value, of the
similarity or distance of one word to another. Say it
was in 3 dimensions, then "cat" might be at location
(x,y,z : 135.11, 34.12, 890.94), and "orangutan" at
(x,y,z : 11.12, 104.55, 860.76) with a distance of
them - it's unlikely that someone writing one of those
words, actually meant the other.
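
As a small illustration of the coordinate idea, the straight-line distance between two such points can be computed directly. This sketch reuses the made-up coordinates from the annotation above; the `WORD_COORDS` table and the `word_distance` helper are hypothetical names, and nothing here says how such coordinates would actually be assigned.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Illustrative coordinates copied from the annotation; they are not
# derived from any real embedding of the words.
WORD_COORDS = {
    "cat": (135.11, 34.12, 890.94),
    "orangutan": (11.12, 104.55, 860.76),
}


def word_distance(a, b, coords=WORD_COORDS):
    """Return the straight-line distance between two words' points."""
    return dist(coords[a], coords[b])


if __name__ == "__main__":
    print(word_distance("cat", "orangutan"))
```

A spell corrector built on this would only offer replacements whose points fall within some small radius of the typed word, so far-apart pairs like these two would never be suggested for each other.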
||Initially, this might be based on a Levenshtein kind
of metric, perhaps augmented by keyboard distance
(though you'd need to factor in the various
differences between international keyboards), but
later you could bring synonyms into play, and ultimately
start using alternative language definitions as well.
||Before you know it - you've got a geometric semantic
map of the whole of language that could be used for
spell-correction, search-engine optimisation and
translation - I rather suspect that there are various
implementations of such a geometric semantic space
floating out there looking at the interwarbs as we
||Spell checkers nowadays will correct for this sort of
error. The iPhone and OS X both have this sort of
correction built in, and I can only presume that other
systems have it as well. Those systems fail, though,
when the mistyped word is also a real word. The
linked idea is my proposal to solve this problem.