halfbakery
Consider this typo:
"ince"
Spelling correction suggestions could be "Inch, Inca, Since, Nice, Once, etc..."
I propose a keyboard-proximity filter (based on the system's currently selected keyboard layout) that orders the suggestions by how likely each one is to be the result of a typo. O is very close to I on the keyboard, so "Since", "Nice" and "Once" are probably the most likely words.
Inca and Inch are much less likely to be typed by accident, and accidents form 90%+ of my spelling errors.
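A minimal sketch of such a proximity filter, assuming a US QWERTY layout with approximate row stagger (a real filter would read the system's active layout instead). The names `one_edit_cost` and `rank_suggestions` are hypothetical, and the flat cost of 1.0 for a swapped pair or a missing/extra letter is an arbitrary assumption; only single-letter substitutions are priced by key distance.

```python
import math

# Assumed QWERTY geometry: each row's letters plus its approximate horizontal
# stagger in key widths. A real filter would use the system's active layout.
QWERTY_ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
POS = {ch: (x + offset, float(y))
       for y, (row, offset) in enumerate(QWERTY_ROWS)
       for x, ch in enumerate(row)}

def key_distance(a, b):
    """Straight-line distance between two keys in key widths (inf if unknown)."""
    if a not in POS or b not in POS:
        return math.inf
    (x1, y1), (x2, y2) = POS[a], POS[b]
    return math.hypot(x1 - x2, y1 - y2)

def one_edit_cost(typo, candidate, other_cost=1.0):
    """Cost of turning typo into candidate with a single edit.

    Substitutions cost the key distance; an adjacent transposition or one
    missing/extra letter costs `other_cost` (an assumed weight).
    Anything needing more than one edit costs inf.
    """
    t, c = typo.lower(), candidate.lower()
    if t == c:
        return 0.0
    if len(t) == len(c):
        diffs = [i for i in range(len(t)) if t[i] != c[i]]
        if len(diffs) == 1:
            i = diffs[0]
            return key_distance(t[i], c[i])
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and t[diffs[0]] == c[diffs[1]] and t[diffs[1]] == c[diffs[0]]):
            return other_cost
        return math.inf
    if abs(len(t) - len(c)) == 1:
        short, longer = (t, c) if len(t) < len(c) else (c, t)
        if any(longer[:i] + longer[i + 1:] == short for i in range(len(longer))):
            return other_cost
    return math.inf

def rank_suggestions(typo, candidates):
    """Order candidates from most to least plausible typo (stable on ties)."""
    return sorted(candidates, key=lambda cand: one_edit_cost(typo, cand))

print(rank_suggestions("ince", ["Inch", "Inca", "Since", "Nice", "Once"]))
# ['Since', 'Nice', 'Once', 'Inca', 'Inch']
```

"Once" scores 1.0 because O sits one key width from I, while "Inca" (E to A) and "Inch" (E to H) involve keys roughly two and three key widths apart, so they sink to the bottom.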
Combine the proximity filter with some intra-sentence grammar checking and common-word tagging, and spellchecking could become much more useful.
(How often does one really intend to write the words "tot he" in a sentence? It is far more likely to be "to the", for example. Some systems already correct that automatically, though.)
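Note that "tot" and "he" are both real words, so a plain dictionary lookup lets them through; word-pair statistics are one way to catch them. A minimal sketch, assuming a tiny toy corpus in place of real usage data (`CORPUS` and `fix_split` are hypothetical names): it tries shifting the space one letter either way and keeps the split whose word pair occurs most often.

```python
from collections import Counter

# Toy corpus standing in for real usage statistics (an assumption; a real
# system would count word pairs over a large body of text).
CORPUS = "we went to the shop and then to the park where he saw a tot he knew"

def bigram_counts(text):
    """Count adjacent word pairs in a lower-cased, whitespace-split text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

BIGRAMS = bigram_counts(CORPUS)

def fix_split(left, right, bigrams=BIGRAMS):
    """Try moving the space between two tokens one letter left or right and
    keep whichever split forms the most frequent word pair.

    The original split is listed first, so it wins any tie.
    """
    joined = left + right
    options = []
    for cut in (len(left), len(left) - 1, len(left) + 1):
        if 0 < cut < len(joined):
            options.append((joined[:cut], joined[cut:]))
    return max(options, key=lambda pair: bigrams[(pair[0].lower(), pair[1].lower())])

print(fix_split("tot", "he"))    # ('to', 'the')
print(fix_split("the", "park"))  # ('the', 'park')
```

Even though "tot he" appears once in the toy corpus, "to the" appears twice, so the shifted split wins; pairs the corpus has never seen are left alone.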
US Patent 6,801,190
http://www.google.c...AAAAEBAJ&dq=6801190 [jutta, Aug 17 2009]
Context-sensitive spell check in Microsoft Office 2007
http://blogs.msdn.c...6/06/05/617653.aspx [jutta, Aug 17 2009]
Context-sensitive spell check in Google Wave
http://googlesystem...-spell-checker.html [jutta, Aug 17 2009]
Wikipedia: Damerau-Levenshtein distance
http://en.wikipedia...evenshtein_distance Edit distance with bells on. [jutta, Aug 17 2009]
Wikipedia: Needleman-Wunsch algorithm
http://en.wikipedia...an-Wunsch_algorithm This very clearly needs to be worked into a popular "dance craze" song, "Do the Levenshtein-Damerau Needleman-Wunsch". [jutta, Aug 17 2009]
All good ideas, and patented and implemented in a few systems. A patent- or literature-search for "spell checking algorithms" might be in order.
The keyboard-proximity idea is implemented, if anyone bothers with it, as a "confusion matrix": given two keys, it tells you how likely they are to be confused. When computing the edit distance between two words (i.e. the Levenshtein distance), instead of assigning the same cost to every substitution, the confusion matrix is used to look up the likelihood of that specific error.
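A minimal sketch of that weighting, assuming a QWERTY layout and using a two-level stand-in for the confusion matrix (neighbouring keys cost 0.5, everything else 1.0). Both weights and the 1.3 key-width "neighbour" threshold are assumptions, not values from any particular spellchecker; `confusion_cost` and `weighted_levenshtein` are hypothetical names.

```python
import math

# Assumed QWERTY geometry (row letters and approximate stagger in key widths).
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.25), ("zxcvbnm", 0.75)]
POS = {ch: (x + off, float(y)) for y, (row, off) in enumerate(ROWS) for x, ch in enumerate(row)}

def confusion_cost(a, b, near_cost=0.5, far_cost=1.0):
    """Stand-in for a confusion-matrix lookup: substituting a neighbouring key is
    cheap, any other substitution costs the full amount. Weights are assumptions."""
    if a == b:
        return 0.0
    if a in POS and b in POS:
        (x1, y1), (x2, y2) = POS[a], POS[b]
        if math.hypot(x1 - x2, y1 - y2) <= 1.3:  # roughly "touching" keys
            return near_cost
    return far_cost

def weighted_levenshtein(s, t, sub_cost=confusion_cost, indel_cost=1.0):
    """Levenshtein distance where each substitution is priced by sub_cost."""
    s, t = s.lower(), t.lower()
    prev = [j * indel_cost for j in range(len(t) + 1)]  # row for the empty prefix of s
    for i in range(1, len(s) + 1):
        cur = [i * indel_cost] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(prev[j] + indel_cost,          # delete s[i-1]
                         cur[j - 1] + indel_cost,       # insert t[j-1]
                         prev[j - 1] + sub_cost(s[i - 1], t[j - 1]))  # match/substitute
        prev = cur
    return prev[-1]

for word in ["once", "inca", "since", "nice"]:
    print(word, weighted_levenshtein("ince", word))
```

With these weights "once" comes out at 0.5 and "inca" and "since" at 1.0. Plain Levenshtein still charges "nice" 2.0, because swapping two neighbouring letters takes two edits; the Damerau-Levenshtein variant linked above counts that transposition as a single edit.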
Well, duh... plainly obvious you mean Vince
//Levenshtein// - so that's what it's called. I once wrote a program intended to act as an "engine" for ALL card games, from Snap through Gin Rummy to any and all variants of Poker, with each ruleset defined in a (relatively easy to edit) XML file. The tricky part came in draw/replace scenarios: getting the machine to decide whether its hand was good or bad enough to draw a card (and which card to burn in the process). There are lots of routines that reference the Hamming distance between a given hand and a target hand the program might have "wanted" (e.g. four of a kind, a run of hearts, or a numeric sequence). I'm now going to have to go back and rename some of my methods to use the word "Levenshtein".
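The two names are not quite interchangeable, so a side-by-side sketch may help before renaming. Hamming distance only counts mismatched positions between equal-length sequences; Levenshtein distance also allows insertions and deletions, so for equal-length inputs it is never larger than the Hamming distance. Both functions below work on any sequences, strings or lists of cards alike; the card notation in the usage note is hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal lengths")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

print(hamming("karolin", "kathrin"), levenshtein("karolin", "kathrin"))  # 3 3
print(hamming("abcdef", "bcdefa"), levenshtein("abcdef", "bcdefa"))    # 6 2
```

The second pair shows where they diverge: rotating a sequence by one breaks every position (Hamming 6) but needs only one deletion and one insertion (Levenshtein 2). For a hand such as `["7H", "8H", "9H"]` against `["7H", "8H", "TH"]` both give 1.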
//confusion matrix// - I'm pretty sure I can implement that myself without any algorithms.
//I think spellcheckers should be programmed to deliberately fail every so many words, or even insert barely noticeable typos whilst typing that won't show up on the finished-product spellcheck.//
I find that happens already, as some errors form another word.
Examples:
your/you're
lose/loose
discrete/discreet
(The first to are quite common on the net, and widely reviled.)
One typo I often come across is a 'dyslexic' (no offense to dyslexic people) error: hitting the (theoretically) correct key with the wrong hand (e.g. typing 'k' when you needed 'd').
<Pet peeve> People getting 'than' and 'then' mixed up! Grrr!</pp>
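The wrong-hand error could feed the same kind of confusion matrix. A minimal sketch, assuming a touch typist on a US QWERTY layout presses the key the same finger uses on the other hand, i.e. the key mirrored across the keyboard's centre (D and K are both middle-finger home keys). `MIRROR`, `mirror_variants` and `wrong_hand_suggestions` are hypothetical names, and the dictionary is a toy set.

```python
# Assumption: a touch typist presses the same finger's key on the wrong hand,
# which on a US QWERTY layout is the key mirrored across the keyboard's centre.
ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
MIRROR = {row[k]: row[len(row) - 1 - k] for row in ROWS for k in range(len(row))}

def mirror_variants(word):
    """Every word obtainable by swapping one letter for its mirror-image key."""
    variants = []
    for i, ch in enumerate(word.lower()):
        m = MIRROR.get(ch)
        if m is not None and m.isalpha():  # skip punctuation keys such as ';'
            variants.append(word[:i] + m + word[i + 1:])
    return variants

def wrong_hand_suggestions(typo, dictionary):
    """Keep only the mirror-key variants that are real words."""
    return [w for w in mirror_variants(typo) if w in dictionary]

print(MIRROR["d"], MIRROR["k"])                              # k d
print(wrong_hand_suggestions("kog", {"dog", "cat", "cog"}))  # ['dog']
```

In a weighted edit distance these mirror pairs would simply get a cheaper substitution cost than unrelated keys.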
//I can't believe editors, who used to have to earn their pay by proofreading, can cheat...//
Yeah!! And what about those lazy sailors who use GPS to navigate..?
//The first to are quite common//
Was that "to" intentional? Yeah. Must've been.
I know someone who frequently misuses "of" - as in must of, could of, should of, etc. - I don't have the heart to tell them.
You must learn to Give In To Your Hate, [zen].
//Was that "to" intentional? Yeah. Must've been.//