Halfbakery: Vocabulary Highlighter

The idea is for a browser to automatically highlight the words in a text that the user probably doesn't know or doesn't understand well, so that s/he can easily spot them, look them up, and understand the text better.

How would it work?

There are many ways to make a computer highlight rarer words. English has a large set of rarely used words. However, it wouldn't be good to highlight all of them, because a given reader may be familiar with many of them anyway.

I think it would be better to create something like a word-highlighting service that the reader can turn on whenever s/he wants it while reading a text. Once turned on, the service highlights words in different colors, each color denoting the probability that the word is unknown to the user.
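As a rough illustration of the color idea, here is a minimal Python sketch that maps a probability to a highlight color. The function name, the thresholds, and the colors are all my own assumptions, not part of the idea itself; a real service would probably let the user tune them.

```python
from typing import Optional


def highlight_color(p_unknown: float) -> Optional[str]:
    """Map the probability that a word is unknown to a highlight color.

    The thresholds and color names are illustrative assumptions.
    Returns None when the word should be left unhighlighted.
    """
    if not 0.0 <= p_unknown <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_unknown >= 0.75:
        return "red"      # very likely unknown
    if p_unknown >= 0.5:
        return "orange"   # probably unknown
    if p_unknown >= 0.25:
        return "yellow"   # possibly forgotten
    return None           # likely known: no highlight
```

For example, a word with an estimated probability of 0.6 would be shown in orange, and a word at 0.1 would not be highlighted at all.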

The probability could be calculated from the information in the user's account, the law of forgetting (the Ebbinghaus forgetting curve), the user's empirically estimated forgetting patterns (as every person's memory works slightly differently), the general frequency of each word in English, the person's age, and possibly other variables such as education, interests, and spoken languages. However, of all these variables, two would be the most important:

1) How frequently and when the word has been seen (read) since it was marked as defined. (The Vocabulary Highlighter's database should record the actual moments each word was seen/read.)
2) How frequently and when the word is marked as defined/understood.

And the most important part of the Vocabulary Highlighter program would be applying the patterns of forgetting of the user to predict the likely-unknown or possibly-forgotten words.
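To make the forgetting part a bit more concrete, here is a minimal Python sketch based on the usual exponential form of the Ebbinghaus curve, R = exp(-t / S), where t is the time since the word was last seen and S is a "stability" value. The function name, the 24-hour base stability, and the rule that stability doubles with each exposure are assumptions made purely for illustration; a real program would estimate them from the user's own forgetting patterns.

```python
import math


def p_forgotten(hours_since_last_exposure: float,
                exposures: int,
                base_stability_hours: float = 24.0) -> float:
    """Estimate the probability that a previously defined word is forgotten.

    Uses retention R = exp(-t / S). The stability S is assumed to double
    with every exposure (seeing or defining the word); this doubling rule
    and the default base stability are illustrative assumptions.
    """
    if exposures < 1:
        raise ValueError("the word must have been seen or defined at least once")
    if hours_since_last_exposure < 0:
        raise ValueError("elapsed time cannot be negative")
    stability = base_stability_hours * 2 ** (exposures - 1)
    retention = math.exp(-hours_since_last_exposure / stability)
    return 1.0 - retention
```

With these assumptions, a word looked up once and not seen for a day comes out at roughly a 63% chance of being forgotten, while the same word seen three times comes out much lower, which matches the intuition that repeated exposure slows forgetting.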

The data about the user's vocabulary should be continuously accumulated and stored both on the user's computer (which makes the highlighting fast) and on a remote server, with the two synchronized from time to time (so that the information is not lost over time or when the user changes computers).
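One simple way the synchronization could work is sketched below in Python, assuming each copy stores, per word, the set of timestamps at which the word was seen. The data layout and the function name are hypothetical; merging by set union is just one possible strategy, chosen here because it never loses a recorded sighting, whichever copy recorded it.

```python
from typing import Dict, Set


def merge_logs(local: Dict[str, Set[float]],
               remote: Dict[str, Set[float]]) -> Dict[str, Set[float]]:
    """Merge two word -> sighting-timestamp logs by taking their union.

    Neither input is modified. The log layout is an assumption made
    for this sketch.
    """
    merged: Dict[str, Set[float]] = {}
    for word in local.keys() | remote.keys():
        merged[word] = local.get(word, set()) | remote.get(word, set())
    return merged
```

After a merge, both the local computer and the server would store the merged log, so a user who switches computers keeps the full history.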

How would the browser know whether you are reading a word or not? Simple: if you are reading a paragraph and mark a word "as defined" (meaning you have just looked it up in a dictionary), the program assumes that all the words in that paragraph before the marked word have been seen and read. If the user marks another word in the same paragraph "as defined", the program assumes that the words between the two "defined" words have been seen too. However, the whole paragraph is not always read, so I think the program should ask whether you have read the preceding words in the paragraph.
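The rule above can be sketched in Python as follows. Since every word before each marked word counts as seen, the result is simply every word up to the last marked one. The function name, the simple letters-and-apostrophes tokenizer, and the choice to count the marked word itself as seen are my own assumptions, and the sketch leaves out the confirmation question.

```python
import re
from typing import List


def words_assumed_seen(paragraph: str, marked_positions: List[int]) -> List[str]:
    """Return the words assumed read, given the indexes of words marked "as defined".

    Everything up to and including the last marked word is assumed seen.
    Tokenizing on letters and apostrophes is a simplification.
    """
    words = re.findall(r"[A-Za-z']+", paragraph)
    if not marked_positions:
        return []  # nothing marked, so nothing can be inferred
    last = max(marked_positions)
    if min(marked_positions) < 0 or last >= len(words):
        raise IndexError("marked position outside the paragraph")
    return [word.lower() for word in words[: last + 1]]
```

For the paragraph "The patient showed acute dyspnea and mild tachycardia today.", marking "dyspnea" (position 4) would record the first five words as seen; later marking "tachycardia" (position 7) would extend that to the first eight.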

I think the idea described so far would work as expected if human languages had no homonyms (that is, if every spelling had only one meaning). To eliminate them, one would have to use a new kind of vocabulary with only one meaning per word, for all texts and by all people, which is not very doable. However, it's good to know that many of the rare words do have only one meaning, so I think the Vocabulary Highlighter described so far would still be useful to a great extent.

Background:

I thought of this idea because I have recently been reading some medical texts on the web. I'm not a doctor, so I had some problems. I had to look up many of the new words and drug names in order to understand the texts. OneLook.com and its dictionaries were very helpful indeed. However, after reading a text I realized I had missed several words that I had heard before but did not completely understand (as I was hurrying to understand the entire text, not its individual words). I also realized I had not completely understood the mechanisms I was reading about, so I had to reread the text and look the words up, which was time-consuming. So I thought, couldn't the computer predict which words I don't know?