Vocabuladex
Your domain reads on a 12th grade level

Web crawlers like Google create indices that contain all the words present on each crawled site. It would be interesting to correlate these word lists with the dictionary, to measure the size of each site’s “vocabulary.” (The dictionary is necessary to eliminate invented terms, non-words and proper names.) Then boil each site's count of distinct dictionary words down into a ranking.
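A rough sketch of how that measurement might work, in Python. Everything here is an assumption made for illustration: the function names, the toy dictionary and the two example sites are hypothetical, and a real crawler index and a full dictionary would stand in for them. A site's "vocabulary" is taken to be the set of distinct words on its pages that also appear in the dictionary, and sites are ranked by the size of that set.

```python
import re

# Lowercase alphabetic tokens, optionally with an internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def site_vocabulary(pages, dictionary):
    """Return the distinct dictionary words found across a site's pages."""
    words = set()
    for text in pages:
        words.update(WORD_RE.findall(text.lower()))
    # Intersecting with the dictionary drops invented terms,
    # non-words and proper names.
    return words & dictionary


def rank_sites(sites, dictionary):
    """Return (site, vocabulary size) pairs, largest vocabulary first."""
    sizes = {
        name: len(site_vocabulary(pages, dictionary))
        for name, pages in sites.items()
    }
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data; a real run would use a full dictionary
# and the crawler's own word lists.
DICTIONARY = {
    "the", "cat", "sat", "on", "mat", "a", "is",
    "quantum", "entanglement", "phenomenon",
}
SITES = {
    "simple.example": ["The cat sat on the mat.", "A cat is on a mat."],
    "dense.example": [
        "Quantum entanglement is a phenomenon.",
        "Krelnik sat on the zzyzx.",
    ],
}

if __name__ == "__main__":
    for name, size in rank_sites(SITES, DICTIONARY):
        print(name, size)
```

Raw vocabulary size favours big sites over small ones, so a fairer ranking would probably need to normalise by the amount of text crawled. That step is left out of the sketch.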

You could use these rankings for a number of things, some perhaps even useful.

Educators could judge the appropriateness of a reference web site for a given class level. (Don’t send your fourth graders off to a site that uses many college-level words they haven’t learned yet.)

Web searchers looking for highly in-depth web sites on a given subject might be able to eliminate those that only scratch the surface.

Elitist snobs could use it as yet another reason to look down their noses at others.

ISPs could brag that their users' home pages are more intelligent than their competitors'.

I have a feeling that this web site would have quite a high ranking.
-- krelnik, Apr 26 2003

crumbs, for starters. http://www.halfbakery.com/idea/crumbs!
[po, Oct 05 2004]

Macromedia Flash Search Engine SDK http://www.macromed...load/search_engine/
Don't know if Google or any of the other majors are going to start using it, though. [krelnik, Oct 05 2004]

Google's file types http://www.google.c...p/features.html#pdf
"Google has expanded the number of non-HTML file types searched to 12 file formats. In addition to PDF documents, Google now searches Microsoft Office, PostScript, Corel WordPerfect, Lotus 1-2-3, and others." [krelnik, Oct 05 2004]

I like the idea and think it an interesting metric.

[MrKlaatu]: So, by the expression "today's world," are you indicating that people are less educated today than at some earlier time? If so, do you mean that there are fewer people with education or that the people with education are less educated?
-- bristolz, Apr 26 2003


They's getting dumberer all the time, ain't they?
-- thumbwax, Apr 26 2003


we could also implement this for individual 'bakers...just a thought
-- igirl, Apr 26 2003


Google indexes PDFs and other non-HTML content now, Rods. Macromedia is offering a developer kit that lets you easily extract the text from Flash movies too, and they are trying to get the major search engines to use it. So I don't see that as a problem. (See links).

Movies and songs are a tough problem, indeed.
-- krelnik, Apr 26 2003


This is a pretty good idea. I'd love to see this.
-- ironfroggy, Apr 27 2003


In counterpoint, I think that our imposed or implied requirement to lessen the reading level is partly to blame for the current situation. Just as people pick up accents amazingly quickly, struggling through a few publications well over your head lifts your comprehension level in a hurry. A modern shortage of patience is the only problem, or very soon we shall find ourselves abandoning the 12th grade altogether.
-- RayfordSteele, Apr 27 2003


//...my spelling of recognise...//
I think that's just a matter of using an appropriately large dictionary, like the OED.
-- krelnik, Apr 27 2003


you could also incorporate details uploaded from people's custom.dic tionary files.
-- neilp, May 28 2003


Google has apparently now added Flash support to the index, even though their own documentation does not yet admit this. (Thanks to [waugsqueke] for noticing this).
-- krelnik, Apr 29 2004


