halfbakery: I think, therefore I am thinking.
Take an on-line news service like CNN or the New York Times. Scan through each day's stories for all non-stop-words, and for each keyword compute the fraction of stories in which it has occurred over the last T days (T being a user-set parameter). Then create a table of contents of the day's stories, listing beside each title its Shannon information content, approximated as though each keyword were an independent random variable. Stories whose keywords have been very common for the last T days presumably tell you little that's new, so they get a low score. List stories in order of decreasing score, if you like.
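A minimal Python sketch of the scoring step, under some assumptions of mine: a tiny illustrative stop-word list, a crude regex tokenizer, and a floor probability of 1/(N+1) for words never seen in the last T days (without some floor, a brand-new word would score infinitely high). A story's score is the sum, over its distinct keywords w, of -log2 p(w), where p(w) is the fraction of the N stories in the window that contain w. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption); a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "is", "at", "as", "with"}


def keywords(text):
    """Return the set of distinct, lowercased non-stop-words in a story."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def document_fractions(history):
    """history: story texts from the last T days.
    Returns {word: fraction of those stories containing the word}."""
    counts = Counter()
    for story in history:
        counts.update(keywords(story))
    n = len(history)
    return {w: c / n for w, c in counts.items()}


def information_score(text, fractions, n_history):
    """Approximate information content in bits, treating keywords as
    independent: sum of -log2 p(w). Unseen words get the assumed
    probability 1 / (n_history + 1) so the score stays finite."""
    floor = 1.0 / (n_history + 1)
    return sum(-math.log2(fractions.get(w, floor)) for w in keywords(text))


def rank_stories(today, history):
    """today: list of (title, text) pairs. Returns (score, title) pairs
    sorted by decreasing score."""
    fractions = document_fractions(history)
    scored = [(information_score(text, fractions, len(history)), title)
              for title, text in today]
    return sorted(scored, reverse=True)


history = ["The president met the senate", "The senate debated the budget",
           "Markets fell on budget fears"]
today = [("Budget again", "The senate debated the budget"),
         ("Volcano", "Volcano erupts in Iceland")]
for score, title in rank_stories(today, history):
    print(f"{score:5.2f}  {title}")
# prints:
#  6.00  Volcano
#  2.75  Budget again
```

The story made of three never-seen words gets 2 bits per word from the floor, while the rehash of yesterday's senate/budget story scores lower because its words already appear in the window.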
Keep separate frequency databases for the different news sources you read, or amalgamate them if they cover similar topics.
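One possible layout for the per-source databases, sketched in Python: each source keeps, for every day, a Counter of how many stories mention each keyword, stored in a deque of length T so the oldest day drops out automatically. Amalgamating sources simply pools their counts. The class and function names and the per-day granularity are my assumptions, not part of the idea.

```python
from collections import Counter, deque


class FrequencyDB:
    """Keyword document-frequency store for one news source (hypothetical).
    Holds one (Counter, story count) pair per day for the last T days."""

    def __init__(self, t_days):
        self.days = deque(maxlen=t_days)  # appending past maxlen drops the oldest day

    def add_day(self, stories_keywords):
        """stories_keywords: one keyword set per story published that day."""
        counts = Counter()
        for kw in stories_keywords:
            counts.update(set(kw))  # count each word at most once per story
        self.days.append((counts, len(stories_keywords)))

    def totals(self):
        """Pooled word counts and story count over the window."""
        counts, n = Counter(), 0
        for day_counts, day_n in self.days:
            counts += day_counts
            n += day_n
        return counts, n

    def fraction(self, word):
        counts, n = self.totals()
        return counts[word] / n if n else 0.0


def amalgamate(*dbs):
    """Pool several sources covering similar topics into one word -> fraction table."""
    counts, n = Counter(), 0
    for db in dbs:
        db_counts, db_n = db.totals()
        counts += db_counts
        n += db_n
    return {w: c / n for w, c in counts.items()} if n else {}


cnn = FrequencyDB(t_days=2)
cnn.add_day([{"budget", "senate"}, {"budget"}])
cnn.add_day([{"volcano"}])
cnn.add_day([{"volcano", "iceland"}])  # pushes the first day out of the window
print(cnn.fraction("budget"), cnn.fraction("volcano"))  # 0.0 1.0

nyt = FrequencyDB(t_days=2)
nyt.add_day([{"budget"}, {"senate"}])
print(amalgamate(cnn, nyt)["budget"])  # 0.25
```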
This procedure will over-estimate the information content, because words that habitually appear together are each counted as a separate surprise. A more subtle algorithm would induce which words cluster together, e.g. {"Watergate", "Nixon", "Haldeman", "plumbers", "18 and 1/2"} or {"Lewinsky", "Starr", "stain"}, and treat those clusters as its independent variables. The statistical language processing wallahs already have such inductive algorithms.

Term weighting widely used for text retrieval
http://www.ftp.cl.c...text-retrieval.html
This is called "term-frequency weighting" and is in common use by search engines. Much to my disgust, I found that information-theoretic weights don't work nearly as well as some ad-hoc methods. See the paper "Simple, proven approaches to text retrieval." [rmutt, Jan 28 2000, last modified Oct 04 2004]
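To make the comparison with term-frequency weighting concrete, here is a small Python sketch of textbook tf-idf. The idf factor log2(N/df) equals -log2 p(w) when p(w) = df/N, so summing idf over a story's distinct keywords gives back the idea's score; tf-idf additionally multiplies each word's weight by how often it is repeated in the story. This is the generic form, not necessarily the variant any particular search engine or the paper above uses, and the fallback for unseen words is my assumption.

```python
import math
from collections import Counter


def idf(word, history_keywords):
    """Inverse document frequency log2(N / df). With p(w) = df / N this is
    exactly -log2 p(w). Unseen words fall back to log2(N + 1) (assumption)."""
    n = len(history_keywords)
    df = sum(1 for kws in history_keywords if word in kws)
    return math.log2(n / df) if df else math.log2(n + 1)


def tfidf_vector(story_words, history_keywords):
    """Classic tf-idf weights: raw count of the word in the story times its idf."""
    tf = Counter(story_words)
    return {w: count * idf(w, history_keywords) for w, count in tf.items()}


window = [{"budget"}, {"budget", "senate"}, {"volcano"}, {"senate"}]
print(tfidf_vector(["budget", "budget", "volcano"], window))
# {'budget': 2.0, 'volcano': 2.0}
```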
Etxtreme
http://www.etxtreme.ru
This seems to work on the same general principle, but applied to a huge, very diverse set of URLs instead of a TV news program. [monde, Jan 28 2000, last modified Oct 04 2004]
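The idea's more subtle variant, treating clusters of co-occurring words as the independent variables, might be sketched like this in Python. The clustering rule (join two words with union-find when the Jaccard similarity of the sets of stories containing them is at least a threshold) is a simple stand-in chosen for illustration, not one of the statistical NLP algorithms the idea alludes to; the 0.5 threshold and the unseen-word floor are likewise assumptions. A cluster's probability is the fraction of window stories mentioning any of its words, and each cluster a story touches is counted once.

```python
import math
from collections import defaultdict


def word_story_sets(history_keywords):
    """Map each word to the set of window-story indices that contain it."""
    index = defaultdict(set)
    for i, kws in enumerate(history_keywords):
        for w in kws:
            index[w].add(i)
    return index


def cluster_words(index, threshold=0.5):
    """Union-find clustering: words whose story sets have Jaccard
    similarity >= threshold are merged (hypothetical rule)."""
    parent = {w: w for w in index}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    words = sorted(index)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sa, sb = index[a], index[b]
            if len(sa & sb) / len(sa | sb) >= threshold:
                parent[find(a)] = find(b)
    return {w: find(w) for w in words}  # word -> cluster representative


def cluster_score(story_keywords, history_keywords, threshold=0.5):
    """Sum of -log2 p(C) over distinct clusters the story touches, where p(C)
    is the fraction of window stories mentioning any word of C."""
    n = len(history_keywords)
    index = word_story_sets(history_keywords)
    rep = cluster_words(index, threshold)
    cluster_stories = defaultdict(set)
    for w, r in rep.items():
        cluster_stories[r] |= index[w]
    floor = 1.0 / (n + 1)  # assumed probability for words never seen
    score, seen = 0.0, set()
    for w in story_keywords:
        if w not in rep:
            score += -math.log2(floor)
        elif rep[w] not in seen:  # a cluster counts once, not once per word
            seen.add(rep[w])
            score += -math.log2(len(cluster_stories[rep[w]]) / n)
    return score


window = [{"nixon", "watergate"}, {"nixon", "watergate", "haldeman"},
          {"budget"}, {"senate", "budget"}]
print(cluster_score({"nixon", "watergate", "haldeman"}, window))  # 1.0
```

Scored independently, the same three words would give 1 + 1 + 2 = 4 bits on this toy window, which is the over-estimate the idea describes.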
Isn't this really a high-pass filter?
High and low in "X-pass filter" refer to the frequencies that are allowed through the filter.
What I think egnor correctly observes is that the filter cosma describes prefers drastic, fast change (as in the amplitude of a high-frequency signal) over slow change (as in the amplitude of a low-frequency signal).
The "Etxtreme" link is broken for me. Did the site go away, or is it temporarily down, or is the URL incorrect?
These days this would result in a blank TV screen.