h a l f b a k e r y

Low-pass filter for news

Filter out the stories which don't say anything new
  (+3)

Take an on-line news service like CNN or the New York Times. Scan through each day's stories for all non-stop-words, and for each keyword compute the fraction of stories in which it has occurred over the last T days (T being a user-set parameter). Then create a table of contents of the day's stories, listing beside each title its Shannon information content, approximated as though each keyword were an independent random variable. Stories whose keywords have been very common for the last T days presumably tell you little that's new, so they get a low score. List stories in order of decreasing score, if you like.
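
A minimal Python sketch of the scoring step, under some loud assumptions: the stop-word list is a tiny placeholder, all function names are made up for illustration, only keywords that are present in a story are scored (absent ones are ignored), and add-one smoothing is used so that a never-before-seen keyword gets a finite score. None of these choices come from the idea itself.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real one would be longer).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}

def keywords(text):
    """Return the set of distinct non-stop-words in a story."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def keyword_counts(history):
    """Count, for each keyword, how many stories of the last T days contain it."""
    counts = Counter()
    for story in history:
        counts.update(keywords(story))
    return counts, len(history)

def information_bits(story, counts, n_history):
    """Sum -log2(p) over the story's keywords, treating them as independent."""
    total = 0.0
    for w in keywords(story):
        # Smoothed fraction of past stories containing w (add-one smoothing
        # is an assumption, so unseen keywords do not give infinite scores).
        p = (counts[w] + 1) / (n_history + 2)
        total += -math.log2(p)
    return total

def table_of_contents(todays_stories, history):
    """Return (score, title) pairs for today's stories, highest score first."""
    counts, n = keyword_counts(history)
    scored = [(information_bits(body, counts, n), title)
              for title, body in todays_stories]
    return sorted(scored, reverse=True)
```

Here `history` would be the list of story texts from the last T days and `todays_stories` a list of (title, text) pairs. A story that repeats much-seen keywords sinks to the bottom of the list; one full of rare keywords floats to the top.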

Keep separate frequency-databases for different news-sources you read, or amalgamate them if they cover similar topics.

This procedure will over-estimate the information content, because keywords that always turn up together are each counted as if they were fresh, independent news. A more subtle algorithm would induce which words cluster together, e.g. {"Watergate", "Nixon", "Haldeman", "plumbers", "18 and 1/2"} or {"Lewinsky", "Starr", "stain"}, and treat those clusters as its independent variables. The statistical language processing wallahs already have such inductive algorithms.

cosma, Jan 28 2000

Term weighting widely used for text retrieval http://www.ftp.cl.c...text-retrieval.html
This is called "term-frequency weighting" and is in common use by the search engines. Much to my disgust, I found that information-theoretic weights don't work nearly as well as some ad-hoc methods. See the paper "Simple, proven approaches to text retrieval." [rmutt, Jan 28 2000, last modified Oct 04 2004]

Etxtreme http://www.etxtreme.ru
This seems to work with this general principle, but as applied to a huge group of very diverse URLs instead of a TV news program. [monde, Jan 28 2000, last modified Oct 04 2004]







       Isn't this really a high-pass filter?
egnor, Mar 02 2000
  

       High and low in "X pass filter" refers to the frequencies that are allowed through the filter.   

       What I think egnor correctly observes is that the filter cosma describes prefers drastic, fast change (as in the amplitude of a high-frequency signal) over slow change (as in the amplitude of a low-frequency signal.)
jutta, Oct 25 2000
  

       The "Etxtreme" link is broken for me. Did the site go away, or is it temporarily down, or is the URL incorrect?
egnor, Oct 26 2000
  

       These days this would result in a blank tv screen.
RayfordSteele, Aug 14 2007
  
      
  


 
