h a l f b a k e r y

Low-pass filter for news
Filter out the stories which don't say anything new
(+3)


Take an on-line news service like CNN or the New York Times. Scan through each day's stories for all non-stop-words, and for each keyword compute the fraction of stories in which it has occurred over the last T days (T being a user-set parameter). Then create a table of contents of the day's stories, listing beside each title its Shannon information content, approximated as though each keyword were an independent random variable. Stories whose keywords have been very common for the last T days presumably tell you little that's new, so they get a low score. List stories in order of decreasing score, if you like.
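A rough sketch of that scoring step, in Python, might look like the following. Everything named here is an assumption for illustration: the tiny stop-word list, the regex tokenizer, and the add-one smoothing (needed so that a never-before-seen keyword gets a finite score) are not part of the idea as stated. Each keyword contributes -log2(p) bits, where p is the smoothed fraction of recent stories containing it.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for", "at", "by", "with"}

def keywords(text):
    """Return the set of distinct non-stop-words in a story."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS}

def story_counts(past_stories):
    """For each keyword, count the past stories (last T days) that contain it."""
    counts = Counter()
    for story in past_stories:
        counts.update(keywords(story))
    return counts

def information_bits(story, counts, n_past):
    """Approximate Shannon information, treating keywords as independent.

    The smoothed probability (count + 1) / (n_past + 2) is an assumption of
    this sketch, so that unseen keywords do not give infinite information.
    """
    bits = 0.0
    for w in keywords(story):
        p = (counts[w] + 1) / (n_past + 2)
        bits += -math.log2(p)
    return bits

def rank(todays_stories, past_stories):
    """Return (score, title) pairs for today's stories, highest score first."""
    counts = story_counts(past_stories)
    n_past = len(past_stories)
    scored = [(information_bits(text, counts, n_past), title)
              for title, text in todays_stories.items()]
    return sorted(scored, reverse=True)

# Made-up example data: three old stories on one topic, two new stories.
past = [
    "Nixon denies Watergate link",
    "Nixon speaks on Watergate",
    "Nixon Watergate hearing continues",
]
today = {
    "More on Watergate": "Nixon Watergate again",
    "Volcano": "Volcano erupts in Iceland",
}
ranked = rank(today, past)
for score, title in ranked:
    print(f"{score:6.2f} bits  {title}")
```

The story built from keywords that have been everywhere lately lands at the bottom of the list, which is the whole point. The window T is implicit here in whatever you pass as `past`.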

Keep separate frequency-databases for different news-sources you read, or amalgamate them if they cover similar topics.

This procedure will over-estimate the information content. A more subtle algorithm would induce which words cluster together, e.g. {"Watergate", "Nixon", "Haldeman", "plumbers", "18 and 1/2"} or {Lewinsky, Starr, stain}, and treat those clusters as its independent variables. The statistical language processing wallahs already have such inductive algorithms.
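To see how much the per-word version over-counts, here is a small sketch that scores the same story twice: once with every keyword as its own variable, and once with keywords collapsed into clusters first. The clusters are written by hand and are purely hypothetical; inducing them from the text is the hard part and is not attempted here. The add-one smoothing is likewise an assumption of the sketch.

```python
import math

# Hypothetical hand-written clusters; the idea proposes inducing these automatically.
CLUSTERS = {
    "cluster:watergate": {"watergate", "nixon", "haldeman", "plumbers"},
    "cluster:lewinsky": {"lewinsky", "starr", "stain"},
}
WORD_TO_CLUSTER = {w: name for name, words in CLUSTERS.items() for w in words}

def collapse(words):
    """Map keywords to cluster names; unclustered words stand for themselves."""
    return {WORD_TO_CLUSTER.get(w, w) for w in words}

def information_bits(words, past_stories, use_clusters):
    """Sum of -log2(p) over the variables, with add-one smoothing (an assumption)."""
    to_vars = collapse if use_clusters else set
    past = [to_vars(s) for s in past_stories]
    n = len(past)
    bits = 0.0
    for v in to_vars(words):
        count = sum(v in s for s in past)
        bits += -math.log2((count + 1) / (n + 2))
    return bits

# Made-up keyword sets for three past stories and one new story.
past = [{"nixon", "watergate"}, {"haldeman", "plumbers"}, {"starr", "budget"}]
story = {"nixon", "haldeman", "watergate"}

per_word = information_bits(story, past, use_clusters=False)
clustered = information_bits(story, past, use_clusters=True)
print(f"per-word: {per_word:.2f} bits, clustered: {clustered:.2f} bits")
```

Three words that always travel together count three times in the per-word score but only once after clustering, so the clustered score comes out lower.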


cosma, Jan 28 2000

Term weighting widely used for text retrieval http://www.ftp.cl.c...text-retrieval.html
This is called "term-frequency weighting" and is in common use by the search engines. Much to my disgust, I found that information-theoretic weights don't work nearly as well as some ad-hoc methods. See the paper "Simple, proven approaches to text retrieval." [rmutt, Jan 28 2000, last modified Oct 04 2004]

Etxtreme http://www.etxtreme.ru
This seems to work with this general principle, but as applied to a huge group of very diverse URLs instead of a TV news program. [monde, Jan 28 2000, last modified Oct 04 2004]

       Isn't this really a high-pass filter?

egnor, Mar 02 2000
  

       High and low in "X pass filter" refers to the frequencies that are allowed through the filter.   

       What I think egnor correctly observes is that the filter cosma describes prefers drastic, fast change (as in the amplitude of a high-frequency signal) over slow change (as in the amplitude of a low-frequency signal.)

jutta, Oct 25 2000
  

       The "Etxtreme" link is broken for me. Did the site go away, or is it temporarily down, or is the URL incorrect?

egnor, Oct 26 2000
  

       These days this would result in a blank tv screen.

RayfordSteele, Aug 14 2007
  