Low-pass filter for news

Filter out the stories which don't say anything new

Take an on-line news service like CNN or the New York Times. Scan through each day's stories for all non-stop-words, and for each keyword compute the fraction of stories in which it has occurred over the last T days (T being a user-set parameter). Then create a table of contents of the day's stories, listing beside each title its Shannon information content, approximated as though each keyword were an independent random variable. Stories whose keywords have been very common for the last T days presumably tell you little that's new, so they get a low score. List stories in order of decreasing score, if you like.
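
A rough sketch of the scoring described above, in Python. Several things here are assumptions rather than part of the idea as stated: the tiny stop-word list, the add-one smoothing (so that a never-before-seen keyword gets a finite score instead of log of zero), summing only over keywords that are present in a story, and all of the function names. Each past story is assumed to be a plain string already restricted to the last T days.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def keywords(text):
    """Return the set of distinct non-stop-words in a story."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def keyword_fractions(past_stories):
    """Return a function giving the smoothed fraction of past stories
    (those from the last T days) in which a keyword occurred."""
    n = len(past_stories)
    counts = Counter()
    for story in past_stories:
        counts.update(keywords(story))  # each story counts a keyword once

    def fraction(word):
        # Add-one smoothing (an assumption): keeps the fraction strictly
        # between 0 and 1, so the logarithm below is always finite.
        return (counts[word] + 1) / (n + 2)

    return fraction

def information_bits(story, fraction):
    """Approximate information content in bits, treating every keyword
    as an independent event and summing -log2 of its fraction."""
    return sum(-math.log2(fraction(w)) for w in keywords(story))

def table_of_contents(todays_stories, past_stories):
    """todays_stories is a list of (title, body) pairs.
    Returns (score, title) pairs in order of decreasing score."""
    fraction = keyword_fractions(past_stories)
    scored = [(information_bits(body, fraction), title)
              for title, body in todays_stories]
    return sorted(scored, reverse=True)
```

Calling table_of_contents with today's (title, body) pairs and the stored stories from the last T days gives the scored list; a story built from keywords that have been everywhere lately ends up near the bottom.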

Keep separate frequency-databases for different news-sources you read, or amalgamate them if they cover similar topics.

This procedure will over-estimate the information content. A more subtle algorithm would induce which words cluster together, e.g. {"Watergate", "Nixon", "Haldeman", "plumbers", "18 and 1/2"} or {Lewinsky, Starr, stain}, and treat those clusters as its independent variables. The statistical language processing wallahs already have such inductive algorithms.

cosma, Jan 28 2000

Term weighting widely used for text retrieval http://www.ftp.cl.c...text-retrieval.html
This is called "term-frequency weighting" and is in common use by the search engines. Much to my disgust, I found that information-theoretic weights don't work nearly as well as some ad-hoc methods. See the paper "Simple, proven approaches to text retrieval." [rmutt, Jan 28 2000, last modified Oct 04 2004]

Etxtreme http://www.etxtreme.ru
This seems to work with this general principle, but as applied to a huge group of very diverse URLs instead of a TV news program. [monde, Jan 28 2000, last modified Oct 04 2004]

Teefax..ohhh those blocky, limited palette colours... https://twitter.com...tag/Teefax?src=hash
[not_morrison_rm, Sep 25 2016]


       Isn't this really a high-pass filter?
egnor, Mar 02 2000

       High and low in "X pass filter" refer to the frequencies that are allowed through the filter.

       What I think egnor correctly observes is that the filter cosma describes prefers drastic, fast change (as in the amplitude of a high-frequency signal) over slow change (as in the amplitude of a low-frequency signal.)
jutta, Oct 25 2000

       The "Etxtreme" link is broken for me. Did the site go away, or is it temporarily down, or is the URL incorrect?
egnor, Oct 26 2000

       These days this would result in a blank TV screen.
RayfordSteele, Aug 14 2007

       What's needed is a news site with a user-configurable view.   

       A regular news site has links on the front page to business, politics, entertainment, arts, science, sport ...; below those, more choices, and so on.

       A configuration page could allow a user to select:

+ Science
: + Geology
:: + Earthquakes
:: + Volcanos
:: - Floods
:: - Avalanches

+ Sport
: + Ball Sports
:: + Team Sports
::: + Basketball
::: - Football
::: + Hockey
::: - Rugby
:: + Individual
::: + Golf
::: - Tennis

The page(s) would then only display items of interest.
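
For illustration, the selection tree could be held as a nested structure and checked per item. This is only a sketch under assumptions of its own: the SELECTIONS layout, the is_selected name, and the rule that an item shows only when every category on its path is marked "+" (with unlisted categories hidden) are all invented here, not something any existing news site is known to do.

```python
# Each entry maps a category name to (selected, subcategories).
# True stands for "+", False for "-". Layout is hypothetical.
SELECTIONS = {
    "Science": (True, {
        "Geology": (True, {
            "Earthquakes": (True, {}),
            "Volcanos": (True, {}),
            "Floods": (False, {}),
            "Avalanches": (False, {}),
        }),
    }),
    "Sport": (True, {
        "Ball Sports": (True, {
            "Team Sports": (True, {
                "Basketball": (True, {}),
                "Football": (False, {}),
                "Hockey": (True, {}),
                "Rugby": (False, {}),
            }),
            "Individual": (True, {
                "Golf": (True, {}),
                "Tennis": (False, {}),
            }),
        }),
    }),
}

def is_selected(path, tree=SELECTIONS):
    """Return True if every category along path is marked "+".

    path is a list of category names from the top level down,
    e.g. ["Science", "Geology", "Earthquakes"].
    """
    for name in path:
        if name not in tree:
            return False  # assumption: unlisted categories are hidden
        selected, tree = tree[name]
        if not selected:
            return False
    return True

def visible_items(items, tree=SELECTIONS):
    """Keep only the (headline, path) pairs whose path is selected."""
    return [(headline, path) for headline, path in items
            if is_selected(path, tree)]
```

With the tree above, an item filed under Science, Geology, Earthquakes would be shown, while one filed under Sport, Ball Sports, Individual, Tennis would not.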

       Oddly, this doesn't seem to exist anywhere.
8th of 7, Sep 24 2016

       //Oddly, this doesn't seem to exist anywhere.// But it used to - Ceefax, Oracle, Prestel.
MaxwellBuchanan, Sep 25 2016

       Hmm, somebody is tinkering with a new Ceefax service called Teefax. Not having a Raspberry Pi or a TV, and being in the wrong country, I'll have to take their word for it. See link.
not_morrison_rm, Sep 25 2016

       The answer would seem to be an application that is given a list of categories and a list of URLs, searches and collates the results, then displays them as a web page.
8th of 7, Sep 25 2016

