
h a l f b a k e r y
Tip your server.



                                                             

One, But You're Not The Same
Search Results Should Present Different Information
  (+6, -2)


When searching for a term that appears in a hot press release, one finds that hundreds, or even thousands, of hits simply repeat the exact same text.

Since the search engine is indexing the text anyway, it could notice the duplicates and prune them, making the haystack somewhat smaller.


theircompetitor, Mar 07 2005
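
The pruning described in the idea can be sketched roughly as follows. This is only an illustration, not how any real search engine is known to work: it compares result snippets by word-shingle overlap (Jaccard similarity) and keeps the first hit of each near-identical group. The names `shingles`, `jaccard`, and `prune_near_duplicates`, the 0.8 threshold, the "first seen wins" rule, and the sample hits are all assumptions made up for the example.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def prune_near_duplicates(hits, threshold=0.8):
    """hits is a list of (url, snippet) pairs in ranked order.

    Keep a hit only if its snippet is not a near-duplicate of a hit
    already kept. "First seen wins" is an arbitrary choice here.
    """
    kept = []  # (url, snippet, shingle set)
    for url, snippet in hits:
        s = shingles(snippet)
        if all(jaccard(s, ks) < threshold for _, _, ks in kept):
            kept.append((url, snippet, s))
    return [(url, snippet) for url, snippet, _ in kept]

# Hypothetical hits: two copies of one press release, one unrelated page.
hits = [
    ("a.example", "Acme Corp today announced record quarterly earnings and a new product line."),
    ("b.example", "PRESS RELEASE: Acme Corp today announced record quarterly earnings and a new product line."),
    ("c.example", "Acme Corp was founded in 1950 and makes anvils in Arizona."),
]
print([url for url, _ in prune_near_duplicates(hits)])  # ['a.example', 'c.example']
```

Comparing every new hit against every kept hit is quadratic, which is fine for one page of results but would need something smarter (hashing the shingles, for instance) at index scale.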

Google cheat sheet http://www.google.c...elp/cheatsheet.html
[waugsqueke, Mar 09 2005]

Google search: turvy -"topsy turvy" http://www.google.c...+-%22topsy+turvy%22
"turvy" other than "topsy turvy" [waugsqueke, Mar 09 2005]







       Is that a distinct problem from link farms, which I believe Google does its best to filter already?

DrCurry, Mar 07 2005
  

       In the case that triggered this idea, I was looking for more data on a specific company that was mentioned in a press release.   

       Instead, I'm getting thousands of hits all quoting the press release. Mind you, the summary paragraph is fairly obviously the same in each. So Google kind of knows it's showing the same data.

theircompetitor, Mar 07 2005
  

       Google sort of does this now...   

       "In order to show you the most relevant results, we have omitted some entries very similar to the ## already displayed. If you like, you can repeat the search with the _omitted results included_."

waugsqueke, Mar 07 2005
  

       This would be easily solved if Google allowed a NOT operand.

Worldgineer, Mar 07 2005
  

       [waugs]: I think that only excludes multiple hits from the same domain. (I could be wrong, per usual.)

angel, Mar 07 2005
  

       I thought this idea would require that we carry each other. Carry each other.

bungston, Mar 07 2005
  

       Damn you [bungston]! you beat me to it.

Freefall, Mar 07 2005
  

       guys, just clarifying -- these are hits from multiple different sites that all refer to the same exact text.   

       So the only clue that Google has to the "sameness" of the text is the abstract.   

       And it's absolutely generating thousands of them.   

       Now, you can "minus" certain terms and eliminate all of the hits -- which is not what you would want either.   

       Ideally, you'd want to see unique information referred to in a unique way, and no more than necessary.   

       [confidential to Freefall -- you should have said U2 bungston].

theircompetitor, Mar 07 2005
  

       Wouldn't it be simpler to add a unique search word, to exclude the massive return of duplicate hits? Something that you know about the company, that is unlikely to be part of the press release?   

       I use that method quite a lot. Just pick some obscure factoid and add it to the search criteria.

UnaBubba, Mar 07 2005
  

       Bono idee' = +   

       And I can't be holdin' on...

csea, Mar 07 2005
  

       //if Google allowed a NOT operand//
It does already. Just put "-" before an item in the query.
  

       This searches for pages with "foo" but not "bar":
foo -bar
  

       This searches for pages with "foo" but not on the halfbakery:
foo -site:halfbakery.com

krelnik, Mar 08 2005
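
The exclusion syntax above is easy to assemble by hand, but as a rough sketch, a query string with "-" terms could be built and URL-encoded like this. `build_query` and its parameters are hypothetical helpers invented for the example; only `urllib.parse.quote_plus` is a real library call, and the output is just the encoded query text, not a complete search URL.

```python
from urllib.parse import quote_plus

def build_query(terms, exclude=(), exclude_sites=()):
    """Join search terms, then append "-" exclusions for words, phrases, and sites."""
    parts = list(terms)
    for item in exclude:
        # Multi-word phrases are quoted so the whole phrase is excluded.
        parts.append('-"%s"' % item if " " in item else "-" + item)
    for site in exclude_sites:
        parts.append("-site:" + site)
    return " ".join(parts)

query = build_query(["foo"], exclude=["bar"], exclude_sites=["halfbakery.com"])
print(query)              # foo -bar -site:halfbakery.com
print(quote_plus(query))  # foo+-bar+-site%3Ahalfbakery.com
```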
  

       sure, though it's tricky to do that for a whole paragraph or article.   

       I think the criticism, though valid, misses the point.   

       Sure I can be smart enough to still find what I want.   

       But why would you show me 1000s of copies of the same entry? My assistant wouldn't, right?

theircompetitor, Mar 08 2005
  

       I guess not. Has anyone volunteered to be your assistant, apart from the MS Paperclip (the annoying little shit)?

UnaBubba, Mar 08 2005
  

       [UB], no, sadly. I'm sure it's not my personality, though.

theircompetitor, Mar 08 2005
  

       Not all of them, anyway.

Detly, Mar 08 2005
  

       // it's tricky to do that for a whole paragraph or article. //   

       Grab a fairly unique phrase from it, put it in quotes and then put a - before it. That tells Google to ignore anything that includes this passage of text.   

       Added link to Google's cheat sheet, showing all the operators. They can be combined in very useful ways.

waugsqueke, Mar 08 2005
  

       <aside>The Brits among us may care to check the first Google hit for "fuckwit".</aside>

angel, Mar 09 2005
  

       'Miserable failure' is interesting, too.

waugsqueke, Mar 09 2005
  

       I'd like an "other-than" boolean operator which would exclude text matches that satisfied a particular criterion, but not exclude entire pages on that basis.   

       For example,   

       "turvy" other than "topsy turvy"   

       would find places where the word "turvy" appeared not preceded by the word "topsy". Sites containing the phrase "topsy turvy" would not be completely excluded, but would only be included if the word "turvy" appeared without the word "topsy" in front of it.

supercat, Mar 09 2005
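
To make the difference concrete, here is a small sketch of the proposed "other-than" matching next to page-level exclusion. It is a toy model run over made-up page text, not a description of how Google evaluates queries: `other_than_hits` and `minus_hits` are hypothetical names, and treating a space or a hyphen as the separator in "topsy turvy" is an assumption.

```python
import re

# "turvy" as a whole word, unless the word "topsy" plus a space or hyphen
# comes immediately before it (fixed-width negative lookbehind).
OTHER_THAN = re.compile(r"(?<!\btopsy[ -])\bturvy\b", re.IGNORECASE)

# The excluded phrase and the bare word, used for the page-level model.
PHRASE = re.compile(r"\btopsy[ -]turvy\b", re.IGNORECASE)
WORD = re.compile(r"\bturvy\b", re.IGNORECASE)

def other_than_hits(pages):
    """Pages with at least one "turvy" that is not part of "topsy turvy"."""
    return [url for url, text in pages.items() if OTHER_THAN.search(text)]

def minus_hits(pages):
    """Pages that contain "turvy" and nowhere contain "topsy turvy"."""
    return [url for url, text in pages.items()
            if WORD.search(text) and not PHRASE.search(text)]

# Hypothetical pages (url -> text).
pages = {
    "a.example": "The room was topsy-turvy after the party.",
    "b.example": "flopsy-turvy topsy-turvy",
    "c.example": "Everything went turvy.",
}
print(other_than_hits(pages))  # ['b.example', 'c.example']
print(minus_hits(pages))       # ['c.example']
```

The second page is the interesting case: it contains the excluded phrase, so page-level exclusion drops it, but it also contains a "turvy" with no "topsy" in front, so match-level exclusion keeps it.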
  

       [angel] I'm a Yank, myself, but I checked Google as you suggested. I wonder who set it up to go to [John Leslie Prescott].

normzone, Mar 09 2005
  

       //Grab a fairly unique phrase from it, put it in quotes and then put a - before it. //   

       waugs--it seems like that would get rid of every instance where it occurs, whereas the intention of this idea (I think) is to show it once and only once.   

       yabba dabba yes!

theircompetitor, Mar 09 2005
  

       // "turvy" other than "topsy turvy" //   

       supercat, Google will do that too (see link). Note it found a surprisingly large number of instances of 'autopsy-turvy'.   

       // the intention of this idea (I think) is to show it once and only once. //   

       Yes, but once the initial search has turned up thousands of copies of the same thing (which I still think Google will reduce down, even across multiple domains), you know what to exclude on subsequent searches.

waugsqueke, Mar 09 2005
  

       //I wonder who set it up to go to [John Leslie Prescott]//
That was a joint effort by several bloggers, led by a guy called Tim Worstall. Google his name for details.

angel, Mar 10 2005
  

       I don't think Google's doing what I want, since it would not return a page containing the phrase "flopsy-turvy topsy-turvy"; the phrase "topsy-turvy" should not disqualify the page altogether, but when using the "minus" operator on Google it does.

supercat, Mar 10 2005
  

       Not a bad idea on the surface, but it begins to look less attractive when you consider some of the questions that would have to be answered in implementation. In particular, how should Google (or any other search engine) select a "definitive source" for a given document?   

       Perhaps it would be better if we could attach our own intelligent agents to the search services, to sit between us and the raw flow of information and filter out the useful bits according to our own individual criteria. Otherwise we put search engines in the business of pre-filtering our information for us, and I don't know that we really want that.

uhlume, Mar 10 2005
  

       // it would not return a page containing the phrase "flopsy-turvy topsy-turvy"; the phrase "topsy-turvy" should not disqualify the page altogether, //   

       Hm. I'm confused reading that, so I'm sure Google would be too. You're saying you don't want pages that have the phrase "topsy turvy" on them to appear in the search results, then say that the phrase "topsy turvy" shouldn't prevent a page from appearing when it's filtered out. Umm.

waugsqueke, Mar 10 2005
  
      
  


 