search engine killfile   (+3, -1)
Automatically exclude search-spam sites from search results

There's a new kind of spam. I don't know what it's called yet, but you can see it if you search for "vitamin chocolate" on Google and jump ahead to, say, page 10. You'll start seeing pages and pages of nonsensical search results from sites generated only to attract search queries. [Later: This example no longer works. Yay general improvements in technology!]

They're clearly produced by the same or similar software. The pages all have similar general titles: "Foo - info on Foo" or "section relating to Foo" or "information about Foo"; and next to the quotes with the search term in them, you'll find a general cacophony of other high-scoring search terms.

Google has a way of specifying conditions for exclusion of results (prefix them with a -), and it has the site: operator that lets me refer to results on a specific site. So, once I've learned that a site X is run by thieving bastards who only want to trick idiots into accessing their links, I can use -site:X to exclude it from my results.

That works, but doing it again and again is tiresome.
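
The tiresome part can at least be scripted on the user's side. A minimal sketch, assuming a personal list of spam domains kept locally; the function name is made up for illustration, and the domains are just examples:

```python
def with_exclusions(query, spam_domains):
    """Append a -site: term for every domain in a personal killfile."""
    # Deduplicate and sort so the same killfile always yields the same query.
    exclusions = " ".join(f"-site:{d}" for d in sorted(set(spam_domains)))
    return f"{query} {exclusions}".strip()


print(with_exclusions("vitamin chocolate", ["furytea.com", "applelot.com"]))
```

This only rewrites the query string; a long killfile would eventually run into whatever length limit the search engine puts on queries, which is one reason to want the filtering done elsewhere.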

What I would like to see somewhere is a way of automatically filtering all my search results to never, ever return anything from domains (and their subdomains) that I've identified as spam domains. It should also give users a way of sharing such spam-search-trap blacklists with each other (e.g., by being able to load them from other URLs, in a nice, public XML format).
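
A rough sketch of what such a filter might look like on the client side. The XML layout here (a killfile element holding domain elements) is invented for illustration; no such public format exists that I know of, and the function names are hypothetical too:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical shared format; the element names are assumptions.
SAMPLE_KILLFILE = """\
<killfile>
  <domain>applelot.com</domain>
  <domain>furytea.com</domain>
</killfile>
"""


def load_killfile(xml_text):
    """Parse a killfile document into a set of lowercase domain names."""
    root = ET.fromstring(xml_text)
    return {
        el.text.strip().lower()
        for el in root.iter("domain")
        if el.text and el.text.strip()
    }


def is_blocked(url, killfile):
    """True if the URL's host is a listed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in killfile)


def filter_results(urls, killfile):
    """Drop every result whose host is covered by the killfile."""
    return [u for u in urls if not is_blocked(u, killfile)]


killfile = load_killfile(SAMPLE_KILLFILE)
print(filter_results(
    ["http://www.applelot.com/vitamin", "http://example.org/chocolate"],
    killfile,
))
```

Because a loaded killfile is a plain set, merging lists from several people is just a set union. Fetching the XML from someone else's URL is left out of the sketch.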

That way, Google doesn't have to interfere with the users or invent new ways of making their algorithm detect nonsensical results, yet the users have some way of retaliating against this ridiculous attention-grabbing.

[Thanks for the "report spam" link, tsuka. I didn't know you could do that. I've reported mine. I still think there's room for the private industry here.]
-- jutta, Apr 24 2004

Google 'personalized' search (beta) http://labs.google.com/personalized/
[krelnik, Oct 04 2004, last modified Oct 05 2004]

Something funny about all those domain names... Many are two dictionary words: applelot.com, freeglen.com, furytea.com, etc. Can something be done to filter knowing that these domains are likely machine generated?

Certainly there are legitimate websites with two dictionary words (e.g. halfbakery), but maybe there could be some process for determining the validity of the good sites when their domain name appears machine generated.

I just would like to try to automate the process as much as possible, to avoid someone "identifying" as spam a domain that is actually legitimate. I know it would happen, for the same reason my CD player shows me that "the Raod to you" is playing.

But now that I re-read your last paragraph...
-- swamilad, Apr 24 2004
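
The two-dictionary-word observation above could be sketched roughly like this. The tiny word set stands in for a real dictionary file, and the function expects a bare registered domain (no "www." prefix); both are assumptions of the sketch:

```python
# Stand-in for a real dictionary; a real check would load a full word list.
WORDS = {"apple", "lot", "free", "glen", "fury", "tea", "half", "bakery"}


def two_word_split(domain, words=WORDS):
    """Return (first, second) if the leading label of the domain is exactly
    two dictionary words run together, else None."""
    label = domain.lower().split(".")[0]
    # Try every split point and check both halves against the dictionary.
    for i in range(1, len(label)):
        if label[:i] in words and label[i:] in words:
            return (label[:i], label[i:])
    return None


for name in ["applelot.com", "freeglen.com", "furytea.com",
             "halfbakery.com", "google.com"]:
    print(name, two_word_split(name))
```

As predicted in the annotation, halfbakery.com trips the check as well, so a match is at best a hint that a domain deserves a closer look, not a verdict.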


You'll report them to Google? That won't do any good. You can't stay ahead of those people that way. Google just has to learn to filter out those bait-and-switch sites. One way would be a faster robot cycle time tied to a blacklist.
-- ldischler, Apr 24 2004


Perhaps, if it was as easy as reporting spam in Yahoo. But even then, spammers could retaliate by reporting real sites, making it impossible to weed out the spam without human intervention.
-- ldischler, Apr 24 2004


This might be an interesting feature for the "personalized search" that is in beta test at Google right now. It remembers a lot of other things about what you do and do not want to see in searches. See link.
-- krelnik, Apr 24 2004


This is an excellent idea.

One slow, non-automated approach is for Google to add a "never see results from this domain again" checkbox next to each search result. The choice would be stored in your personalized Google profile, referenced by a cookie (or account #).

Maybe there'd be a nice way for users to share that subsection of their profile.
-- bristolz, Apr 24 2004


The way to make this work but prevent malicious individuals from subverting the system is by way of votes. Every vote would move the search result farther down in the results page. I do not think a result should ever disappear completely, just be ranked very low. Of course a very dedicated malicious one could vote multiple times for [zanzibar]'s address.
-- bungston, Apr 24 2004
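
A minimal sketch of the vote-to-demote scheme described above, assuming each result arrives with a numeric relevance score and each spam vote subtracts a fixed penalty. The scores, the penalty value and the function name are all illustrative:

```python
def rerank(results, spam_votes, penalty=1.0):
    """results: list of (url, score) pairs; spam_votes: dict of url -> votes.
    Returns every URL, best first, with voted-against results pushed down."""
    def adjusted(item):
        url, score = item
        # Each vote lowers the score; nothing is ever removed from the list.
        return score - penalty * spam_votes.get(url, 0)

    return [url for url, _ in sorted(results, key=adjusted, reverse=True)]


results = [("applelot.com", 10.0), ("example.org", 9.0), ("example.net", 8.0)]
print(rerank(results, {"applelot.com": 3}))
```

Nothing here addresses the repeat-voting problem; counting at most one vote per account would need some notion of voter identity that the sketch leaves out.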


Buy Vitamin Chocolate at Amazon.com:
www.amazon.com/search?no_results_found

Buy Vitamin Chocolate at Barnes & Noble
www.barnesandnoble.com/query_&search=vitam....
-- phundug, Apr 26 2004


I'd place this idea into a "not only, but also" class of interest.

Meaning, I'd like an agent version to reside within my email filtering module; it would follow up with an autoupdate to my personal killfile, and cc: notifications to any and all relevant ISP loci.
-- dpsyplc, Apr 26 2004


While we're doing killfiles, I'd also like one that hides any results that were on recent searches of mine.

This would be extremely useful in searching for photos or images.
-- phundug, Apr 26 2004
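
The "hide what I've already seen" variant could be sketched as a rolling history of recent result sets. The class name and the default window of 20 searches are assumptions made for illustration:

```python
from collections import deque


class RecentResults:
    """Remembers the results of the last few searches and hides repeats."""

    def __init__(self, max_searches=20):
        # The oldest search falls off automatically once the window is full.
        self._history = deque(maxlen=max_searches)

    def filter_new(self, urls):
        """Return only the URLs not seen in the remembered searches,
        then record this search in the history."""
        seen = set().union(*self._history)
        fresh = [u for u in urls if u not in seen]
        self._history.append(set(urls))
        return fresh


recent = RecentResults()
print(recent.filter_new(["a.jpg", "b.jpg"]))
print(recent.filter_new(["b.jpg", "c.jpg"]))
```

For image searches the same idea would work better keyed on the image URL than on the page it appears on.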


