halfbakery
Good ideas at the time.

Link Rover

Don't just find broken links, fix them
(+2)

(Apologies if this already exists; I looked and looked and couldn't find a tool that does this.)

Anyone who has had to maintain a web site knows that broken links are a continual pain in the neck. There are a huge number of tools to deal with this problem, but what they do is simply “spider” your web site, retrieve each link, and determine if a 404 (Page Not Found) error has occurred. Then they notify you in some way so you can fix it.

I think this could be taken further. Instead of waiting for maintenance time, spider the links while the site is in good working condition, to grab the intended target page of each link. Scrape out all the redundant HTML, menus, scripts and such to get to the actual content of the page. Then store this content (or snips of it) locally on the webmaster's system for use later. Include a manual interface so the webmaster can tweak what pieces of text from the target page are considered significant.
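The scraping step above can be sketched with Python's standard-library HTML parser. This is a minimal illustration, not a full boilerplate-stripper; the set of tags treated as "redundant" (scripts, styles, navigation) is an assumption, and a real tool would expose it to the webmaster for tweaking, as the idea suggests:

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Keep visible text; drop scripts, styles, and navigation chrome."""

    # Assumed set of "redundant" tags; a real tool would make this configurable.
    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_content(html: str) -> str:
    """Return the significant text of a page, suitable for storing locally."""
    parser = ContentExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

The extracted text (or selected snips of it) is what would be stored on the webmaster's system for comparison later.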

Then at maintenance time the software is armed with not only the URLs of the links, but the content which each is intended to display. Again spider the web site, and verify not only that each page leads somewhere, but that the intended content is present at the target page.
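The verification step could be as simple as checking what fraction of the stored snippet's words still appear on the freshly fetched page. This is one possible heuristic, not the only one, and the 0.6 threshold is an arbitrary assumption a real tool would let the webmaster tune per link:

```python
def content_still_present(stored: str, fetched: str,
                          threshold: float = 0.6) -> bool:
    """True if enough of the stored snippet's words survive on the live page.

    The threshold is an assumed default; a parked or repurposed page
    will typically share almost no vocabulary with the stored content.
    """
    wanted = set(stored.lower().split())
    if not wanted:
        return True  # nothing was stored, so nothing can be missing
    found = set(fetched.lower().split())
    return len(wanted & found) / len(wanted) >= threshold
```

A page that returns 200 but fails this check would be flagged just like a 404, which is what lets the tool catch sold or parked domains that ordinary spiders miss.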

But wait, there's more. If a link breaks, the software has what it needs to find a replacement. It can use a search engine to search the web for the target text from the original link (Google has an API for this sort of thing). This would allow it to find a moved page on the original site, as well as mirrors of the original material. It could also automatically check the Internet Archive's Wayback Machine for an archived copy of the page, as well as the Google cache.
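For the Wayback Machine part, the Internet Archive exposes an availability endpoint that reports the closest archived snapshot for a URL. A sketch of how a replacement suggestion could be fetched, assuming the endpoint's JSON response shape (`archived_snapshots.closest.{available,url}`) as documented by the Archive:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def wayback_lookup_url(dead_url: str) -> str:
    """Build the query URL for the Archive's availability endpoint."""
    return WAYBACK_API + "?" + urlencode({"url": dead_url})


def parse_wayback_response(payload: dict):
    """Return the closest archived snapshot URL, or None if none exists."""
    closest = payload.get("archived_snapshots", {}).get("closest", {})
    if closest.get("available"):
        return closest.get("url")
    return None


def suggest_replacement(dead_url: str):
    """Query the live endpoint (network call) and return a suggestion."""
    with urlopen(wayback_lookup_url(dead_url)) as resp:
        return parse_wayback_response(json.load(resp))
```

Any snapshot URL found this way would go into the list of suggested replacements presented to the webmaster for verification.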

When it notifies you of broken links, it can supply suggested replacements, which you can verify and easily select.

Aside from making link maintenance a breeze, this tool would also catch broken links that many current tools would never notice, such as:
· Content changed URL within site
· Domain has been sold or repurposed
· Domain has been “parked” to a search page

krelnik, Oct 12 2003

Area 404 http://www.plinko.net/404/area404.asp
But what if you had purposely linked to one of these pages? [hippo, Oct 04 2004, last modified Oct 05 2004]

BBC News: Web tool may banish broken links http://news.bbc.co....hnology/3666660.stm
Oct 05 2004: They call it "Peridot" [krelnik, Oct 05 2004]

yep, proactive link fixing, like it (+)
neilp, Oct 12 2003

Recalls [Rayford]'s WebReaper experience.
bristolz, Jun 09 2004

Yes, if you build this and run it on the bakery, make sure it is not logged in under a user account at the time.
krelnik, Jun 09 2004

I found one on Google, but the link was broken...
david_scothern, Jun 09 2004

Apparently I wasn't so crazy after all; some student interns at IBM UK baked almost exactly what I described here. See link.
krelnik, Oct 05 2004
      