Halfbakery: Cached Links

Cached Links (+1, -2)

If you look up an idea on the Halfbakery which isn't very recent, you'll probably find that some of the links don't work - the page has moved, or the server doesn't exist anymore - whatever the case, you can't access whatever was linked.

To remedy this, I suggest that each link have a little "(cache)" link next to it, before the linker's name. Clicking it brings you to a barebones page asking whether you want to view the Google cache, Archive.org cache, or a Coralized version of the page for slow-loading links.

Google, as many of you know, keeps a simple text-only cache of a webpage. While the cache isn't the most complete, Google's servers are fast and reliable.

Archive.org offers the "Internet Wayback Machine," a system which lets you view how a page has changed over time. The archive is extensive and dates back to 1996, it seems.

Coral, also known as the New York University Distribution Network, is a free service with which any site can be given virtually infinite bandwidth. Just append ".nyud.net:8090" to the hostname in a URL, and the page is cached - anyone else who visits the Coralized page within a short amount of time sees the cached version, hence saving the original server's bandwidth. This is a great solution for slow-loading pages.
-- rgovostes, Oct 06 2004
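
A minimal sketch of what the proposed "(cache)" page might compute for one link, assuming the three services accept the URL forms shown. The function names are hypothetical, the Wayback Machine address form is an assumption about Archive.org's URL scheme, and the Coral rewrite assumes the suffix is attached to the hostname with a dot and that the original link uses the default HTTP port.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def google_cache_url(url):
    # "cache:" search operator, with the target URL percent-encoded.
    return "http://www.google.com/search?q=cache:" + quote(url, safe="")


def wayback_url(url):
    # Assumed Wayback Machine form: the archive prefix followed by the URL.
    return "http://web.archive.org/web/" + url


def coralize(url):
    # Append the Coral suffix to the hostname; any original port is dropped.
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("not an absolute URL: " + url)
    netloc = parts.hostname + ".nyud.net:8090"
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


def cache_links(url):
    # The three choices the barebones page would offer for one link.
    return {
        "google": google_cache_url(url),
        "archive": wayback_url(url),
        "coral": coralize(url),
    }


if __name__ == "__main__":
    for name, link in cache_links("http://www.example.com/page.html").items():
        print(name, link)
```

Nothing here contacts any of the services; it only rewrites URLs, so the page could be generated without the Halfbakery depending on any of them being up.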

(?) Coral http://www.scs.cs.nyu.edu/coral/
New York University Distribution Network [rgovostes, Oct 06 2004]

Coralized Idea http://www.halfbake...idea/Cached_20Links
This idea, mirrored by Coral. [rgovostes, Oct 06 2004]

Origins of The Wayback Machine http://en.wikipedia..._growth_and_storage
It started with Alexa Internet's automated crawlers... [Spacecoyote, Nov 24 2014]

It should be noted that the Halfbakery's robots.txt file forbids Archive.org from crawling the site and archiving ideas. Or not:
-- rgovostes, Oct 06 2004

That was true for a while, but is currently incorrect. I don't think it was correct at the time you made that annotation.

This idea would establish a complicated mechanism (and interdependencies between the halfbakery and several other enterprises, some of them commercial) to do something that someone intelligent enough to click on that "cached" link can probably do themselves, if they're sufficiently curious.

"Coral" is a cool idea, but has nothing to do with the problem you're trying to solve, right? I mean, our links tend to die of old age, not of overload.

[By the way, I just tried to follow your link to the "Coralized Idea":

Error 408 Request Time-out

Server CoralWebPrx/0.1 (See http://www.scs.cs.nyu.edu/coral/) at 128.148.34.3:8090]
-- jutta, Oct 06 2004

Thank you for the correction - I thought I had tried looking an idea up on Archive.org before and it told me about the robots.txt file, but I did not confirm this before annotating.

Coral is very handy for sites which are slow-loading or have been temporarily knocked offline. It has been frequently used on Slashdot.org to neutralize the "Slashdot effect," i.e. providing a mirror of a page which isn't available due to the heavy traffic.

Strange that the link does not work for you. It loads nearly instantly for me.
-- rgovostes, Oct 06 2004

It repeatedly fails for me. We're probably geographically distant and are being redirected to two different proxies - yours works, mine doesn't. Sucks that the demo doesn't work for me (the "cnn.com" demo they have on their webpage works, though), but isn't it cool how you can tell from the errors how it's structured...?
-- jutta, Oct 06 2004

I have a user-configured verb in my copy of IE named "cache". If a URL fails to load, I just go up into the address bar, type "cache" and a space to the left of the URL, and hit return. It loads it from the Google cache.

This is pretty easy to set up in IE, it's just a registry entry. Put the following in a file CACHE.REG and then double click it:

REGEDIT4

[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\SearchUrl\cache]
@="http://www.google.com/search?q=cache:%s"
" "="+"
"#"="%23"
"&"="%26"
"?"="%3F"
"+"="%2B"
"="="%3D"

If you have the Google toolbar loaded, you can do the same thing from within the toolbar box, just put a colon ":" between "cache" and the URL instead of a space.
-- krelnik, Oct 06 2004
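
For readers not on IE, here is a rough Python sketch of what the registry entry above appears to do: the typed URL has the listed characters replaced by their escapes and is then substituted for %s in the template. This is an assumption about how IE applies SearchUrl entries, not a description of its actual code; the names TEMPLATE, ESCAPES, and expand are hypothetical.

```python
# Default value of the registry key: the URL template.
TEMPLATE = "http://www.google.com/search?q=cache:%s"

# The named values of the key: character -> replacement.
ESCAPES = {
    " ": "+",
    "#": "%23",
    "&": "%26",
    "?": "%3F",
    "+": "%2B",
    "=": "%3D",
}


def expand(typed):
    """Turn 'cache <url>' as typed in the address bar into a search URL.

    Returns None if the input does not start with the 'cache' verb.
    """
    keyword, _, rest = typed.partition(" ")
    if keyword != "cache" or not rest:
        return None
    # Escape character by character so inserted "%" signs are not re-escaped.
    escaped = "".join(ESCAPES.get(ch, ch) for ch in rest)
    return TEMPLATE.replace("%s", escaped)


if __name__ == "__main__":
    print(expand("cache http://example.com/a?b=1&c=2"))
```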

Yeah, the Google Toolbar also has an "info" button (it's an i in a blue circle) which among other things offers a cached version of the page. I've also seen plugins for Mozilla which add an Archive.org lookup button to the toolbar. You can also find simple Bookmarklets which Coralize the page you're currently looking at.

Still, most users don't have things like these installed.
-- rgovostes, Oct 06 2004

[krelnik] that rocks!
-- neilp, Oct 06 2004

// This idea would establish a complicated mechanism (and interdependencies between the halfbakery and several other enterprises, some of them commercial) to do something that someone intelligent enough to click on that "cached" link can probably do themselves, if they're sufficiently curious. //

I know that you can request archive.org to cache a page. Does it automatically browse the internet and cache pages that no one requests? If not, it seems like it would be useful for the halfbakery to automatically send a cache request whenever someone links a page (or possibly wait a week so ideas or links that get deleted or flagged right away don't get cached). My first thought was that this would be taking advantage of archive.org, but really they want to cache anything that anyone finds to be interesting or useful. That is true of most of the links people post here.

Of course you might want to ask if they would appreciate that before you actually implemented it.
-- scad mientist, Nov 24 2014
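
A small sketch of the "wait a week, then request a cache" rule described above, under stated assumptions: the (url, posted_at, deleted) record layout and the function name are hypothetical, and the "save" address prefix is assumed to be the form Archive.org accepts for on-demand captures. The code only builds the request URLs; it does not send anything.

```python
from datetime import datetime, timedelta

# Assumed address form for asking Archive.org to capture a page.
SAVE_PREFIX = "https://web.archive.org/save/"

# Hold period so quickly deleted or flagged links are never submitted.
HOLD = timedelta(days=7)


def due_archive_requests(links, now):
    """Return capture-request URLs for links that survived the hold period.

    links: iterable of (url, posted_at, deleted) tuples (hypothetical layout).
    """
    return [
        SAVE_PREFIX + url
        for url, posted_at, deleted in links
        if not deleted and now - posted_at >= HOLD
    ]


if __name__ == "__main__":
    sample = [
        ("http://example.com/old", datetime(2014, 11, 10), False),
        ("http://example.com/new", datetime(2014, 11, 20), False),
    ]
    print(due_archive_requests(sample, datetime(2014, 11, 24)))
```

Whether automated requests like this are welcome is a separate question, as the annotation says.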

//Does it automatically browse the internet and cache pages that no one requests?//

Yes, it does [link].
-- Spacecoyote, Nov 24 2014
