Hash n' Link
Include an md5 hash along with hrefs

Web page caching is a tricky beast. It is vital for performance, but can cause unexpected results. I propose we add an optional extra field in links, like so: <img href="foo.jpg" md5sum="53b1ef86714ff79bceb5e9290e827574" />

Now we can compare that sum with the sum of the file in our cache and decide whether we need to reload the file based on that. We can now cache images and pages indefinitely without fear of seeing out-of-date information.
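
As a rough sketch of that comparison (Python, using the standard hashlib module; the function name needs_reload and its arguments are made up for illustration, not part of any browser):

```python
import hashlib
from typing import Optional


def needs_reload(cached_body: Optional[bytes], link_md5: str) -> bool:
    """Return True when the cached copy is missing or does not match the link's md5sum.

    cached_body is whatever the (hypothetical) cache holds for the link, or None.
    link_md5 is the hex string taken from the proposed md5sum attribute.
    """
    if cached_body is None:
        return True  # nothing cached yet, so we have to fetch
    cached_md5 = hashlib.md5(cached_body).hexdigest()
    # Hex digests are compared case-insensitively, ignoring stray whitespace.
    return cached_md5 != link_md5.strip().lower()
```

A matching sum means the cached file can be used without contacting the server at all; a mismatch (changed or corrupted file) triggers a normal fetch.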

It would be no more difficult to maintain than height and width attributes, which also depend on the linkee. With things like generated content it obviously won't work (which is what we want, since generated pages generally aren't cached anyway). It also won't work for mutually recursive web pages, since each page's hash would depend on the other's.

Another nice property is that since hashes are probabilistically globally unique, items in the cache can be shared between sites! Just index the cache on md5sum as well as location. Only download that 'under construction' gif once and for all! Since the hash cannot be faked, people cannot 'pollute' your cache with bad files.
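
One way the double index might look, as a minimal in-memory sketch (Python; the class HashIndexedCache and its methods are hypothetical names, and a real browser cache would live on disk):

```python
import hashlib
from typing import Dict, Optional


class HashIndexedCache:
    """Toy cache indexed on md5sum as well as location."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}  # md5 hex digest -> file body
        self._by_url: Dict[str, str] = {}     # location -> md5 hex digest

    def store(self, url: str, body: bytes, claimed_md5: Optional[str] = None) -> bool:
        """Store a downloaded body; refuse it if it does not match the claimed sum."""
        actual = hashlib.md5(body).hexdigest()
        if claimed_md5 is not None and claimed_md5.lower() != actual:
            return False  # body does not hash to what the link promised
        self._by_hash[actual] = body
        self._by_url[url] = actual
        return True

    def lookup(self, url: str, link_md5: Optional[str] = None) -> Optional[bytes]:
        """Find a body by hash when the link carries one, else by location."""
        if link_md5 is not None:
            # Any site's copy will do, as long as the sum matches.
            return self._by_hash.get(link_md5.lower())
        digest = self._by_url.get(url)
        return self._by_hash.get(digest) if digest is not None else None
```

Because the key is always the sum of the bytes actually received, a file fetched from one site can satisfy a link on another site only when the sums agree.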

Best of all, all this needs is browser support, and it is completely backwards compatible with current systems. Web page authors can start implementing it now, and clients which can make use of the data will.
-- johnmeacham, Apr 23 2003

Seems to me it would be terribly hard to keep content up to date. I'd prefer a complete overhaul of the system - when anything is requested from a server, the first thing it sends is a hash code. The client browser compares it to the hash of the content it already has. That way the web surfer isn't depending on someone remembering to change all of their html every time they change a picture.
-- Worldgineer, Apr 23 2003


We'd all prefer a complete overhaul, but it ain't gonna happen. And anyway, JPEGs don't change very often.

Thing is, this only really saves the cost of an If-Modified-Since or a HEAD, so it's not a massive savings in bandwidth or anything. Maybe a marginal improvement in latency for the user, plus the cross-site effects, which are great but require critical mass.
-- egnor, Apr 23 2003


Well, that is why it is optional. Presumably people wouldn't use it if they had no mechanism in place to keep the hashes up to date. I believe you can get the hash in the HTTP headers as well, which could be used to compare against your cached version, but that doesn't help with the other problems and still requires you to connect to the server.

I for one run all my web pages through a script which keeps width and height tags up to date; it could easily add the sum as well. Many people use other tools which could be modified to support this.
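
A script of that sort could be sketched roughly like this (Python; this is not the actual script mentioned above, the name add_md5sums and the read_file callback are invented for illustration, and the regex handling of tags is deliberately naive):

```python
import hashlib
import re
from typing import Callable

IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
TARGET = re.compile(r'\b(?:src|href)="([^"]+)"', re.IGNORECASE)
MD5_ATTR = re.compile(r'\s+md5sum="[^"]*"', re.IGNORECASE)
TAG_END = re.compile(r'\s*/?>$')


def add_md5sums(html: str, read_file: Callable[[str], bytes]) -> str:
    """Add or refresh an md5sum attribute on every <img> tag that names a file.

    read_file maps the tag's src/href value to the file's bytes; how paths are
    resolved is left to the caller.
    """
    def rewrite(match: re.Match) -> str:
        tag = MD5_ATTR.sub("", match.group(0))  # drop any stale sum first
        target = TARGET.search(tag)
        if target is None:
            return tag  # no file named, leave the tag alone
        digest = hashlib.md5(read_file(target.group(1))).hexdigest()
        end = TAG_END.search(tag)  # position of the closing ">" or " />"
        return f'{tag[:end.start()]} md5sum="{digest}"{tag[end.start():]}'

    return IMG_TAG.sub(rewrite, html)
```

Run as part of a publishing step, this would keep the sums in step with the files the same way width and height get refreshed; a proper HTML parser would be the safer choice for real pages.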

Not having to connect to a server is a big bonus under high latency connections. A big big bonus. DirecWay has a 2 second (!!) ping time. Establishing TCP connections is a nightmare.
-- johnmeacham, Apr 23 2003


I can see your point, but I'd change your idea to a lastModified tag, which could be entered by hand. No hashing required, and serves the same porpoise.
-- Worldgineer, Apr 23 2003


Just a last modified time would not allow sharing images and pages between sites, which is one of the major advantages of this scheme. It would also not detect and replace corrupted files in your cache. A hash also just feels more correct for this application. Last-Modified times are more 'brittle'. md5sums are also easy to do by hand: 'md5sum' is an application installed by default on many operating systems.
-- johnmeacham, Apr 23 2003


