halfbakery

Web Versioning

This is quite a technical idea with little or no comedic content. Before explaining it, I'll briefly cover four concepts: hashes, secure websites, version control and ETags.

A "hash" is a one-way function: for any given input it produces a predictable, fixed-size output, but from the output alone it is very difficult to work out what the input was. One use of a hash is as a checksum: when a big file is offered for download, its MD5 hash is often published alongside it, so that the user can check that the file downloaded without errors.
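
As a rough illustration of the checksum use, here is a minimal Python sketch built on the standard `hashlib` module (the function names are made up for this example). MD5 is fine for catching accidental corruption, but it is no longer considered safe against deliberate tampering; SHA-256 is the usual choice for that.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of some bytes as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()

def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks, so a big download never has to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def download_is_intact(path: str, published_md5: str) -> bool:
    """Compare a downloaded file against the checksum published next to it."""
    return md5_of_file(path) == published_md5.strip().lower()

# The same input always gives the same output...
print(md5_hex(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
# ...but a one-byte change gives a completely different one.
print(md5_hex(b"abd"))
```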

A "secure website" means two things: encryption and proof of identity. Encryption means that no-one can spy on the data sent back and forth. Identity (the assurance that the website you're connecting to really is who it says it is) is established through certificates and a chain of trust. The core concept is that the site presents a certificate carrying a cryptographic signature (a calculation built on a hash function) showing that another website or company vouches for it. That authority's certificate may in turn be vouched for by another, and so on, until the chain reaches one of a small number of root certificates. Copies of these root certificates are stored on your computer or phone, and all of the trust you put into a secure website stems from there.
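
The chain walk can be sketched as a toy model in Python. Real certificates are X.509 structures checked with public-key signatures; here a certificate is reduced to a subject and an issuer, and `signature_ok` is a hypothetical stand-in for the actual cryptographic check. `Cert` and `chain_is_trusted` are illustrative names, not a real library's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Cert:
    subject: str  # who this certificate identifies
    issuer: str   # who vouches for it (same as subject for a self-signed root)

def chain_is_trusted(
    leaf: Cert,
    presented: Dict[str, Cert],                  # intermediates sent by the site
    roots: Dict[str, Cert],                      # root store on the device
    signature_ok: Callable[[Cert, Cert], bool],  # hypothetical: did issuer sign cert?
    max_depth: int = 10,
) -> bool:
    """Walk from the site's certificate up towards a trusted root."""
    cert = leaf
    for _ in range(max_depth):
        if roots.get(cert.subject) == cert:
            return True  # reached a root we already trust
        # Prefer the local root store over anything the site sent.
        issuer = roots.get(cert.issuer) or presented.get(cert.issuer)
        if issuer is None or not signature_ok(cert, issuer):
            return False  # broken link: unknown issuer or bad signature
        cert = issuer
    return False  # chain too long (or looping)

root = Cert("Example Root CA", "Example Root CA")
intermediate = Cert("Example Intermediate", "Example Root CA")
leaf = Cert("bank.example", "Example Intermediate")

def trust_all(cert: Cert, issuer: Cert) -> bool:
    return True  # placeholder: pretend every signature verifies

print(chain_is_trusted(leaf, {"Example Intermediate": intermediate},
                       {"Example Root CA": root}, trust_all))  # True
```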

"Version control" is software that keeps track of changes to files, most often source code. The simplest form of version control is to put a number on the end of a file's name and increment it whenever the file changes. Modern version control systems such as git use a chain of hashes to link versions together unambiguously: each commit's hash covers both its own contents and the hash of the commit before it. It is common to share a commit hash, which points to an exact snapshot of the data at a given point in time.
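
A simplified Python sketch of such a hash chain follows. Git's real object format (headers, trees, author metadata, SHA-1 or SHA-256) is more involved, so `commit_id` and `build_history` are illustrative only; they show why changing any old version changes every ID after it.

```python
import hashlib
from typing import List, Optional

def commit_id(snapshot: bytes, parent_id: Optional[str]) -> str:
    """Hash a snapshot together with its parent's ID (simplified, not git's format)."""
    h = hashlib.sha256()
    h.update((parent_id or "").encode("ascii"))
    h.update(b"\n")
    h.update(snapshot)
    return h.hexdigest()

def build_history(snapshots: List[bytes]) -> List[str]:
    """Return one ID per snapshot, each one chained to the ID before it."""
    ids: List[str] = []
    parent: Optional[str] = None
    for snap in snapshots:
        parent = commit_id(snap, parent)
        ids.append(parent)
    return ids

history = build_history([b"<p>v1</p>", b"<p>v2</p>", b"<p>v3</p>"])
tampered = build_history([b"<p>v1, edited</p>", b"<p>v2</p>", b"<p>v3</p>"])
# Editing the oldest version changes every later ID, including the newest.
print(history[-1] != tampered[-1])  # True
```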

An "ETag" is a tool used for caching web pages. It is an HTTP response header whose value identifies a particular version of a resource, typically a version number or a hash of its content. When you visit the site again, the browser sends the stored ETag back (in an If-None-Match header); if it hasn't changed, the server replies "304 Not Modified" and the cached copy is shown instead of being re-downloaded. This is generally a good thing, except that there's no standard for how an ETag is generated, so a malicious website can send a unique identifier instead of a real checksum and use it to track you (and, annoyingly, most browsers provide no control at all over ETags).
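
The revalidation exchange can be sketched server-side in Python. This handles a simplified If-None-Match (no weak tags, no `*`), and the hashing scheme in `make_etag` is just one possible choice, since HTTP leaves it up to the server. A tracking site could return a random per-visitor value from `make_etag` instead, and the browser would see no difference.

```python
import hashlib
from typing import Dict, Optional, Tuple

def make_etag(body: bytes) -> str:
    """One possible scheme (an assumption; HTTP mandates none): a quoted content hash."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET, honouring a simplified If-None-Match header."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        sent = [tag.strip() for tag in if_none_match.split(",")]
        if etag in sent:
            return 304, headers, b""  # Not Modified: the browser reuses its cached copy
    return 200, headers, body

status, headers, _ = handle_get(b"<p>hello</p>", None)        # first visit: 200 plus an ETag
status2, _, _ = handle_get(b"<p>hello</p>", headers["ETag"])  # revisit, unchanged: 304
```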

So, to the idea:

Apply version control to web pages, generate caching checksums predictably, and publish them through the certificate process.

The result would be that visiting a secure website would, in addition to encrypting the connection and proving identity, provide a fixed version number for the site. Because the checksums are predictable, unlike the opaque ETag, the calculation can be done client-side (in the browser), much like the old way of verifying the MD5 hash of a download. Not only does this give you added assurance that nothing has been corrupted, but it also means you don't necessarily have to tell the server which version you currently have cached (which is the privacy concern with ETags).
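
A sketch of what "predictable" could mean in practice, in Python. The manifest layout here (sorted paths, a SHA-256 per file, one line each) is an assumption made for illustration; the idea does not specify a format. The key property is that browser and server can each compute the same value independently, so the browser only compares, and never has to report what it has cached.

```python
import hashlib
from typing import Dict

def site_version(files: Dict[str, bytes]) -> str:
    """Deterministic version hash over a site's versioned files.

    Paths are sorted so the result does not depend on the order the browser
    fetched them in; each file contributes one "path<TAB>content-hash" line.
    """
    outer = hashlib.sha256()
    for path in sorted(files):
        inner = hashlib.sha256(files[path]).hexdigest()
        outer.update(f"{path}\t{inner}\n".encode("utf-8"))
    return outer.hexdigest()

def matches_published(files: Dict[str, bytes], published: str) -> bool:
    """True if the locally cached files reproduce the published version hash."""
    return site_version(files) == published

cached = {"/index.html": b"<html></html>", "/app.js": b"console.log('hi');"}
published = site_version(cached)  # in the idea, this value would arrive with the certificate
print(matches_published(cached, published))  # True
```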

Certificates are intentionally short-lived and are usually renewed on a regular basis, sometimes at intervals as short as two months. Many websites are updated only slightly more often than that, so tying the version hashes into the chain of trust wouldn't be too drastic a change. A secure website would be expected to keep its version number for a while, then move to a newer version only when a new certificate is issued.
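
In Python, treating a certificate's validity window as the lifetime of a pinned version might look like the following. No real certificate format has a `site_version` field; `VersionedCert` and `version_accepted` are hypothetical names invented to illustrate the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class VersionedCert:
    """Hypothetical certificate that also pins the site's version hash."""
    domain: str
    not_before: datetime
    not_after: datetime
    site_version: str

def version_accepted(cert: VersionedCert, observed: str, now: datetime) -> bool:
    """Accept a site version only while the certificate that pins it is valid."""
    in_window = cert.not_before <= now <= cert.not_after
    return in_window and observed == cert.site_version

cert = VersionedCert(
    domain="bank.example",
    not_before=datetime(2019, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2019, 3, 1, tzinfo=timezone.utc),  # roughly two months
    site_version="example-version-hash",                  # placeholder value
)
print(version_accepted(cert, "example-version-hash",
                       datetime(2019, 2, 21, tzinfo=timezone.utc)))  # True
```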

One problem is dynamic content (such as a page that looks different once the user has logged in), but this mixture of code and data has always been messy, and many modern sites keep a clear division between the two anyway. Within this system there would be an enforced distinction between versioned code (such as JavaScript files) and data (for example JSON objects). Parts of the site that have to be dynamic would be left out of the versioned bundle and kept behind a security wall, in the same way that cookies can be flagged as HTTPS-only.
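
One way to picture the split is a rule that decides which resources belong to the versioned bundle. This Python sketch assumes a purely hypothetical convention where anything under `/api/` or `/user/` is dynamic data; a real scheme would need the site to declare this explicitly.

```python
from typing import Iterable, List, Tuple

# Hypothetical convention: paths under these prefixes carry per-user data and
# are excluded from the versioned bundle; everything else counts as code.
DYNAMIC_PREFIXES: Tuple[str, ...] = ("/api/", "/user/")

def is_dynamic(path: str) -> bool:
    return path.startswith(DYNAMIC_PREFIXES)

def split_resources(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate versioned code from dynamic data, preserving input order."""
    versioned: List[str] = []
    dynamic: List[str] = []
    for path in paths:
        (dynamic if is_dynamic(path) else versioned).append(path)
    return versioned, dynamic

code, data = split_resources(["/index.html", "/app.js", "/api/balance.json", "/user/settings"])
# code == ["/index.html", "/app.js"]; data == ["/api/balance.json", "/user/settings"]
```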

Overall, security may not be improved all that much: during a man-in-the-middle attack the browser already warns the user of the certificate error, and usually the user clicks "ignore this, show me anyway". If an attacker can spoof a certificate, they can spoof a version number too. I thought about having an enforced "valid until..." marker on the version number, but if a vulnerability is discovered in the code, we want to update it as soon as possible. Perhaps regular updates should be the expected behaviour, with emergency updates as an option. In that case, if your internet banking website has an unexpected update, you know that either an attack is taking place or a critical vulnerability has just been patched. Either way, you may want to postpone your banking until the situation clears up.
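
The "unexpected update" check could be as simple as comparing the date of a version change against the date the next update was due. In this Python sketch, `classify_update` and its `scheduled_update` parameter are hypothetical; the schedule might, for example, be the previous certificate's renewal date.

```python
from datetime import date

def classify_update(old_version: str, new_version: str,
                    scheduled_update: date, today: date) -> str:
    """Label a version change as expected or not, under a hypothetical schedule.

    `scheduled_update` stands for the date the site said its next version
    would appear (for example, its next certificate renewal).
    """
    if new_version == old_version:
        return "unchanged"
    if today >= scheduled_update:
        return "scheduled"
    # Changed early: either an attack or an emergency patch, so worth pausing.
    return "unexpected"

print(classify_update("v1", "v2", date(2019, 3, 1), date(2019, 2, 21)))  # unexpected
```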

mitxela, Feb 21 2019


I'm not sure what is being fixed here, sorry. If someone can defeat the certificate, they can still defeat this; if they can't defeat the certificate, they don't need this. Am I missing something?
theircompetitor, Feb 21 2019

Well, aside from fixing the problems with ETags, it forces any change to a site to be formally published. A very real concern is that of a rogue employee.

For instance, consider a password manager service. They claim that because all of the data is strongly encrypted client-side (in JavaScript), there would be no danger if the server were ever compromised and someone stole the database. However, if the server were compromised, it would be trivial to change the JavaScript code to also save plaintext passwords. For this reason some people have an inherent distrust of JavaScript cryptography, but formally versioning the code would restore some of that trust.
mitxela, Feb 21 2019
