h a l f b a k e r y
"Bun is such a sad word, is it not?" -- Watt, "Waiting for Godot"
I don't know anything about networking, so let me know if I'm
way off base with this...but wouldn't it be advantageous when
sending data over a network (the Internet for example) to
have a local library of the most common data packets?
For instance, you need to send http://www so instead of
sending the 10 byte string you could just send a 4 byte
address that is looked up at the other end. With a 4 byte
address you could have access to over 4 billion common
packets. If the data is not in the library then just send it as
Of course both client and host would need a copy of the same
library; maybe it could be stored on a USB flash drive.
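A minimal sketch of the lookup idea in Python. The library contents, the one-byte tag scheme and the function names are all assumptions made up for illustration, not an existing protocol. Both ends hold the same list; a known string goes out as a 4-byte address, anything else goes out as literal data.

```python
import struct

# Hypothetical shared library: both ends must hold the identical list.
LIBRARY = [b"http://www", b"https://www", b".com/index.html"]
INDEX = {entry: i for i, entry in enumerate(LIBRARY)}

TAG_REF = b"\x01"   # next 4 bytes are a library address
TAG_RAW = b"\x00"   # next 2 bytes are a length, then the literal data


def encode(chunks):
    """Replace chunks found in the library with a 4-byte address."""
    out = bytearray()
    for chunk in chunks:
        if chunk in INDEX:
            out += TAG_REF + struct.pack(">I", INDEX[chunk])
        else:
            out += TAG_RAW + struct.pack(">H", len(chunk)) + chunk
    return bytes(out)


def decode(data):
    """Rebuild the original chunks by looking addresses up in the library."""
    chunks, pos = [], 0
    while pos < len(data):
        tag = data[pos:pos + 1]
        pos += 1
        if tag == TAG_REF:
            (address,) = struct.unpack(">I", data[pos:pos + 4])
            chunks.append(LIBRARY[address])
            pos += 4
        else:
            (length,) = struct.unpack(">H", data[pos:pos + 2])
            pos += 2
            chunks.append(data[pos:pos + length])
            pos += length
    return chunks


# Example message, split into chunks by hand for the sake of the sketch.
message = [b"http://www", b".halfbakery", b".com/index.html"]
wire = encode(message)
restored = decode(wire)
```

Note the cost of telling the two cases apart: with the tag byte, the 10-byte string goes out as 5 bytes rather than 4, and anything not in the library gets 3 bytes longer.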
Software that compresses network traffic to increase speed [mitxela, Apr 13 2010]
Layers of a network
you need to go up a few to read the information inside. [Voice, Apr 13 2010]
Content Distribution Network
The idea behind these is to try and ensure that you don't have to retrieve your data from all the way across the network. This means pushing copies of the data to the edges of the network (either servers or peers). [Jinbish, Apr 13 2010]
Shoulders of giants
[mouseposture, Apr 16 2010]
||The problem with this is it's expensive (hardware
intensive) to actually read packets in the router.
You have to go up five layers to get to where the
http is and then you have to account for fifteen
different standards when you find out what's in
What the router sees:
||01100110 11100101 00010010 01110100 00001110
00000000 00000000 00100111
||That could be part of HTTP. It could also be part of
HTTPS. Or FTP. Or ssec. Compressing it at lower
levels is being done to the extent possible and
compressing it at higher levels is impractical.
||As [Voice] alludes, you don't want to do this at the lower layers (like IP packets, Ethernet frames, etc.).
||On saying that - you have the right idea about trying to store traffic locally so that it doesn't need to be sent. When done at the higher layers (like webpage/http) this is often called "Content Distribution". Companies like 'Akamai' will host copies of others' websites in local* positions and keep them updated - basically like a download mirror.
||While this is normally done at the edges of networks (to be closer to the requesting parties) it is also done centrally to help balance load at the server side (think Google, and its many, many servers that all give you the same front page).
||(*Think network topology rather than geographical)
||Oh - and peer-to-peer networks can do some of this too, in a very clever way. P2P works by chopping up information and then spreading it about a network.
||Imagine if, instead of delivering a newspaper to all of your neighbours, the local shop only gave each of you 1 page. To get the rest of the paper you have to travel round your neighbours (much like they also have to do) but at least you don't have to go all the way to the shop. Unlike a CDN, which would be the same as having 1 neighbour with 1 copy of the paper (that you could all share), P2P works because no one person has to worry about keeping the whole paper. You all share the burden. Does that make sense? (Hopefully in a non-patronising way!?)
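The newspaper analogy can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names, the neighbour names and the round-robin dealing are hypothetical, and a real P2P network also has to find peers, verify pieces and cope with neighbours who are out.

```python
def deal_pages(pages, neighbours):
    """The shop hands out the pages round-robin, so nobody holds the whole paper."""
    holdings = {name: {} for name in neighbours}
    for number, text in enumerate(pages):
        holder = neighbours[number % len(neighbours)]
        holdings[holder][number] = text
    return holdings


def collect_paper(holdings, page_count):
    """Travel round every neighbour and put the pages back in order."""
    gathered = {}
    for pages in holdings.values():
        gathered.update(pages)
    return [gathered[number] for number in range(page_count)]


paper = ["front page", "news", "sport", "crossword"]
holdings = deal_pages(paper, ["Ann", "Bob", "Cat"])   # hypothetical neighbours
rebuilt = collect_paper(holdings, len(paper))
```

Each neighbour stores only a fraction of the paper, yet anyone can rebuild the whole thing without a trip to the shop.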
||I think the simple answer is that this is already done. The more complicated answer involves quite a lot of math and some understanding of computer science. Suffice it to say that compression of frequently used data strings throughout the network is common, even across platforms. This is implemented by software (compression), network hardware, routers, and servers, through the entire process from source to user.
||As for the example, internet page requests are "looked up" in domain name registries and replaced with numeric location information. This streamlines data traffic: the actual data is negotiated using a simple numeric string; otherwise traffic to sites with long names would be significantly slowed. An unregistered domain can never supply any data, whereas an unregistered numeric address can be used to initiate almost any kind of connection.
||Current network "home" of the HB: 18.104.22.168. All data requests are routed using this simple code, which is far superior to trying to compress the page names.
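To tie this back to the 4-byte idea, here is a small Python sketch using the standard `ipaddress` module. It only shows that a dotted IPv4 address is 4 bytes on the wire; it does no real name lookup, and the host name used for the size comparison is an assumption for illustration.

```python
import ipaddress

address = ipaddress.IPv4Address("18.104.22.168")  # the address quoted above
packed = address.packed       # the raw bytes as they would sit in an IP header
as_number = int(address)      # the same address as one unsigned integer

host_name = b"www.halfbakery.com"   # assumed name, for size comparison only

print(len(host_name), "bytes as a name,", len(packed), "bytes as an address")
```

Whatever the length of the name, the address it resolves to is always the same fixed size.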
||domain names could probably be "compressed" (directly mapped, actually) to 6-bit easily.
||It would be done in the Presentation layer (re link).
Instead of sending, say, 100 bits of data (which could correspond to a line of code), it would only need
to send 32 (the address ref to be looked up at the
||It really is being implemented as much as possible. You should do some research on data compression; that is the best place to start to understand the modes and limits of reduced file size.
||The technology which underpins computers and
the internet IS GENIUS. It's beautiful, so needless
to say that this '4 bytes saved' idea ... doesn't
quite fit (just that it's kind of 'there already').
Information is Cached everywhere - from 'registers'
right next to the CPU (faster than ram) to every
stage of the internet. Your browser remembers
information, and so do the ISP (that's A.O.L) and even
Google, etc. The most popular data is DEFINITELY
close to hand. Saving digits on the actual url
address is like 0.000000000000000000000000001 % of
the task. And is already greatly optimised. HTTP is
the protocol it has to travel in, that information
has to be known. If you work with computers you
see just How Much Paperwork they do.
EVERYTHING has to be written down and
addressed. By the time any of your programs have
loaded they've gone through many many many
pages of code, and written their results
laboriously down. Computers are a magical
ultra-high speed world. Consider this - a 1 GHz processor
... LIGHT travels 30cm in the time it takes to do
one calculation. It takes light about 1.3 seconds to get
to the moon (240,000 miles). It took Apollo 4
days. Computers are Wow. And the GENIUSES that
went into making them are perhaps even more
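The light-speed figures are easy to check with a few lines of Python. This is a back-of-envelope sketch that assumes one calculation per clock cycle and uses the mean Earth-Moon distance of roughly 384,400 km.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # exact, by definition of the metre
CLOCK_HZ = 1_000_000_000               # 1 GHz, assuming one calculation per cycle
MOON_DISTANCE_M = 384_400_000          # mean Earth-Moon distance, approximate

distance_per_cycle_cm = SPEED_OF_LIGHT_M_PER_S / CLOCK_HZ * 100
seconds_to_moon = MOON_DISTANCE_M / SPEED_OF_LIGHT_M_PER_S

print(f"{distance_per_cycle_cm:.1f} cm per clock cycle")   # 30.0 cm per clock cycle
print(f"{seconds_to_moon:.2f} s to the Moon")              # 1.28 s to the Moon
```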
||[nicholaswhitworth]//GENIUS// An awful lot of it is
shoulders-of-giants <link> stuff. In fact, a lot of it is midgets
standing on the shoulders of midgets, standing on the
shoulders of midgets....
||//To get the rest of the paper you have to travel round your neighbours //
Imagine the embarrassment when they find out you read the Daily Mail.
||Common compression techniques like LZW and LZ77 already make short work of frequently used strings. They do a lot more as well. ISPs also cache commonly accessed files like the Google logo.
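Python's standard `zlib` module (DEFLATE, which is built on LZ77) even supports something close to the shared library from the original idea: a preset dictionary passed as `zdict` that both ends agree on in advance. A rough sketch follows; the dictionary contents, the helper names and the sample request are assumptions for illustration.

```python
import zlib

# Hypothetical shared dictionary: strings both ends expect to see often.
SHARED = b"http://www.https://www..com/index.htmlGET / HTTP/1.1\r\nHost: "


def compress(data, zdict=None):
    """Compress data, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def decompress(blob, zdict=None):
    """Decompress data; the receiver must supply the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


request = b"GET / HTTP/1.1\r\nHost: www.halfbakery.com\r\n\r\n"
plain = compress(request)            # no shared knowledge
primed = compress(request, SHARED)   # both ends already hold SHARED

print(len(request), len(plain), len(primed))
```

On a message this short, ordinary compression has almost nothing to work with, while the primed version can refer back to strings it never had to send.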
||Of course, "at the time" was about 1974.