h a l f b a k e r y
Crust or bust.
The concept of 'carbon footprint' is a useful way of drawing
attention to profligate energy wastage (or, conversely,
boasting of one's conspicuous consumption).
However, words also have a cost. They cost energy and
resources to print on paper, to store on servers or to
The more people access those
words, the more their energy costs are multiplied.
It should be possible, therefore, to account for these costs
in the same way that we can tally up the carbon cost of
guacamole or of a skiing holiday.
Take, as an example, the post you are reading now. We
might estimate that N people will read it over its
meaning that it will be bounced around between N*S
computers, consuming an amount E of electricity in
transmission and display. It will also occupy X% of the
server where it is being stored, hence costing X% of the
total energy costs of that server. It will also consume so
many seconds of so many people's time to read, and
Z% of the total HB information) can be said to consume a
certain very small percentage of Jutta's time in
maintaining the HB. And so on and so fifth.
Thus, we can calculate the energy cost of this post.
Now, had I been more brevious, I could have told you all
this in fewer words - perhaps only one third as many. I
have therefore wasted a considerable amount of energy
through my verbitude. This wastage can be captured as a
"jargon footprint" which (if I could have said the same
thing in 1/3rd as many words) would in this case be 3.0.
The 'jargon footprint' can, like its carbon cousin, be
applied. For example, a government office might, by
issuing overly verbose documents to large numbers of
people, score a jargon footprint of 10 or even 20. A sign
which I recently saw on a public convenience (which read
"We regret that these toilets are temporarily unavailable
for cleansing - please use alternative facilities" - 14 words)
would score 4.6 (the relevant message could have been
put as "Closed for Cleaning").
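The ratio used for the toilet sign can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `jargon_footprint` is made up here, and counting whitespace-separated words (ignoring a free-standing dash) is one possible reading of "words", not something the post defines.

```python
def jargon_footprint(actual: str, minimal: str) -> float:
    """Words actually used divided by words needed (hypothetical helper)."""

    def count_words(text: str) -> int:
        # Count whitespace-separated tokens containing at least one letter
        # or digit, so a lone "-" is not counted as a word.
        return sum(1 for tok in text.split() if any(c.isalnum() for c in tok))

    needed = count_words(minimal)
    if needed == 0:
        raise ValueError("minimal text must contain at least one word")
    return count_words(actual) / needed


sign = ("We regret that these toilets are temporarily unavailable "
        "for cleansing - please use alternative facilities")

# 14 words against the 3 words of "Closed for Cleaning"
print(jargon_footprint(sign, "Closed for Cleaning"))
```

The same function works for any pair of texts, provided someone has already decided what the minimal version is; that judgement is the hard part and is not automated here.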
Individuals can, on the basis of their lifetime output of
words versus information, be assigned a jargon footprint.
So can entire disciplines. Sociology, for example,
has a jargon footprint of over 100, since it uses many
words to convey almost no information; physics probably
has a far smaller footprint.
Even languages can be assigned footprints. Comparison
of multi-lingual signs invariably shows that some
languages are intrinsically more concise than others. No
doubt the French are particularly guilty in this respect.
Nations, organizations and individuals should all strive to
reduce their wordage, driven by this common metric.
Blimey, I do go on, don't I?
||As soon as you used "brevious", you made the sig at the bottom redundant.
||Ah, but in Chinese, more bits per character are
needed than in a Western language with a proper
writing system. Also, compression is less easy.
||You didn't seem to factor in the additional footprint of annotations, and any subsequent ideas this one may inspire that otherwise would not have existed.
||Think about the jargon footprint of Socrates, who being concerned about his own didn't write a thing. He should have been more careful about his carbon dioxide footprint, and kept his thoughts to himself.
||This is very related to information theory; seems to me that the Jargon Footprint is usually called redundancy.
||Bah humbug. Redundancy, like anything else is relative. Take DNA - If we wanted to compress the amount of information present in a single strand of DNA, I'm sure there's plenty of really good algorithms out there that would help sort things out. Then there's the encapsulation problem. Why does the body need to have a billion-billion copies of the same piece of code? Why not reorganise the whole thing on a centralised model where a single, secure repository is responsible for holding a single master copy of the information and have each cell reference that master directly? It's like having each word in the Complete Works of Shakespeare being printed using a really tiny list of all the words (in sequence) present in the Complete Works of Shakespeare - completely unnecessary and having an informational footprint of many bazillions.
||//some languages are intrinsically more concise
||How does German score here? They tend to take
about 5 words and bung them together as one, so
although the length of any piece of writing might be
the same as in another language, the Germans
should generally have fewer words.
||//How does German score here? They tend to take
about 5 words and bung them together// Strictly,
we ought to count characters, in which case this
historically-rooted attempt by the Hun to pre-
emptively get around my jargon footprint penalty
scheme is thwarted.
||The thing is, [bigs], that Chinese isn't really proper
writing, is it? They just made do by drawing stuff
and, over the years, the drawings degenerated
like a bank manager's signature. At best, written
Chinese conveys the general gist of whatever it
was you actually wanted to say, as long as you
interpret it with hindsight.
||This is why we in the West, who took the time and
trouble to work out the whole writing thing, can
say pretty much anything we want using just 26
characters (or 25 in Norfolk, where they don't use
"o"), whereas the poor benighted Chinese have
had to come up with several thousand different
cartoons. Written Chinese is like a cross between
Pictionary and charades ("bird that makes a noise
like a carrot" - "parrot").
||They've clearly put a lot of effort into the
descriptions of monkeys juxtaposed with chests of
drawers, but how often does this situation arise?
||To make the situation fairer, the carbon footprint
of every language can be calculated as the
character count of text (relative to a page of
English) multiplied by the number of characters in
the "alphabet", divided by 26. Thus, English would
have a score of 1*1=1. Chinese might have a score
of 0.7*(1118/26)=30. Binary, as another example,
might have a score 5*(2/26)=0.38.
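For anyone wanting to play with this scoring rule, here is a minimal sketch. The function name `language_score` is hypothetical; the relative lengths and alphabet sizes are simply the figures quoted in the annotation, not measured values.

```python
def language_score(relative_length: float, alphabet_size: int,
                   baseline_alphabet: int = 26) -> float:
    """Relative character count times alphabet size over 26 (hypothetical helper)."""
    return relative_length * alphabet_size / baseline_alphabet


# (relative length versus a page of English, symbols in the "alphabet"),
# taken from the annotation above
examples = {
    "English": (1.0, 26),
    "Chinese": (0.7, 1118),
    "Binary": (5.0, 2),
}

for name, (rel_len, symbols) in examples.items():
    print(f"{name}: {language_score(rel_len, symbols):.2f}")
```

Note that the rule penalises alphabet size linearly, so any script with thousands of symbols scores badly however short its texts are.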
||Regarding compression algorithms, I suspect that
all languages can be compressed by roughly the
same proportion, given that some
characters/symbols will be more abundant than others.
||Regarding //Given that we can remove a fair
degree of complexity from chinese characters
discounting the monkey and chest of drawers
glyphs I'd say its still a winner.// I disagree. How
many Chinese cartoons do you need to be familiar
with to read a newspaper*? Surely many more
than 52. Thus, if Chinese uses only half as many
characters as the English equivalent, it has a larger footprint.
||How about an experiment? I took a page of prose
in English, and did automatic translation into
Simplified Chinese. I then tried compressing (by
ZIP) both files. The results (in bytes, before/after
||So, Chinese gives a bigger file, whether
compressed or not. All depends on encoding,
compression algorithm and text length but,
broadly speaking, Chinese cartoons are lengthier
than Proper English.
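A rough way to repeat this kind of experiment is sketched below. It is not the procedure used in the annotation: it swaps in Python's `zlib` (DEFLATE, the same family of compression that ZIP normally uses) and assumes UTF-8 encoding. The two sample strings are short illustrative placeholders, not the page of prose that was actually tested, and the helper name `sizes` is made up.

```python
import zlib


def sizes(text: str, encoding: str = "utf-8") -> tuple[int, int]:
    """Return (raw byte count, zlib-compressed byte count) for a text."""
    raw = text.encode(encoding)
    return len(raw), len(zlib.compress(raw, 9))


# Placeholder texts only; substitute a full page of prose and its translation.
english_sample = "Closed for cleaning. Please use alternative facilities."
chinese_sample = "清洁中,请使用其他设施。"

for label, sample in (("English", english_sample), ("Chinese", chinese_sample)):
    before, after = sizes(sample)
    print(f"{label}: {before} bytes before, {after} bytes after")
```

Very short inputs can come out larger after compression because of fixed overhead, so the comparison only means something for texts of a page or more. Changing `encoding` (for example to `"utf-16"`) shows how much the result depends on that choice.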
||Always been impressed with the economy of the ancient people of northern Britain. Very unwindy, they became well known in our time for their signage outside public restrooms.
||I can't cite for this but I think languages tend to have distinctive compression ratios. With Mandarin Chinese, there is a fixed number of possible spoken syllables which they are very attached to thinking of as words. Just as German chooses to capitalise certain words to distinguish meaning which it can get away with not shouting when we speak it, e.g. "Sie" versus "sie", so Chinese chooses to use a large number of ideograms for the same spoken words. Compress spoken Chinese and each "word"/syllable takes up less space. If Mandarin has five hundred possible syllables ignoring tone and four possible tones, that's eleven bits per morpheme and perhaps word. English has forty-four phonemes and the mean length of a word is around five phonemes, so the "average" uncompressed spoken English word considered as a string of speech sounds rather than sounds of a more general kind (remembering where we are and who started this place) is twenty-five bits "long". However, those words frequently consist of several morphemes, and Indo-European languages tend to encode more "units of meaning" per morpheme than non-IE ones, which are often either agglutinative or have isolated morphemes each considered to be words. Therefore, we Anglophones have compression built into our speech, unlike Mandarin, but interestingly, our writing system tends towards a system of ideograms masquerading as an alphabetic script, so the compression ratios of our speech and writing, like Mandarin's, are probably quite different. Also, exactly how small is a single atomic idea?
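The bit counts in this annotation can be checked with a base-2 logarithm. The sketch below assumes every symbol is equally likely (so it gives an uncompressed upper bound, not a true entropy) and uses the annotation's own figures; the helper name `bits_per_unit` is hypothetical.

```python
import math


def bits_per_unit(n_symbols: int, length: int = 1) -> float:
    """Bits needed for `length` symbols drawn uniformly from `n_symbols` choices."""
    return length * math.log2(n_symbols)


# Mandarin: 500 syllables x 4 tones, one syllable per morpheme
mandarin = bits_per_unit(500 * 4)

# English: 44 phonemes, about 5 phonemes per word
english = bits_per_unit(44, length=5)

print(f"Mandarin syllable: {mandarin:.1f} bits")
print(f"English word:      {english:.1f} bits")
```

With these inputs the Mandarin figure rounds to eleven bits, as stated. The English figure comes out near 27 bits, slightly above the round twenty-five in the annotation, which corresponds to allowing 5 bits per phoneme.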
||//"encircle Wèi to save Zhào"... this 4-character
summary is sufficient to make the point.//
||Well, that's true enough except that it isn't, of
course. You can tell me to encircle Wei to save
Zhao all you like and, frankly, all you'll get from
me is a blank stare because the statement
presupposes a lot of specific knowledge.
||On the other hand "Buy low, sell high" _is_ a self-
||Ah, but that was _my_ point in the original post.
You measure the length of a given piece of text, and
compare it to the minimum length needed to
convey the same information to a similar audience.
||Thinking in terms of mechanical efficiency it's cutting out noise, friction, reducing mass, shaping something to fit into the right hole, stuff like that. "Buy low, sell high" doesn't really need knowledge of advanced markets to make sense. Buy something when the value is low and sell it when the value is high. Extrapolating on the meanings of everything may eventually produce quite a lot of information. Value is certainly a complex concept, and by cutting out that word and implying its meaning in 'low', and 'high' then some market sense is built in. An agrarian person might think buy seeds and soil, physically low things, and sell fruit when it is high, physically high on top of a tree or plant. But then of course those physical things have a value that is not literal. It should be confusing because even in terms of social stratification something nonliteral like stratification is interpreted as a literal hierarchy, but really the evaluation is much more complex and may only be coincidentally physical like a homeless person sleeping low under a bridge, and a wealthy person high in a penthouse. More to the point about mechanical efficiency the noise isn't necessarily implicit in the language but a noise of referents that a person might use to make sense of something. If someone says "look at that car" that is a pretty simple message, but if there are many cars such as on a highway the noise comes from outside, and the message was just an exclamation and not a suggestion, but what if there is only one car and you're in a small garage then the clear message is really a senseless noise. So that is also somewhat of a point that perhaps some noise was required to reach, that jargon is perhaps not the only source of noise but confusing points of reference as well, literal and nonliteral ones.
||//how do you measure the efficiency of a
speaker/writer if efficiency almost entirely
depends on the cognition of the listener/reader?//
||You can take a decent stab at it.
||For example, the phrase "La plume de ma tante"
contains the same information as "My aunt's pen",
and relies on the same background knowledge in
the reader. But the French phrase has 20
characters (including spaces) whereas the English
phrase has only 13. Hence, by this limited
measure, French has a footprint of 20/13, or about 1.5.
||There's also "LO!", although that's sort of tonal.
||Another way to approach this would be to think in terms of how much something can be exploded and how much it can be imploded. "Buy low; sell high" when taken as meaningful words, as opposed to destroying it in a mass of meaningless referents as above, can be exploded to the definitions of each word. If someone took care and time to do this exploding of each word, for example defining buy, defining the words that define buy, defining the words that give meaning to those words etc. then quite a long treatise could be produced on rational exchange and value, and things of that nature, probably not without some difficulty using various dictionaries. That treatise could be read and someone could implode it down to "buy low; sell high" and this is something that could be tested if someone were to perform such an undertaking, or undertake such a performance.
||Think about something known to be absurd that cannot be exploded "you only live once". Only and once are the same word, and you and live refers to the same life. Basically it can be imploded down to "once" and the explosion that takes place is in the absurd mass of "one" justifying "all". But of course it is an absurdist philosophy, where no explosion of meaning can take place, and adherents simply bask in the meaninglessness. They can't verify the claim, only the past certainty that authority was without god. Camus pour le chameau. A more meaningful person without so many humps would probably say something like "live each day to the fullest", fullest providing for a fullness of meaning, and each day one of a multiplicity each with a more meaningful fullness than the last, or "seize the day" with a purposefulness that transcends meaning.
||You'll have to consider the acceleration of the JF in
time, as more and more lengthy annotations are
written. On the other hand, there is a threshold past
which most people don't read. And with annotations,
sometimes if they are too long they'll be skipped