halfbakery

Almost uncorrupted files
Now that compression tools are powerful enough, bio-engineer the ASCIent code!


ASCII code (in its 8-bit extended forms) translates numbers in the range 0-255 into symbols, most of which are seldom used.

The genetic code (used by most living beings) translates DNA bases (a 4-letter alphabet: A, C, G, T) into amino acids (the 20-letter alphabet of proteins).

4x1 = 4. No use: 16 amino acids wouldn't be encoded. 4x4 = 16. Humpf! Almost there. Still 4 'unspeakable' amino acids. 4x4x4 ('base triplet' or 'codon') = 64. More than enough!! The technical word for 'more than enough' is 'degenerate'. Some amino acids are encoded by a single codon, some by two codons, and so on, up to six.
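The counting above is easy to check against the standard genetic code. A minimal sketch: the 64-letter string is the standard codon table (NCBI translation table 1) written in T, C, A, G order, with '*' marking the three stop codons; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; '*' = stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# How many symbols can words of 1, 2 or 3 bases spell?
for n in (1, 2, 3):
    print(f"{n} base(s): {4 ** n} combinations")

# Degeneracy: number of codons per amino acid (stop codons left out).
degeneracy = Counter(aa for aa in CODE.values() if aa != "*")
print(len(degeneracy))  # 20

by_count: dict[int, list[str]] = {}
for aa, n in degeneracy.items():
    by_count.setdefault(n, []).append(aa)
print(sorted(by_count.items()))
```

Methionine (M) and tryptophan (W) come out with one codon each, while leucine, serine and arginine (L, S, R) get six.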

In practice, the first two bases of the triplet are the 'essential' ones, while the third is often 'redundant': it may accept a mutation and still code for the same amino acid.
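Which codon positions actually tolerate a mutation can be counted directly: for every sense codon, try each of the three alternative bases at each position and check whether the amino acid stays the same. A sketch using the standard table (NCBI translation table 1); the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; '*' = stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def silent_substitutions(code: dict[str, str]) -> tuple[list[int], int]:
    """Per codon position, count single-base changes that keep the amino acid unchanged."""
    silent = [0, 0, 0]
    per_position = 0
    for codon, aa in code.items():
        if aa == "*":  # stop codons are not translated, so skip them
            continue
        per_position += 3  # three alternative bases at each position
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if code[mutant] == aa:
                    silent[pos] += 1
    return silent, per_position


silent, total = silent_substitutions(CODE)
for pos, n in enumerate(silent, start=1):
    print(f"position {pos}: {n}/{total} substitutions are silent")
```

In the standard table nearly all silent substitutions fall on the third position, a handful (in leucine and arginine codons) on the first, and none on the second.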

Imagine doubling the number of codes in 8-bit ASCII, from 256 to 512. Useless! A waste of memory/disk space! Yeah, all of that, and more. But you don't want a binary 100 0111 (capital G) corrupted by a single flipped bit into 101 0111 (capital W), and you definitely won't accept a 100 1000 -> 100 0000 (H -> @) change in your texts. There would be several good rules to apply to the new code, such as giving frequently used characters more (synonymous) binary codes. For compression purposes, the efficiency loss could be minimized by choosing the encoding so that the code's redundancy is put to use: if you can't compress the first binary code for character C1 together with the first binary code for character C2, try the alternative encodings until compression is optimal or the time cost becomes unaffordable.
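The idea leaves the assignment of synonymous codes open. One well-established way to make a single flipped bit harmless (not the author's scheme, but the kind of forward error correction in the link below) is a Hamming(7,4) code: each 4-bit half of a byte gets three parity bits, so a byte becomes 14 bits and any single flipped bit in either half can be located and undone. A minimal sketch; all function names are illustrative.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = ((nibble >> i) & 1 for i in range(4))
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word: list[int]) -> int:
    """Fix at most one flipped bit in a 7-bit codeword and return its 4 data bits."""
    c = [0] + list(word)  # pad so c[1]..c[7] match the 1-based bit positions
    syndrome = ((c[1] ^ c[3] ^ c[5] ^ c[7])
                | ((c[2] ^ c[3] ^ c[6] ^ c[7]) << 1)
                | ((c[4] ^ c[5] ^ c[6] ^ c[7]) << 2))
    if syndrome:  # a non-zero syndrome is the position of the flipped bit
        c[syndrome] ^= 1
    return c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)


def encode_char(ch: str) -> list[int]:
    """14-bit protected code for an 8-bit character: low nibble first, then high nibble."""
    b = ord(ch)
    if b > 0xFF:
        raise ValueError("this sketch only handles 8-bit characters")
    return hamming74_encode(b & 0xF) + hamming74_encode(b >> 4)


def decode_char(bits: list[int]) -> str:
    return chr(hamming74_decode(bits[:7]) | (hamming74_decode(bits[7:]) << 4))


# The single-bit accidents from the text, in plain 7-bit ASCII:
print(chr(ord("G") ^ 0b0010000))  # W
print(chr(ord("H") ^ 0b0001000))  # @

# With the 14-bit code, flipping one bit still decodes to the original letter.
word = encode_char("G")
word[4] ^= 1
print(decode_char(word))  # G
```

The price is 14 bits per character rather than the 9 that doubling the table to 512 would give. With only one spare bit per character, a single flipped bit can be detected (as with a parity bit) but not corrected, because there is not enough room to keep every pair of valid codes three bits apart.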

Double the number of needles and the amount of hay in the stack: any magnet will be able to pull out both needles in less than twice the time required for a single one.


mayihave, Jul 25 2007

Wikipedia: Forward error correction http://en.wikipedia...ror-correcting_code
[jutta, Jul 25 2007]



Annotation:







       As far as I'm concerned, computers are and will remain absolutely useless until they learn to perform internal operations in plain English.

nuclear hobo, Jul 25 2007
  

       my humpf! my humpf my humpf my humpf!

bungston, Jul 25 2007
  

       ASCII what you can do for your computer, not what your computer can do for you.

xenzag, Jul 26 2007
  

       I just had wonderful mental images of computer internals of various nationalities.   

       "No, don't be interrupting me, listen what I'm telling, this is a matter for the accountant only"   

       Disclaimer: Any nationality assumed from the above text is the reader's responsibility.

marklar, Jul 26 2007
  

       Really interesting link, the one about forward error correction. Thank you for that.   

       Humm, as for compression yield, this method would of course give non-optimal compression. The point I was trying to make is that living organisms use redundant codes (such as the genetic code) so as not to lose information even after billions of replicating generations. Error rates in DNA copying range from 10^-7 to 10^-10 per base (that's one thousand to one million times better than an average book proofreading outcome), and that's why organisms such as bacteria can copy their entire genetic information almost losslessly (the code was never intended to compress anything). Anyway, from what you showed me, it's easy to foresee that engineers will keep their jobs for a long while, because of their ability to tell the difference between 'signal' and 'noise'.
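       To put those rates in perspective, a back-of-the-envelope sketch: with a per-base error rate r and a genome of L bases, a copy carries on average L × r errors, and the chance of a perfect copy is (1 − r)^L. The genome length below (about 4.6 million base pairs, roughly that of E. coli) is an assumption, not a figure from the comment.

```python
import math

GENOME_BP = 4_600_000  # assumption: bacterial genome length (E. coli is roughly 4.6 million bp)


def copy_stats(rate: float, length: int = GENOME_BP) -> tuple[float, float]:
    """Expected errors per genome copy, and the probability that a copy has none."""
    expected = length * rate
    p_perfect = math.exp(length * math.log1p(-rate))  # (1 - rate) ** length, numerically stable
    return expected, p_perfect


for rate in (1e-7, 1e-10):  # the per-base range quoted in the comment
    expected, p = copy_stats(rate)
    print(f"rate {rate:.0e}: {expected:.2g} expected errors per copy, P(no error) = {p:.4f}")
```

       At 10^-7 such a genome picks up about 0.46 errors per copy (roughly 63% of copies are perfect); at 10^-10 only about one copy in two thousand has any error at all.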

       The second point, if there was room for one, was, hummm, well: how well would zip perform on that extended redundant code? And the answer: 'not as bad as expected', because of the redundancy itself.
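       The 'not as bad as expected' intuition is easy to test. A minimal sketch, assuming a stand-in redundant code in which every byte is followed by its bitwise complement (a hypothetical 2x code, not the synonym scheme from the idea): since each character always maps to the same byte pair, a general-purpose compressor such as zlib can still find the repetitions.

```python
import zlib


def redundant_encode(data: bytes) -> bytes:
    """Hypothetical 2x-redundant code: every byte is followed by its bitwise complement."""
    return bytes(b for x in data for b in (x, x ^ 0xFF))


def redundant_decode(data: bytes) -> bytes:
    """Inverse of redundant_encode; raises if a byte pair fails the complement check."""
    out = bytearray()
    for x, y in zip(data[::2], data[1::2]):
        if x ^ y != 0xFF:
            raise ValueError("corrupted byte pair detected")
        out.append(x)
    return bytes(out)


text = ("Almost uncorrupted files. " * 50).encode("ascii")
plain_z = zlib.compress(text, 9)
red = redundant_encode(text)
red_z = zlib.compress(red, 9)

print("plain:    ", len(text), "bytes ->", len(plain_z), "compressed")
print("redundant:", len(red), "bytes ->", len(red_z), "compressed")
```

       On repetitive text the compressed redundant stream ends up far smaller than its doubled raw size, usually somewhat larger than the compressed plain text; actual numbers depend on the text.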

       Thank you all for discussing.

mayihave, Jul 26 2007
  

       1) The range of ASCII is 0-127
       2) ASCII sucks anyway
       3) Use RAID
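       'Use RAID' points at a different kind of redundancy: parity-based RAID levels (RAID 4 and RAID 5) store the XOR of several data blocks, so any single lost block can be rebuilt from the others. A toy sketch of that parity idea, not of any real RAID implementation; block contents and names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"HALF", b"BAKE", b"RY!!"]  # three equal-length data "disks"
parity = xor_blocks(data)           # stored on a fourth "disk"

# Lose disk 1: XOR of the surviving disks and the parity restores it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'BAKE'
```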

ironfroggy, Jul 28 2007
  

       Well, I don't understand half of the ideas above, hummm. Ever since 'message science' began, people have been trying to figure out what to encode and how. An ancient Chinese Emperor ruled there should be 80,000 ideographic-syllabic characters at the most, because Chinese was almost impossible to learn already. What the... do people mean by comments such as //ASCII sucks anyway//? Of course it sucks, man! Unused characters suck more than frequently used ones. (I'll give you an excerpt of the character frequencies in Harry Potter at the bottom of this comment.) Just bear in mind that there are only 58 different symbols in that book, and the 30 most frequent characters make up >99% of the volume's 1,367,463 bytes. In fact, >51% of those bytes code for one of the top 6 characters in the ranking. A mini-micro-tiny-ridiculous 'ascii-6' would 'explain' half the goddamn book!! Sounds like an outstanding statistic to begin with, doesn't it? I'm already half-baking a second-order Markov model so that frequent character pairs (such as 'th') could be 'condensed' into shorter codes.
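       Statistics like these (number of distinct symbols, share of the text covered by the top-k characters, frequent pairs such as 'th') take only a few lines to reproduce on any text. A sketch; the function names are illustrative, the sample string stands in for the book, and the pair counts are only the raw material a pair-based model would be built from, not the model itself.

```python
from collections import Counter


def top_k_coverage(text: str, k: int) -> float:
    """Fraction of all characters in `text` accounted for by its k most frequent symbols."""
    top = Counter(text).most_common(k)
    return sum(n for _, n in top) / len(text)


def common_pairs(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Most frequent adjacent character pairs, e.g. 'th' in English text."""
    pairs = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return pairs.most_common(n)


sample = "the cat and the hat that the rat sat on"
print("distinct symbols:", len(set(sample)))
print("top-6 coverage:", round(top_k_coverage(sample, 6), 3))
print("frequent pairs:", common_pairs(sample, 3))
```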

       Char.   Freq.     Char.   Freq.
       Blank   0.176     w       0.018
       e       0.104     c       0.015
       t       0.063     f       0.014
       a       0.058     .       0.013
       o       0.056     m       0.012
       h       0.055     ‘       0.012
       n       0.051     b       0.011
       i       0.050     p       0.010
       s       0.050     ,       0.010
       r       0.049     v       0.010
       d       0.039     k       0.008
       l       0.028     H       0.008
       g       0.026     I       0.003
       u       0.024     T       0.002
       y       0.019     K       0.002
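       Huffman coding (a standard technique, swapped in here; it is not the Markov model the comment describes) turns a frequency table like the one above into a prefix-free code in which frequent characters get short codewords. A sketch using only the 30 listed frequencies: they sum to 0.996, so the 28 unlisted symbols are simply ignored and the rest renormalized.

```python
import heapq
import math
from itertools import count

# The 30 frequencies from the table above ("Blank" is the space character).
FREQ = {
    " ": 0.176, "e": 0.104, "t": 0.063, "a": 0.058, "o": 0.056, "h": 0.055,
    "n": 0.051, "i": 0.050, "s": 0.050, "r": 0.049, "d": 0.039, "l": 0.028,
    "g": 0.026, "u": 0.024, "y": 0.019, "w": 0.018, "c": 0.015, "f": 0.014,
    ".": 0.013, "m": 0.012, "‘": 0.012, "b": 0.011, "p": 0.010, ",": 0.010,
    "v": 0.010, "k": 0.008, "H": 0.008, "I": 0.003, "T": 0.002, "K": 0.002,
}


def huffman_code(freq: dict[str, float]) -> dict[str, str]:
    """Build a prefix-free binary code; rarer symbols get longer codewords."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


code = huffman_code(FREQ)
total = sum(FREQ.values())
avg_bits = sum(f * len(code[s]) for s, f in FREQ.items()) / total
entropy = -sum(f / total * math.log2(f / total) for f in FREQ.values())
print(f"average {avg_bits:.3f} bits/char (entropy {entropy:.3f}) vs 8 for plain bytes")
print("space:", code[" "], " K:", code["K"])
```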

mayihave, Jul 28 2007
  


 