Half a croissant, on a plate, with a sign in front of it saying '50c'
h a l f b a k e r y
Professional croissant on closed course. Do not attempt.

idea: add, search, annotate, link, view, overview, recent, by name, best, random

meta: news, help, about, links, report a problem

account: browse anonymously, or get an account and write.

user:
pass:
register,


                               

"Irrational" compression

Being irrational about data compression.
  (+1, -5)
(+1, -5)
  [vote for,
against]

An irrational number is a number with an infinite number of decimal places without a repeating pattern. Examples are "pi", "e", the square root of 2, square root of 3, etc. Since there is no repeating pattern in the decimals, any combination of numbers is possible somewhere along the line. Meaning, any finite string of numbers is likely to be found somewhere in an irrational number.

The idea behind the irrational archiver is this: use predefined algorithms to calculate irrational numbers. Divide the string you have to compress into shorter strings (not strictly neccessairy), and compare the strings to the calculated value of the irrational number. When a match is found, the string is compressed as "the number Pi, 300 decimals, starting at the 28394059th." If the string can't be matched, the compressor tries another number. If no match is found in any of the irrational numbers defined by the compression standard, it is left uncompressed. Of course, if "most" of the string fits, the fitting part is marked, and the "deviant" numbers are matched separately.

The decompression software would calculate the irrational numbers to the farthest decimal place needed (i.e. until that iteration where the farthest decimal place needed is calculated correctly), and combine the strings into a single file.

A predefined calculation algorithm with standard variable types would ensure that the decompression software gets the same values for Pi as the compression algorithms.

The effectiveness of the compression would wary greatly, depending on the file. The ideal files would be, say, 35623730950488016887242097... (n decimal places of the square root of 2, starting at the 6th decimal place); 1415926535897932384626433832795... (the first n decimal places of Pi);

7182818284590452353602874713527... (the first n decimals of "e") and would be compressed down to a few bytes. But more demanding files might lead to the number of the decimal places being bigger than the strings themselves. The compressor should check every irrational number defined in the standard for a match closer to the decimal point.

This method would require a very powerful computer, both for compression and decompression, and both would take virtually forever, but theoretically, this could work.

Veho, Jan 07 2007

Calculating Pi http://en.wikipedia...#Calculating_.CF.80
[Veho, Jan 07 2007]

On the Mean Number of Random Digits until a Given Sequence Occurs http://links.jstor....0.CO%3B2-F#abstract
Journal of Applied Probability, Vol. 19, No. 1 (Mar., 1982), pp. 136-143
Has anyone got access to this? [hippo, Jan 09 2007]

[link]






       I had this same idea several years ago, but soon discovered the fatal flaw in the scheme: the "pigeon hole principle". Basically it will take at least as much data to encode the data as it is possible to compress.   

       To use Pi as an example; lets say there is a five digit string you want to compress - this will, on average, occur about 10,000-100,000 decimal places into Pi. Thus taking the same amount of information to describe the location of the information as the information itself (apologies for that garbled sentence).   

       The best use for Pi 'compression' is as a "one time pad" - a system of perfect encryption.
xaviergisz, Jan 07 2007
  

       //and both would take virtually forever/ / depends on how long your piece of string is......
xenzag, Jan 07 2007
  

       No, bigsleep, the GIF compression algorithm is not "very similar to this". I don't know why people keep coming back to LZW (a similarly incorrect statement is next door, in the "sliced cube data compression") - is it the only compression algorithm anyone knows?
jutta, Jan 07 2007
  

       [Lt_frank], remember any number (including decimals) can be represented as a binary number (or hexidecimal or octal), and any alphabet can be choosen to use all possible combinations in a particular base.   

       Also note that irrational numbers are inumerably infinite, so some pretty random irrational numbers can be used as an OTP.
xaviergisz, Jan 08 2007
  

       Short strings and very long strings have the same problem: their position in Pi can only be defined in a number so big, its digital representation is longer than the string itself (especially true for short strings). But the algorithm would try several irrational numbers for matches, not only Pi, and give up after a number of decimal places (i.e., if the decimal place is represented by a number longer (or inadequately shorter) than the string itself). The shortest number is selected. If no similar string can be found, the string is left as it was, uncompressed. Strings shorter than the defined minimum compression block (consisting of the code of the irrational number used, number of decimals used, and the starting decimal place) wouldn't be compressed. Very large strings would be divided into shorter ones (not a guarantee for a match closer to the decimal point; stohastic math doesn't really folow what we believe is "common sense"). This means the worst case scenario is having zero compression (end result is as big as the original), but everything else would be, strictly speaking, compressed (effectiveness notwithstanding).   

       [xav]: I should have known that, as many mathematical ideas, it has undoubtedly been thought of before, and if it was feasible, it would have been implemented somehow. I just couldn't find a flaw in my reasoning. The great thing about Halfbakery is that people tell me where I went wrong with my (half)baking (provided there really is something wrong with it).
Veho, Jan 08 2007
  

       It's more accurate to describe pi as transcendental, rather than irrational (although it is irrational too). I agree with others, that the data needed to encode where the sequence starts wil be longer than the sequence itself in the majority of cases.
hippo, Jan 08 2007
  

       For god's sake man, haven't you SEEN "A Beautiful Mind"? This research is a very slippery slope indeed, soon you'll be followed by agents and receive coded messages in the newspapers.
wagster, Jan 08 2007
  

       Edited the title for less clarity. For some reason, everyone is stuck on Pi as the only number that would be used for compression; there are many other (well known, researched, thoroughly pondered and meticulously mulled over) irrational and/or transcendental numbers to work with, and the algorithm would try several of them for conveniently placed matching strings, not just Pi.
Veho, Jan 08 2007
  

       I like the less clarity thing. And the whole problem would have gone away if the legislation to rationalize pi to 16/5 would have passed. (In Indiana, at least.)
ldischler, Jan 08 2007
  

       [Lt_Frank], I agree that using any *known* irrational/transcendental number is not a good idea for an OTP. I'm instead suggesting that there are plenty of unpublished irrational/transcendental numbers to choose from. Of course it would be easier to simply give the OTP string than try to define some obscure irrational/transcendental number - I was using this as a round-about way of suggesting to [Veho] that his scheme resembled a viable encryption system rather than compression system.   

       [Veho], having a *set* of irrational numbers rather than a single irrational number will still suffer from the same problem - it'll take valuable data to describe which irrational number you're using.   

       So for example if you ennumerate your various numbers: 1.Pi, 2.e, 3.phi etc. When you encode the data you'd need to say the 5 digit string starting at 3,7654th decimal place in the 3rd stated number (phi).
xaviergisz, Jan 08 2007
  

       [xaviergisz], the question here is whether the code for the irrational number used will surpass the size of the number representing the decimal place of the string, more than what can be gained by using more irrational numbers. The idea is that a different number would have the same string so much closer to the decimal point, it would (more than) compensate for adding a few bits to define more irrational numbers.   

       Like other compression algorithms, the biggest problem is the nature/layout of the string you are trying to compress, making for worst/best case scenarios. But while other compression methods rely on defined algorithms, this one depends on chance. Since a general string is more or less random, and irrational numbers are as well, the only way to check the effectiveness of this compression method would be to actually try it out, because the math behind a theoretical discussion of its effectiveness is hideously incomprehensible to most. There is no mathematical proof that any given (irrational) number would have a completely random string "too far" or "near enough" to the decimal point - mainly because it's random. For a theoretical proof, you would have to (statistically) calculate the distance for every set of strings, which is impossible.
Veho, Jan 09 2007
  

       Perhaps you could really make it tricky by saying
"Starting at the 123456th decimal, use x digits then subtract y" or
"Starting at the 123456th decimal, use x digits backwards then add y" or
"Starting at the 123456th decimal, use x digits and starting at the 1234567th decimal, use y digits, and then multiply together."
Ling, Jan 09 2007
  

       Lt. Frank's conjecture: Irrational numbers cannot be compressed.
ldischler, Jan 09 2007
  
      
[annotate]
  


 

back: main index

business  computer  culture  fashion  food  halfbakery  home  other  product  public  science  sport  vehicle