h a l f b a k e r y
May contain nuts.

meta:

account: browse anonymously, or get an account and write.

 user: pass:
register,

# Primary Cryption

A proposed new algorithm featuring lots and lots of prime numbers
 (+2, -4) [vote for, against]

This is a little project I've been thinking about since the early 1980s. A year or so ago I sat down and started writing code. With commentary. Lots and lots of commentary. The result can be accessed at the link below.

Originally, I happened to notice that when you take the reciprocal of a prime number (divide it into One), the result is often a rather long series of digits, before it starts to repeat. (The series and its quantity are known as "the period".) In playing around with Base Two, Base Three, Base Four, and so on, I found out that for EVERY prime number P, in at least one Base (but never all), the period has a length of P-1 digits. Try 1/17 in ordinary Base 10, and you will see a group of 17-1=16 digits that repeats. Now while we all know that in Base 10 the decimal form of 1/3 is .33333... (period of 1), it happens that in Base Two the result is .01010101... (period of 2, which is 3-1). It really is true that a period of P-1 can be found for EVERY prime number (it has been Proved) for its reciprocal in a suitable Base. It is never true for any composite number C, that its period has length C-1 in some Base.

That's OK; there are plenty of prime numbers available. (One of the files that you can obtain via the link below will create a kind of "library" of more than 200,000,000 primes.) Now it happens that the distribution of digits WITHIN a prime's period is somewhat random-looking. The official word for such things is "pseudorandom". Pseudorandom numbers are good for encryption. And if you want secure encryption, you want a lot of pseudorandom numbers....

Well, if you started with a prime that was, say, 100,000,000-and-something, then when you took its reciprocal, you could expect 100million digits in its period (in at least one Base, and actually in many). That's a nice large quantity of pseudorandom digits. Furthermore, if Base Two is used, the unique properties of that Base make the division process very very fast, compared to Base Ten. Nothing more than subtractions and "bit-shift" doublings, that is. This means that it is possible for a computer to process many prime numbers near-simultaneously, and COMBINE their pseudorandom numbers, thereby producing -- very quickly! -- a series of digits which more-closely approach the perfectly-random condition. And anyone who has ever heard of "one-time-pad" encryption knows that when perfectly random numbers are used, an encrypted text cannot be cracked even in theory, no matter how-powerful a quantum-supercomputer is used.

So, I've written a little program (compiles to a mere 71K) to encrypt files using the preceding idea (among others). Experts In The Field should test it to see if I haven't introduced any unsuspected flaws while implementing it. Everything needed to study it can be found through the link below. (The Windows operating system is assumed.) Encryption STARTS with a security level of roughly 64,000 bits, and goes up from there. Enjoy!

(I could have posted the source-code files here, but two of them are over 50K, and one is 227K, and Jutta probably wouldn't like that. :)

 — Vernon, Oct 28 2004

Primary Cryption https://sourceforge...6&release_id=278616
A .ZIP file containing both source code and executables. [Vernon, Oct 28 2004]

Jumping to conclusions may have already occured. And maybe not. Per [jutta]'s proposal, I have obliged by writing another long explanation. Then I had to amend it a little... [Vernon, Nov 05 2004, last modified Nov 12 2004]

Project Web Page http://primarycryption.sourceforge.net
Maybe you'll like the logo.... [Vernon, Nov 15 2004]

A reasonably detailed description of the algorithm http://primarycrypt...ge.net/overview.htm
Here's hoping it's tricky/sneaky enough! This may also explain why the executable is 71K. :) [Vernon, Nov 16 2004]

 Although the pattern of digits you get from the reciprocal operation is non-repeating, that in no way implies that it's random. Any portion of the reciprocal of an n-bit number will be a fraction of two n-bit numbers. Any such fraction may be uniquely identified by a any 2n digit segment of it.

I don't know the exact details, but I'm pretty certain there are some reasonable-running-time procedures to determine what such fractions are.
 — supercat, Oct 28 2004

 [supercat], thanks, and I know some of what you speak. See that "(among others)" near the end of the main text?

 For example, it turns out that the first half of the period is "complementary" with the second half. Take those 16 repeating digits from 1/17: 0588235294117647... Split in half and add: 05882352 + 94117647 = 99999999. (This phenomenon happens in every Base B, such that each digit of the sum is B-1.) So right away you really only want to use half of the period, and not all of it, as a source of pseudorandom numbers. Or get a bigger prime (infinity of them available). Or both.

 Anyway, the program does not make much use of reciprocals, except by accident. Mostly "random" prime P is divided into some other "random" number N (the "random" data is indirectly derived from a user-chosen "key file"). While it happens that the PERIOD of such a division still consists of the exact same sequence of digits as the reciprocal, the starting point of the sequence differs for each N. If the period is over 100million digits long, that's a lot of possible starting points!

AND I specified that the outputs of several divisions are combined together so that simple observation cannot so easily yield the identification you mentioned.
 — Vernon, Oct 28 2004

 I'll admit I had missed certain parts of your description, but you should be aware that the combination of multiple weak cryptosystems will oftentimes fail to produce a strong one. It's important to recognize that oftentimes combining two encryption schemes that have 'n' bits of security will yield far less than 2n bits worth of security; oftentimes the result will be closer to n+k bits, where k is pretty small (on the order of 1 or 2). Since you're talking about "64,000 bits" of security, my guess is you're talking about 16-bit divisors, which really means 16 bits of security. Even if you combine a few of those together, you're unlikely to even reach 32 bits' worth of security.

 If your divisors are on the order of 2^31, and if you can perform a 32x32 multiply with 64-bit result, then it may be possible to implement the algorithm reasonably efficiently (if your divisors are limited to 2^16 or below, as noted, I would expect the results to be cryptographically trivial). Implementing larger divisors would be possible, of course, but the computational expense would go up with the square of the size of the divisor.

What do you see as being the key size and format for this system?
 — supercat, Oct 29 2004

 If you follow the link and read the comments in the file "cryption.c" you will find all the details of what it does. But I'll try to quickly cover the main points here.

 I'm using a bootstrapping process in several steps. I do acknowledge that this is likely to be a significant weakness. Decide for yourself.... It all starts with a "key file", which can be just about any file of at least 8300 bytes. The user selects it at whim.

 Next, the key-file is examined by doing some XORs of data in it, to obtain at least 90 differently-valued bytes. (The-key file is rejected if that many different numbers cannot be obtained by accumlative XOR, but most files of the minimum length have no problem with this test.)

 Then the 90 bytes are used to create values of N (as in "the Nth prime from among 200million") to obtain 4 "control primes". Later, approximatly 1000 primes will be selected from among those 200million. How many ways are there to do that?

 The 4 control primes are tested to ensure they have periods of at least 200,000 digits in Base Two (and the key file will again be rejected --extremely rarely-- if 4 such primes cannot be thusly obtained).

 Note that in Base Two division, the result of each doubling-and-subtraction becomes the "dividend" for the next division step. So, whatever dividend is left after more than 200,000 steps is retained for the next part of the overall bootstrapping process.

 The program has a special division function that can be told how many division steps to do, and one of its parameters is a memory-pointer to the location of a dividend/divisor pair. Each division step by that pair yields one output-bit, and the "state" of the overall division is preserved in the memory until that particular pair is again chosen, to yield up a few more bits.

 The program is compiled with some initial default values for the 4 control primes. They are quickly replaced by the values described above, but the purpose of those defaults is to allow another function, PseudoRandomize(), to be called WHILE the defaults are being replaced. More about PseudoRandomize() shortly.

 One of the control primes (and its dividend) has the purpose of providing bits which are array index selection-values for those dividend/divisor pairs. However, in the initial part of the bootstrapping process, its output is more directly used (a thing which ceases just as soon as ONE dividend/divisor pair is available, besides the control primes). That is, at first it directly provides the bits which are combined with genuine data bytes, thereby causing those bytes to become scrambled.

 Inside the PseudoRandomize() function, another of the control primes has the purpose of providing bits which determine how many times a particular data-byte will be manipulated (from 2 to 5). Yes, this means the first control prime is always called upon at least twice, and often up to 5 times, to produce different sets of bits to scramble one data-byte.

 Also inside PseudoRandomize(), another of the control primes has the purpose of providing bits which determine what sort of manipulation will be done to the data byte. There are eight possible manipulations: rotation, addition, subtraction, XOR, and any-of-those-4-followed-by NOT (flips all bits). This means our byte of data gets manipulated in two-to-five probably-different ways, each time using different groups of scramble-bits.

 The fourth control prime adds another zinger to the works. Its division yields bits which control an overall loop, so that the preceding description can be entirely repeated. Sometimes it's not repeated, and sometimes it's repeated twice, but most often it repeats once. All-different scrambling-bits, modification-methods, and number-of-manipulations will occur if it is repeated, of course.

 Note that when, say, the digital version of a top-secret photograph is to be processed via PseudoRandomize(), Decryption must be the perfectly reversed process. Naturally one of the function parameters is a which-way designator. Well, during the bootstrapping process, that which-way parameter is sometimes pseudorandomly set....

 The PseudoRandomize() function is used early in the bootstrap process, to modify the 90 bytes which had been obtained via accumulative XOR. THEN those bytes are used to construct an N, for obtaining Nth primes that replace the default control values.

 As soon as the defaults are replaced, the main key file data is scanned again, processed via PseudoRandomize(), and used to construct values for more Nth primes. Groups of 64 bits in the file will yield one prime and one 32-bit dividend. Duplicate primes are mostly (but not entirely) prevented, and so the actual number of primes pulled from the minimum-size key file (8300 bytes) may be some number other than exactly 1000. The total number of these primes will depend on the length of the key file; the maximum the program is designed to work with is 65,536.

 Remember, as soon as ONE of these approximate-minimum thousand primes and its dividend are prepared, the first control prime will start using it to provide scramble-bits. As soon as two dividend/divisor pairs are available, the control prime's output will pseudorandomly select between them, as sources of scramble-data. And so on, for three, four, five... dividend/divisor pairs. It is because of the thousand pairs and the 64 bits used to select each of them, that I claim the encryption key to be approximately at least 64,000 bits. Because AFTER those primes and their dividends have been gathered, the Primary Cryption program is ready to begin encrypting the stuff you want to keep secret.

 And the first thing the program does with those 1000 primes is AGAIN replace the 4 control primes....

 During the encryption of user-specified data, one final aspect of this process is that "wasting" of pseudorandom bits is allowed. Just because all those dividend/divisor pairs in the memory are ready to provide encryption-bits, that does not mean we must use those bits as they are generated. No, the user can provide additional "skip numbers" to specify the generation-and-ignoring of a great many scrambler-bytes, before any are actually used to encrypt a data-file. Yes, this can make it harder to remember the overall encryption key (there are plans to write another program to manage those keys).

Anyway, the approximate minimum of 1000 dividend/divisor pairs each occupy a 64-bit work area, and all of them will contribute to the scrambling of any data that exceeds a few thousand bytes. So I call it "64,000-bit encryption". We might speak of "4-megabit encryption" if a long key file lets the program load 64K primes into its work-area. Yes, all those primes DO take a noticeable amount of time to load. But when being USED (each invocation can process a large number of data files), the program does its encryption work speedily enough for most purposes.
 — Vernon, Oct 29 2004

Vernon... I don't have the background to vote on this idea, but it's good to have you back. I'm wondering if the 71k program would, in fact, be smaller than this page :p
 — david_scothern, Oct 29 2004

I propose a method of encryption where a hash sum of the text to be encrypted is posted as commentary to an amateur cryptographer's idea. Then the response, which is always much longer than the criticism and pretty much random, is exclusive-or'ed with the plain text. If you need more bits, you can follow the links included in the response and also exclusive-or in the additional materials.
 — jutta, Oct 29 2004

W3EK14B2
 — Shz, Oct 30 2004

 [david_scothern], the 71K file is the executable. Machine language, not very English-like.

[jutta], That was both naughty and nice at the same time! Good show!
 — Vernon, Oct 30 2004

 Vernon: I've designed and implemented a few different encryption schemes for embedded processors. My schemes are probably pretty bad because I lack the full expertise to ensure that my transformation functions are actually orthagonal (many of them were a combination of QBASIC's rnd function and some arbitrarily-chosen swaps) but on the other hand they can be effectively implemented in about couple hundred words of code space and a few dozen bytes of RAM.

It's been awhile since I've looked at newsgroups, but if you can read sci.crypt I think you should find some useful information there.
 — supercat, Oct 30 2004