Halfbakery: Factoring Algorithm, Extended

This Idea directly extends my earlier "Factoring Algorithm" post (linked). There is a decent chance that this extension of that Idea can equal or even exceed the efficiency of the General Number Field Sieve (GNFS), currently the best publicly available factoring algorithm.

One thing that makes the GNFS better than ordinary factoring algorithms is the fact that it doesn't limit itself to ONLY looking for the exact factors of 'c' (the composite number that you want to factor). It happens that various multiples of those factors can, if any one of them is found, also be used to find an exact factor of 'c' (by plugging it and 'c' into the well-known "greatest common divisor" algorithm).

Well, the algorithm I previously presented also doesn't have to be limited to seeking only the exact factors of 'c'....

The example I'll pick, to explain how that Idea can be extended, is the number 1234553. Its 'rough square root' or 'r' is 1111, and the relevant 'nodulus' (square 'r' and subtract it from 'c'), 'n=232'. It takes a decent quantity of trials to find the factors of 1234553, even when we ensure that all trial values, generated from '(r)(y-x)-n', are exactly divisible by 4 --the expression '(y-x)' must equal 124 (about 31 trials).

However! Suppose we multiplied (1234553)(2), and applied the factoring algorithm to it? If the factors of 'c' were originally 'a' and 'b', then this new number will have factors of 'a' and 'b' and 2. But we are not required to think of them as being three factors, we can pretend one of the factors of this new 'c' is '(2)(a)' and the other factor is 'b'. OR, we can pretend the factors are 'a' and '(2)(b)'. The algebra doesn't care, and it is actually simpler to ignore the location of the '2', and just use variables 'a' and 'b' as before --one of which will be an even number. So, if we are still working with only two factors, we can apply the previously presented factoring algorithm. What are the results?

The product, 'c=2469106', lets us compute 'r=1571' and 'n=1065'. Note that since one of the factors of this 'c' is an even number, we can't use the divisible-by-4 trick, so we have to begin our '(y-x)' trials with the value of 1 (because either 'x' or 'y' will be an odd number, if 'a' or 'b' is an even number).

The first trial value is computed as '(r)(y-x)-n=t', where 't' also needs to equal '(x)(y)'. Converting variables to numbers, we have (1571)(1)-1065=506='t'. Next, we compute the square root of 506, because if 'x' and 'y' are only different from each other by 1, this is a fast way to find the approximate value of either 'x' or 'y' (recall that if an arbitrary value of 'r' is chosen, such that '(y-x)' could equal Zero, then, if true, 'x' and 'y' will both exactly equal the square root of 't'). It should be noted that when doing many iterations of the algorithm, it can become possible to adjust 'x' and 'y' without figuring the square root more than just the first few times; a mathematical series of adjustments can be discovered.

The square root of 506 is 22-and-an-irration, and It happens that 506=(22)(23); it is the product of two numbers that differ by 1. COOL! By previous definition, the smaller number must be 'x'. So we take the definition 'x=r-a' and compute 'a=r-x' or 'a=1571-22=1549', and we can also take the definition 'y=b-r' and compute 'b=r+y' or 'b=1571+23=1594'.

Simple multiplication proves that (1549)(1594)=2469106, and therefore 1549, at least, must also be a factor of the original number 1234553. We can also divide 1594/2=797, which must be the other factor (or we could divide 1234553/1549 to obtain it). Thus we have factored our original 'c' on the first try, instead of on the 31st.

Now, WHY did that work? Basically, it worked because the ratio of 1549 to 797 is approximately 2-to-1. The factoring algorithm tries to find two values that are similar in magnitude to 'r', the rough square root of 'c' --or to whatever arbitrary 'r' is chosen-- so the more different are 'a' and 'b' from 'r', the more trials it can take to find them. Well, if 'a' is about half the size of 'b', then multiplying '(2)(a)' yields a value that is very similar in magnitude to 'b', and so the 'r' of a 'c' created from all three multipliers should be --and was-- similar in magnitude to both the 'a' and 'b' that the algorithm sought, which made 'c' factor-able more easily. As a double-check of that explanation, I will now create 'c=(1234553)(7)(13)', and work the algorithm on that --it is obvious, isn't it, that 13 and 7 also have, roughly, a 2-to-1 ratio? So '(13)(a)' should be similar in magnitude to '(7)(b)'....

When 'c= 112344323', 'r' can be 10599 and 'n=5522'. Since we constructed our new 'c' using odd multipliers only, we can use the divide-by-4 trick, and it happens that 5522 is not divisible by 4. So we compute (10599)(2)-5522 and obtain the actual first trial value, 't=15676', which indeed is divisible by 4. It happens that there are no good values for 'x' and 'y', integers differing by 2, that can multiply to equal 15676.

So we compute the next trial value by figuring (4)(10599)=42396, and adding 42396+15676=58072. We could also get that result by computing (10599)(6)-5522, but the other way is simpler when doing it many times, because the net effect is that we "constant"ly add '(4)(r)', or, in this case, 42396. However, we do have to keep in mind that each adjustment is associated with an increase of 4, in the difference '(y-x)'.

It happens that 58072=(238)(244), two numbers that differ by the currently desired value of 6. And therefore 'a=10599-238=10361', and 'b=10599+244=10843' --they are indeed factors of 112344323. As with the General Number Field Sieve, either of these factors can be plugged into the Greatest Common Divisor algorithm, along with 1234553, in order to discover a factor of 1234553 ... 10361=(13)(797) and 10843=(7)(1549).

So, even though the 'c' we started with was considerably larger than before, we have still found its factors on the second trial --BUT ONLY BECAUSE we knew in advance that the ratio of the factors of 1234553 was approximately 2-to-1. That's the rub! What if we don't have any such information (the normal situation)?

So, now we inject Permutation Power into the algorithm....

As an exercise for the reader, consider the number 1234567. I invite you to multiply it by 77 before applying the algorithm to it. You surely know that 77=(7)(11), but does that mean the ratio of the factors of 1234567 are roughly 11-to-7? What if they were about 77-to-1? (Those two different possible ratios are "permutations" associated with the multipliers of 11 and 7.)

The point of the preceding Question is, you don't know! (Heh, you don't even know if I'm sending you on a wild-goose chase, when I suggested 77 as a multiplier! I suppose you could try using 75 instead of 77; its ratio permutations are 75-to-1, 25-to-3, and 15-to-5 --exactly 3-to-1.) But the algebra behind the algorithm doesn't care about dinky details like that! It simply works with the numbers fed into it, with the simple goal of seeking two factors that happen to be closest in magnitude to 'r', usually the rough-square-root of 'c'.

Now to Extend the preceding in a Big Way. Suppose you have a number 'c' that is 1000 digits long, that you want to factor. It happens that modern computer technology can easily and quickly manipulate numbers that are a million digits long, or even ten million digits long. Suppose you took that 1000-digit number and multiplied it by A LOT of smaller numbers?

You would create a new number 'c' that is maybe five million digits long. There are VAST quantities of permutations, ways to organize all those multipliers (including, of course, the original two factors of your 1000-digit 'c'). And all you need is JUST ONE PAIR of grouped multipliers, factors of your five-million-digit 'c', such that they are both similar in value to 'r' --or, if you exlude the factors of your 1000-digit 'c' and think about those groupings, you want their ratio to be approximately equal to the ratio of the factors 'c'. The algorithm can fairly efficiently find that pair.

Now, there are a few complications that must be mentioned. First, the experiments I've conducted along this line indicate that the huge collection of multipliers must include large numbers (say 300 digits) as well as small numbers like 7, and a variety of in-between-sized numbers, also. I strongly recommend EXCLUDING the number 2 (and all other even numbers). That way the huge new 'c' is guaranteed to be an odd number, and the divisible-by-4 efficiency trick can be used.

Next, the bigger the final huge 'c', the more accurately the ratio of the two groupings of multipliers must equal the ratio of the factors of the original 1000-digit number. It can still take a lot of trials to find some factors of that huge 'c', which is why this Extended Algorithm may not, on the average, be much more efficient than the GNFS. On the plus side, though, a simple change of one single multiplier --say you remove 797 and use 1549 in its place-- means that ALL the possible permutation-ratios just got changed. Maybe this newly-adjusted overall collection of multipliers will include two groupings that more accurately match the ratio of the factors of your original 1000-digit 'c'.

Finally, it is perfectly possible that when two factors of that huge number are found, one of them will contain, as one of ITS factors, the whole original 1000-digit number that you really wanted to factor in the first place. In this case you CANNOT use the results to factor it. That's a sort-of luck-of-the-draw (or Murphy's Law) thing; on the average it should only happen about half the time. So, "If at first you don't succeed, try, try again!" And remember, when you do find a couple of useful factors of that huge number, the Greatest Common Divisor algorithm means you need not be concerned about the specific groupings of multipliers comprising those factors. It will very quickly pull out just one that matters, a factor of your original 1000-digit 'c', which is the penultimate goal here (the ultimate goal, of course, being achieved by dividing that number into that 'c' and getting its other factor).

Actually, there is a way to significantly reduce the chance of that last problem happening (one factor of the huge number includes the entire original 1000-digit 'c'). Imagine constructing a number from a lot of factors, with the result being approximately equal to your 1000-digit 'c'. If you multiplied it against 'c' and then took the square root and followed the algorithm, the most natural result-factors of that large product are your original 'c' and the constructed number. But what if you constructed two numbers approximately equal to 'c', and multiplied all three together, before taking the square root? Very likely the two large factors of your 1000-digit 'c' would get split between the results found by the algorithm. So, when constructing your five-million-digit 'c', simply assemble some EVEN quantity of constructed numbers, each approximately equal to your 1000-digit 'c', before multiplying all of them --and that original 'c'-- together.