Halfbakery: Factoring Algorithm

A "factoring algorithm" is any procedure designed to solve this problem: "Given the multiplication equation '(a)(b)=c', and if you are provided with the value of 'c', find the values of 'a' and 'b' (the 'factors' of 'c')". It is not always necessary find ALL the possible factors of 'c'. So, for example, if 'c' is an even number, then you can automatically know that one of its factors is '2', and another factor will be 'c/2'.

This Idea is about a factoring algorithm that, so far as I know, is not widely known. It is slightly more efficient than another algorithm that is fairly widely known (at least among those who like mathematics), and is called "Fermat Factoring", after the same guy who thought up "Fermat's Last Theorem", which for something like 300 years was one of the most famous unsolved problems in all of mathematics (link).

The factoring problem is actually more than 2000 years old, because in all that time, there was no truly efficient algorithm ever found for solving it. All known algorithms (at least, known to me at this writing) can take a very long time to solve the problem, if 'c' is a very large number.

It should be pointed out that certain modern encryption procedures depend on the fact that the factoring problem has no known efficient generic solution, which therefore means that if someone finds such a solution, it will likely be kept secret as long as possible, by any organization that can benefit from cracking messages that other people think are securely encrypted.

The Fermat Factoring algorithm is fairly efficient on smallish numbers. Because this Idea's algorithm is somewhat related to Fermat Factoring, I'll describe it first. Please note that the first few paragraphs below are relevant to both Fermat Factoring and to this Idea's algorithm, so you might want to read them before skipping stuff you might already know about Fermat Factoring.

Let's first assume that 'a' and 'b' are odd numbers (because as mentioned above, the problem has an almost-automatic solution if 'c' is an even number --and 'c' WILL be an even number if either 'a' or 'b' is even).

Next, we will assume that 'a' and 'b' are DIFFERENT numbers, which means that one of them will be smaller than the other. Let us assume that 'a' is smaller than 'b'. If they were the same number, then 'c' would be a "perfect square number" and one of the standard preliminary things done, when trying to factor a number, is to compute its square root to find out if it is a perfect square, so 'c' would always be quickly factored in that case.

If 'a' and 'b' are odd numbers, then their sum 'a+b' must be an even number, and if we compute '(a+b)/2' we will obtain another number (it might be either odd or even) which is exactly midway between 'a' and 'b' --it is the "average" of 'a' and 'b'. Let us call this number 'm' (for "midway").

Given that 'a' is smaller than 'b', and that 'm' is midway between them, it logically follows that we could compute either 'm-a=x' or 'b-m=x' --and we can be sure that 'x' will be the same value in either case.

Well, since those two simple equations can be rewritten as 'a=m-x' and 'b=m+x', it logically follows that if '(a)(b)=c', then so also must '(m-x)(m+x)=c'.

We now algebraically figure the results of that last equation, and end up with '(m^2)-(x^2)=c'. What Fermat discovered was that if 'c' can be factored, then it must be a number which is the difference between two perfect squares. This can greatly reduce the amount of trial-and-error that must be done, to discover what values of 'm' and 'x' work for a given 'c' --and once they are found, of course, 'a' and 'b' can be computed easily. (It even works if just one of the two factors of 'c' is an even number, but you have to be willing to play with fractions, when doing the math.)

There is a fairly natural way of doing Fermat Factoring with a bit of efficiency, by paying further attention to odd numbers versus even numbers. It happens that you only need to try differences between odd squares and even squares, when you want to factor an odd number. In any given finite range of integers, only about half the possible pairings need to be tried. The other pairings (subtracting an even square from an even square, or an odd square from an odd square) can be ignored.

Now for the Idea that is the reason for this HalfBakery posting. It has a couple of variants, with the first being the simplest (least sophisticated) and the last being the most efficient, so I'll describe them in order (but the first is background information for the last, so I won't have to repeat much stuff here).

The first thing we do is compute the square root of 'c' and, if that didn't directly reveal its factors, throw away the "fractional" (actually, "irrational"!) part of the square root, and keep the integer part, and call it 'r' (for "rough root", of course).

It happens that 'r' will always be a number that is smaller than the value 'm' computed for Fermat Factoring. However, USUALLY it will be larger than 'a' and smaller than 'b' --though sometimes it will actually equal 'a'. This means that a secondary test should always be done once 'r' is computed, and that test is to compute 'c/r' to find out if it solves the problem. An example of this situation is 'c=63'; its square root is 7-and-a-an-irration, so 'r=7' in this case, and it should be known to the reader that 63 is exactly divisible by 7.

Once we know that 'r' is greater than 'a' and smaller than 'b', even though not midway between them, we can compute 'r-a=x' and 'b-r=y', and, USUALLY, 'y' will be larger than 'x' (I'll examine the case where they are the same in a little while). As before, simple rewriting of those equations yields 'a=r-x' and 'b=r+y', and therefore '(r-x)(r+y)=c'.

The algebra associated with manipulating that last equation is a little messier than what Fermat did, but here is what I want to show you: '(r^2)+(r)(y-x)-(x)(y)=c', which can be rewritten as '(r)(y-x)-(x)(y)=c-(r^2)'.

I will now take the liberty of defining 'c-(r^2)=n', because, since 'c' and 'r' are both known values, 'n' can be easily computed. The larger equation can now be simplified a little, and rewritten as '(r)(y-x)-n=(x)(y)' --THIS is the "factoring equation" upon which this Idea's algorithm is based.

To see how it works, let's pick 'c=77', from which we can easily figure 'r=8' and '77-64=n=13'. We don't know what 'x' and 'y' are, but we DO know that the factoring equation to use includes the expression '(y-x)', and, because 'y' is usually greater than 'x', its value will probably be 1 or more. Obviously we should start our search using 1.

The factoring equation now becomes '(8)(1)-13=(x)(y)', or '-5=(x)(y)'. I shall call -5 a "trial value" in this overall description. Possible values for 'x' and 'y' are 'x=-1' and 'y=5' or 'x=-5' and 'y=1', but this can't work BECAUSE OF A CRUCIAL CONSTRAINT, which is the assumption in the last paragraph that 'y-x=1' (and you can easily see that in both of the preceding possibilities, 'y-x=6'). It is that constraint which gives this algorithm its power, because only a couple of values of 'x' and 'y' need be examined for each assumed value of '(y-x)'.

Trying the next value 'y-x=2', we get '(8)(2)-13=3=(x)(y)' (with 3 being our new trial value), and so possible values for 'x' and 'y' are 'x=1' and 'y=3' --and here we can easily see that for those values, 'y-x=2', indeed! This means we can go back to the earlier equations 'a=r-x' and 'b=r+y', and compute 'a=8-1=7' and 'b=8+3=11', which indeed are the factors of 'c=77'.

The preceding was a very simple example and often many trial values have to be tested. I want to clarify what I wrote about only a couple of values of 'x' and 'y' need to be tried, for each trial value. Start by calling the trial value 't'. While we want the product of two numbers 'x' and 'y' to equal 't', quite often their product will be something different.

As an example, suppose that, during some factoring attempt, 't' was computed to be 117 from the assumption that 'y-x=2'. Well, 9 and 11 are separated by 2, and their product is 99 (less than 117) --and 10 and 12 are separated by 2, and their product is 120 (greater than 117). ALL other combinations of 'x' and 'y', separated by 2, will yield a product that is MORE different from 't' than the 99 and 120 computed in this example! Which means all of them can be ignored; it is necessary to try a different assumption for the value of 'y-x', and to compute a new 't' (and again only a couple of values for 'x' and 'y' need be examined).

Getting back to the main topic, it was previously indicated that this algorithm can be made somewhat more efficient. We start by recognizing that the algebra doesn't really care what the value of 'r' is; the equations will still work. So suppose we picked 'r=9' instead of 'r=8', for the case of 'c=77'....

The initial result is that '77-(9^2)= 77-81=-4=n', and our factoring equation becomes '(9)(y-x)-(-4)= (9)(y-x)+4=(x)(y)'. When we choose an arbitrary value for 'r', we must be more alert to the possibility that (y) might equal (x), or even be smaller than (x). (That is, if we knew in advance the values of 'a' and 'b', and used them with our arbitrary value of 'r' to compute 'x' and 'y', the results could significantly differ from what has been previously stated, that 'y' is usually larger than 'x', when 'r' is directly linked to the square root of 'c'.) In this case, because the arbitrary change (from 'r=8' to 'r=9') was small, we can start with the assumption 'y-x=0' ('x' and 'y' are the same number).

The result from our factoring equation is that '(9)(0)+4=4' (the trial value is 4), and if 'y-x=0' or 'x=y', then their values need to be equal to the square root of 4. That is known to be 2, which means that the problem should be solved, and, indeed, 'a=9-2=7' and 'b=9+2=11'. It is examples like this that reveal the relationship between this algorithm and Fermat Factoring, since 77 is, indeed, the difference between two squares: (9^2)-(2^2).

Where, however, was the promised increased efficiency? Ignoring the obvious fact that the immediately preceding variation of the algorithm got the answer on the first trial value of '(r)(y-x)-n', and not the second, the answer to that question actually comes from paying attention to odd numbers versus even numbers. To maximize the efficiency of this algorithm, we ALWAYS want 'r' to be an odd number (therefore, about half the time, 'r', as figured from the square root of 'c', must be arbitrarily adjusted to become an odd number).

If 'r' is odd, then its square will also be odd, and so, if 'c' is odd, 'c-(r^2)=n' will be an even number, and, because 'a' and 'b' will also be odd if 'c' is odd, it logically follows that when we compute 'x' and 'y' from 'a' and 'b' and 'r', the results will also be even numbers. Now let us examine how those facts fit into the generic factoring equation '(r)(y-x)-n=(x)(y)'.

First, '(y-x)' will be an even number, and therefore '(r)(y-x)' will also be an even number --and, of course '(r)(y-x)-n' and '(x)(y)' will both also be even numbers. This means we never need to try values of '(y-x)' which are odd.

MORE, since both 'x' and 'y' are even, their product is actually also guaranteed to be divisible by 4, which means that we only need trial values such that 't=(r)(y-x)-n', and also are exactly divisible by 4. It is quite easy to create such a series of trial values, for comparing with the product '(x)(y)'. Simply start with 'y-x=0', and if '(r)(0)-n' --that is, '-n' alone-- is not divisible by 4, then start with 'y-x=2', and '(r)(2)-n' WILL be divisible by 4. Once you have an initial trial value that is divisible by 4, you simply add '(4)(r)' to it to get the next trial value, and the next one after that, and so on.

Reducing our brute-force search by 3/4, of numbers to try, should count as an efficiency increase! Meanwhile, Fermat Factoring still needs to try half of the possibilities, so that is why this algorithm is a little more efficient.