9.4 Factoring – Introduction to Cryptography with Coding Theory, 3rd Edition

9.4 Factoring

We now turn to factoring. The basic method of dividing an integer n by all primes pn is much too slow for most purposes. For many years, people have worked on developing more efficient algorithms. We present some of them here. In Chapter 21, we’ll also cover a method using elliptic curves, and in Chapter 25, we’ll show how a quantum computer, if built, could factor efficiently.

One method, which is also too slow, is usually called the Fermat factorization method. The idea is to express n as a difference of two squares: n=x2y2. Then n=(x+y)(xy) gives a factorization of n. For example, suppose we want to factor n=295927. Compute n+12, n+22, n+32, , until we find a square. In this case, 295927+32=295936=5442. Therefore,

295927=(544+3)(5443)=547541.

The Fermat method works well when n is the product of two primes that are very close together. If n=pq, it takes |pq|/2 steps to find the factorization. But if p and q are two randomly selected 300-digit primes, it is likely that |pq| will be very large, probably around 300 digits, too. So Fermat factorization is unlikely to work. Just to be safe, however, the primes for an RSA modulus are often chosen to be of slightly different sizes.

We now turn to more modern methods. If one of the prime factors of n has a special property, it is sometimes easier to factor n. For example, if p divides n and p1 has only small prime factors, the following method is effective. It was invented by Pollard in 1974.

The p1 Factoring Algorithm

Choose an integer a > 1. Often a=2 is used. Choose a bound B. Compute baB!(mod n) as follows. Let b1a(mod n) and bjbj1j(mod n). Then bBb(mod n). Let d=gcd(b1, n). If 1<d<n, we have found a nontrivial factor of n.

Suppose p is a prime factor of n such that p1 has only small prime factors. Then it is likely that p1 will divide B!, say B!=(p1)k. By Fermat’s theorem, baB!(ap1)k1(mod p), so p will occur in the greatest common divisor of b1 and n. If q is another prime factor of n, it is unlikely that b1(mod q), unless q1 also has only small prime factors. If d=n, not all is lost. In this case, we have an exponent r (namely B!) and an a such that ar1(mod n). There is a good chance that the ar1 method (explained later in this section) will factor n. Alternatively, we could choose a smaller value of B and repeat the calculation.

For an example, see Example 34 in the Computer Appendices.

How do we choose the bound B? If we choose a small B, then the algorithm will run quickly but will have a very small chance of success. If we choose a very large B, then the algorithm will be very slow. The actual value used will depend on the situation at hand.

In the applications, we will use integers that are products of two primes, say n=pq, but that are hard to factor. Therefore, we should ensure that p1 has at least one large prime factor. This is easy to accomplish. Suppose we want p to have around 300 digits. Choose a large prime p0, perhaps around 10140. Look at integers of the form kp0+1, with k running through some integers around 10160. Test kp0+1 for primality by the Miller-Rabin test, as before. On the average, this should produce a desired value of p in less than 400 steps. Now choose a large prime q0 and follow the same procedure to obtain q. Then n=pq will be hard to factor by the p1 method.

The elliptic curve factorization method (see Section 21.3) gives a generalization of the p1 method. However, it uses some random numbers near p1 and only requires at least one of them to have only small prime factors. This allows the method to detect many more primes p, not just those where p1 has only small prime factors.

9.4.1 x2y2

Since it is the basis of the best current factorization methods, we repeat the following result from Section 9.4.

Basic Principle

Let n be an integer and suppose there exist integers x and y with x2y2(mod n), but x  ±y(mod n). Then n is composite. Moreover, gcd(xy, n) gives a nontrivial factor of n.

For an example, see Example 33 in the Computer Appendices.

How do we find the numbers x and y? Let’s suppose we want to factor n=3837523. Observe the following:

93982    5519(mod 3837523)190952    225111319(mod 3837523)19642    32133(mod 3837523)170782    263211(mod 3837523).

If we multiply the relations, we obtain

(939819095196417078)2(2432531113219)22230387225867052.

Since 2230387  ±2586705(mod 3837523), we now can factor 3837523 by calculating

gcd(22303872586705, 3837523)=1093.

The other factor is 3837523/1093=3511.

Here is a way of looking at the calculations we just did. First, we generate squares such that when they are reduced mod n=3837523 they can be written as products of small primes (in the present case, primes less than 20). This set of primes is called our factor base. We’ll discuss how to generate such squares shortly. Each of these squares gives a row in a matrix, where the entries are the exponents of the primes 2, 3, 5, 7, 11, 13, 17, 19. For example, the relation 170782263211(mod 3837523) gives the row 6, 2, 0, 0, 1, 0, 0, 0.

In addition to the preceding relations, suppose that we have also found the following relations:

80772    219(mod 3837523)33972    255132(mod 3837523)142622    527213(mod 3837523).

We obtain the matrix

939819095196417078807733971426200500001201011010200030062001000100000015010020000220100.

Now look for linear dependencies mod 2 among the rows. Here are three of them:

  1. 1st + 5th + 6th = (6,0,6,0,0,2,0,2)0(mod 2)

  2. 1st + 2nd + 3rd + 4th = (8,4,6,0,2,4,0,2)0(mod 2)

  3. 3rd + 7th = (0,2,2,2,0,4,0,0)0(mod 2)

When we have such a dependency, the product of the numbers yields a square. For example, these three yield

  1. (939880773397)22656132192(23531319)2

  2. (939819095196417078)2(2332531113219)2

  3. (196414262)2(357132)2

Therefore, we have x2y2(mod n) for various values of x and y. If x  ±y(mod n), then gcd(xy, n) yields a nontrivial factor of n. If x±y(mod n), then usually gcd(xy, n)=1 or n, so we don’t obtain a factorization. In our three examples, we have

  1. 359052322470002, but 3590523247000(mod 3837523)

  2. 2230387225867052 and gcd(22303872586705, 3837523)=1093

  3. 11479072177452 and gcd(114790717745, 3837523)=1093

We now return to the basic question: How do we find the numbers 9398, 19095, etc.? The idea is to produce squares that are slightly larger than a multiple of n, so they are small mod n. This means that there is a good chance they are products of small primes. An easy way is to look at numbers of the form [in+j] for small j and for various values of i. Here [x] denotes the greatest integer less than or equal to x. The square of such a number is approximately in+2jin+j2, which is approximately 2jin+j2 mod n. As long as i is not too large, this number is fairly small, hence there is a good chance it is a product of small primes.

In the preceding calculation, we have 8077=[17n+1] and 9398=[23n+4], for example.

The method just used is the basis of many of the best current factorization methods. The main step is to produce congruence relations

x2product of small primes.

An improved version of the above method is called the quadratic sieve. A recent method, the number field sieve, uses more sophisticated techniques to produce such relations and is somewhat faster in many situations. See [Pomerance] for a description of these two methods and for a discussion of the history of factorization methods. See also Exercise 52.

Once we have several congruence relations, they are put into a matrix, as before. If we have more rows than columns in the matrix, we are guaranteed to have a linear dependence relation mod 2 among the rows. This leads to a congruence x2y2(mod n). Of course, as in the case of 1st + 5th + 6th 0(mod 2) considered previously, we might end up with x±y, in which case we don’t obtain a factorization. But this situation is expected to occur at most half the time. So if we have enough relations – for example, if there are several more rows than columns – then we should have a relation that yields x2y2 with x  ±y. In this case gcd(xy, n) is a nontrivial factor of n.

In the last half of the twentieth century, there was dramatic progress in factoring. This was partly due to the development of computers and partly due to improved algorithms. A major impetus was provided by the use of factoring in cryptology, especially the RSA algorithm. Table 9.2 gives the factorization records (in terms of the number of decimal digits) for various years.

Table 9.2 Factorization Records

9.4.2 Using ar1

On the surface, the Miller-Rabin test looks like it might factor n quite often; but what usually happens is that bk1 is reached without ever having bu±1(mod n). The problem is that usually an1  1(mod n). Suppose, on the other hand, that we have some exponent r, maybe not n1, such that ar1(mod n) for some a with gcd(a, n)=1. Then it is often possible to factor n.

ar1 Factorization Method

Suppose we have an exponent r > 0 and an integer a such that ar1(mod n). Write r=2km with m odd. Let b0am(mod n), and successively define bu+1bu2(mod n) for 0uk1. If b01(mod n), then stop; the procedure has failed to factor n. If, for some u, we have bu1(mod n), stop; the procedure has failed to factor n. If, for some u, we have bu+11(mod n) but bu  ±1(mod n), then gcd(bu1, n) gives a nontrivial factor of n.

Of course, if we take a=1, then any r works. But then b0=1, so the method fails. But if a and r are found by some reasonably sensible method, there is a good chance that this method will factor n.

This looks very similar to the Miller-Rabin test. The difference is that the existence of r guarantees that we have bu+11(mod n) for some u, which doesn’t happen as often in the Miller-Rabin situation. Trying a few values of a has a very high probability of factoring n.

Of course, we might ask how we can find an exponent r. Generally, this seems to be very difficult, and this test cannot be used in practice. However, it is useful in showing that knowing the decryption exponent in the RSA algorithm allows us to factor the modulus. Moreover, if a quantum computer is built, it will perform factorizations by finding such an exponent r via its unique quantum properties. See Chapter 25.

For an example for how this method is used in analyzing RSA, see Example 32 in the Computer Appendices.