# 9.4 Factoring

We now turn to factoring. The basic method of dividing an integer $n$ by all primes $p\le \sqrt{n}$ is much too slow for most purposes. For many years, people have worked on developing more efficient algorithms. We present some of them here. In Chapter 21, we’ll also cover a method using elliptic curves, and in Chapter 25, we’ll show how a quantum computer, if built, could factor efficiently.

One method, which is also too slow, is usually called the **Fermat factorization** method. The idea is to express $n$ as a difference of two squares: $n={x}^{2}-{y}^{2}$. Then $n=(x+y)(x-y)$ gives a factorization of $n$. For example, suppose we want to factor $n=295927$. Compute $n+{1}^{2}$, $n+{2}^{2}$, $n+{3}^{2}$, $\dots $, until we find a square. In this case, $295927+{3}^{2}=295936={544}^{2}$. Therefore,

The Fermat method works well when $n$ is the product of two primes that are very close together. If $n=pq$, it takes $|p-q|/2$ steps to find the factorization. But if $p$ and $q$ are two randomly selected 300-digit primes, it is likely that $|p-q|$ will be very large, probably around 300 digits, too. So Fermat factorization is unlikely to work. Just to be safe, however, the primes for an RSA modulus are often chosen to be of slightly different sizes.

We now turn to more modern methods. If one of the prime factors of $n$ has a special property, it is sometimes easier to factor $n$. For example, if $p$ divides $n$ and $p-1$ has only small prime factors, the following method is effective. It was invented by Pollard in 1974.

# The $p-1$ Factoring Algorithm

Choose an integer $a\text{}\text{}1$. Often $a=2$ is used. Choose a bound $B$. Compute $b\equiv {a}^{B!}\text{\hspace{0.17em}}(\text{mod}\text{}n)$ as follows. Let ${b}_{1}\equiv a\text{\hspace{0.17em}}(\text{mod}\text{}n)$ and ${b}_{j}\equiv {b}_{j-1}^{j}\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Then ${b}_{B}\equiv b\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Let $d=gcd(b-1\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}n)$. If $1<d<n$, we have found a nontrivial factor of $n$.

Suppose $p$ is a prime factor of $n$ such that $p-1$ has only small prime factors. Then it is likely that $p-1$ will divide $B!$, say $B!=(p-1)k$. By Fermat’s theorem, $b\equiv {a}^{B!}\equiv ({a}^{p-1}{)}^{k}\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}p)$, so $p$ will occur in the greatest common divisor of $b-1$ and $n$. If $q$ is another prime factor of $n$, it is unlikely that $b\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}q)$, unless $q-1$ also has only small prime factors. If $d=n$, not all is lost. In this case, we have an exponent $r$ (namely $B!$) and an $a$ such that ${a}^{r}\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$. There is a good chance that the ${a}^{r}\equiv 1$ method (explained later in this section) will factor $n$. Alternatively, we could choose a smaller value of $B$ and repeat the calculation.

For an example, see Example 34 in the Computer Appendices.

How do we choose the bound $B$? If we choose a small $B$, then the algorithm will run quickly but will have a very small chance of success. If we choose a very large $B$, then the algorithm will be very slow. The actual value used will depend on the situation at hand.

In the applications, we will use integers that are products of two primes, say $n=pq$, but that are hard to factor. Therefore, we should ensure that $p-1$ has at least one large prime factor. This is easy to accomplish. Suppose we want $p$ to have around 300 digits. Choose a large prime ${p}_{0}$, perhaps around ${10}^{140}$. Look at integers of the form $k{p}_{0}+1$, with $k$ running through some integers around ${10}^{160}$. Test $k{p}_{0}+1$ for primality by the Miller-Rabin test, as before. On the average, this should produce a desired value of $p$ in less than 400 steps. Now choose a large prime ${q}_{0}$ and follow the same procedure to obtain $q$. Then $n=pq$ will be hard to factor by the $p-1$ method.

The elliptic curve factorization method (see Section 21.3) gives a generalization of the $p-1$ method. However, it uses some random numbers near $p-1$ and only requires at least one of them to have only small prime factors. This allows the method to detect many more primes $p$, not just those where $p-1$ has only small prime factors.

# 9.4.1 ${\mathit{x}}^{\mathbf{2}}\mathbf{\equiv}{\mathit{y}}^{\mathbf{2}}$

Since it is the basis of the best current factorization methods, we repeat the following result from Section 9.4.

# Basic Principle

Let $n$ be an integer and suppose there exist integers $x$ and $y$ with ${x}^{2}\equiv {y}^{2}\text{\hspace{0.17em}}(\text{mod}\text{}n)$, but $x\text{}\overline{)\equiv}\text{}\pm y\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Then $n$ is composite. Moreover, $gcd(x-y\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}n)$ gives a nontrivial factor of $n$.

For an example, see Example 33 in the Computer Appendices.

How do we find the numbers $x$ and $y$? Let’s suppose we want to factor $n=3837523$. Observe the following:

If we multiply the relations, we obtain

Since $2230387\text{}\overline{)\equiv}\text{}\pm 2586705\text{\hspace{0.17em}}(\text{mod}\text{}3837523)$, we now can factor 3837523 by calculating

The other factor is $3837523/1093=3511$.

Here is a way of looking at the calculations we just did. First, we generate squares such that when they are reduced mod $n=3837523$ they can be written as products of small primes (in the present case, primes less than 20). This set of primes is called our ** factor base**. We’ll discuss how to generate such squares shortly. Each of these squares gives a row in a matrix, where the entries are the exponents of the primes 2, 3, 5, 7, 11, 13, 17, 19. For example, the relation ${17078}^{2}\equiv {2}^{6}\cdot {3}^{2}\cdot 11\text{\hspace{0.17em}}(\text{mod}\text{}3837523)$ gives the row 6, 2, 0, 0, 1, 0, 0, 0.

In addition to the preceding relations, suppose that we have also found the following relations:

We obtain the matrix

Now look for linear dependencies mod 2 among the rows. Here are three of them:

$\text{1st+5th+6th=(6,0,6,0,0,2,0,2)}\equiv 0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{(mod2)}$

$\text{1st+2nd+3rd+4th=(8,4,6,0,2,4,0,2)}\equiv 0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{(mod2)}$

$\text{3rd+7th=(0,2,2,2,0,4,0,0)}\equiv 0\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{(mod2)}$

When we have such a dependency, the product of the numbers yields a square. For example, these three yield

$(9398\cdot 8077\cdot 3397{)}^{2}\equiv {2}^{6}\cdot {5}^{6}\cdot {13}^{2}\cdot {19}^{2}\equiv ({2}^{3}\cdot {5}^{3}\cdot 13\cdot 19{)}^{2}$

$(9398\cdot 19095\cdot 1964\cdot 17078{)}^{2}\equiv ({2}^{3}\cdot {3}^{2}\cdot {5}^{3}\cdot 11\cdot {13}^{2}\cdot 19{)}^{2}$

$(1964\cdot 14262{)}^{2}\equiv (3\cdot 5\cdot 7\cdot {13}^{2}{)}^{2}$

Therefore, we have ${x}^{2}\equiv {y}^{2}\text{\hspace{0.17em}}(\text{mod}\text{}n)$ for various values of $x$ and $y$. If $x\text{}\overline{)\equiv}\text{}\pm y\text{\hspace{0.17em}}(\text{mod}\text{}n)$, then $gcd(x-y\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}n)$ yields a nontrivial factor of $n$. If $x\equiv \pm y\text{\hspace{0.17em}}(\text{mod}\text{}n)$, then usually $gcd(x-y\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}n)=1$ or $n$, so we don’t obtain a factorization. In our three examples, we have

${3590523}^{2}\equiv {247000}^{2}$, but $3590523\equiv -247000\text{\hspace{0.17em}}(\text{mod}\text{}3837523)$

${2230387}^{2}\equiv {2586705}^{2}$ and $gcd(2230387-2586705\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}3837523)=1093$

${1147907}^{2}\equiv {17745}^{2}$ and $gcd(1147907-17745\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}3837523)=1093$

We now return to the basic question: How do we find the numbers 9398, 19095, etc.? The idea is to produce squares that are slightly larger than a multiple of $n$, so they are small mod $n$. This means that there is a good chance they are products of small primes. An easy way is to look at numbers of the form $[\sqrt{in}+j]$ for small $j$ and for various values of $i$. Here $[x]$ denotes the greatest integer less than or equal to $x$. The square of such a number is approximately $in+2j\sqrt{in}+{j}^{2}$, which is approximately $2j\sqrt{in}+{j}^{2}$ mod $n$. As long as $i$ is not too large, this number is fairly small, hence there is a good chance it is a product of small primes.

In the preceding calculation, we have $8077=[\sqrt{17n}+1]$ and $9398=[\sqrt{23n}+4]$, for example.

The method just used is the basis of many of the best current factorization methods. The main step is to produce congruence relations

An improved version of the above method is called the quadratic sieve. A recent method, the number field sieve, uses more sophisticated techniques to produce such relations and is somewhat faster in many situations. See [Pomerance] for a description of these two methods and for a discussion of the history of factorization methods. See also Exercise 52.

Once we have several congruence relations, they are put into a matrix, as before. If we have more rows than columns in the matrix, we are guaranteed to have a linear dependence relation mod 2 among the rows. This leads to a congruence ${x}^{2}\equiv {y}^{2}\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Of course, as in the case of 1st + 5th + 6th $\equiv 0\text{\hspace{0.17em}}(\text{mod}\text{}2)$ considered previously, we might end up with $x\equiv \pm y$, in which case we don’t obtain a factorization. But this situation is expected to occur at most half the time. So if we have enough relations – for example, if there are several more rows than columns – then we should have a relation that yields ${x}^{2}\equiv {y}^{2}$ with $x\text{}\overline{)\equiv}\text{}\pm y$. In this case $gcd(x-y\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}n)$ is a nontrivial factor of $n$.

In the last half of the twentieth century, there was dramatic progress in factoring. This was partly due to the development of computers and partly due to improved algorithms. A major impetus was provided by the use of factoring in cryptology, especially the RSA algorithm. Table 9.2 gives the factorization records (in terms of the number of decimal digits) for various years.

# 9.4.2 Using ${\mathit{a}}^{\mathit{r}}\mathbf{\equiv}\mathbf{1}$

On the surface, the Miller-Rabin test looks like it might factor $n$ quite often; but what usually happens is that ${b}_{k-1}$ is reached without ever having ${b}_{u}\equiv \pm 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$. The problem is that usually ${a}^{n-1}\text{}\overline{)\equiv}\text{}1\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Suppose, on the other hand, that we have some exponent $r$, maybe not $n-1$, such that ${a}^{r}\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$ for some $a$ with $gcd(a\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}n)=1$. Then it is often possible to factor $n$.

# ${\mathit{a}}^{\mathit{r}}\mathbf{\equiv}\mathbf{1}$ Factorization Method

Suppose we have an exponent $r\text{}\text{}0$ and an integer $a$ such that ${a}^{r}\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Write $r={2}^{k}m$ with $m$ odd. Let ${b}_{0}\equiv {a}^{m}\text{\hspace{0.17em}}(\text{mod}\text{}n)$, and successively define ${b}_{u+1}\equiv {b}_{u}^{2}\text{\hspace{0.17em}}(\text{mod}\text{}n)$ for $0\le u\le k-1$. If ${b}_{0}\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$, then stop; the procedure has failed to factor $n$. If, for some $u$, we have ${b}_{u}\equiv -1\text{\hspace{0.17em}}(\text{mod}\text{}n)$, stop; the procedure has failed to factor $n$. If, for some $u$, we have ${b}_{u+1}\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$ but ${b}_{u}\text{}\overline{)\equiv}\text{}\pm 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$, then $gcd({b}_{u}-1\text{,}\text{}\phantom{\rule{thinmathspace}{0ex}}n)$ gives a nontrivial factor of $n$.

Of course, if we take $a=1$, then any $r$ works. But then ${b}_{0}=1$, so the method fails. But if $a$ and $r$ are found by some reasonably sensible method, there is a good chance that this method will factor $n$.

This looks very similar to the Miller-Rabin test. The difference is that the existence of $r$ guarantees that we have ${b}_{u+1}\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}n)$ for some $u$, which doesn’t happen as often in the Miller-Rabin situation. Trying a few values of $a$ has a very high probability of factoring $n$.

Of course, we might ask how we can find an exponent $r$. Generally, this seems to be very difficult, and this test cannot be used in practice. However, it is useful in showing that knowing the decryption exponent in the RSA algorithm allows us to factor the modulus. Moreover, if a quantum computer is built, it will perform factorizations by finding such an exponent $r$ via its unique quantum properties. See Chapter 25.

For an example for how this method is used in analyzing RSA, see Example 32 in the Computer Appendices.