Chapter 9 The RSA Algorithm – Introduction to Cryptography with Coding Theory, 3rd Edition

Chapter 9 The RSA Algorithm

9.1 The RSA Algorithm

Alice wants to send a message to Bob, but they have not had previous contact and they do not want to take the time to send a courier with a key. Therefore, all information that Alice sends to Bob will potentially be obtained by the evil observer Eve. However, it is still possible for a message to be sent in such a way that Bob can read it but Eve cannot.

With all the previously discussed methods, this would be impossible. Alice would have to send a key, which Eve would intercept. She could then decrypt all subsequent messages. The possibility of the present scheme, called a public key cryptosystem, was first publicly suggested by Diffie and Hellman in their classic 1976 paper [Diffie-Hellman]. However, they did not yet have a practical implementation (although they did present an alternative key exchange procedure that works over public channels; see Section 10.4). In the next few years, several methods were proposed. The most successful, based on the idea that factorization of integers into their prime factors is hard, was proposed by Rivest, Shamir, and Adleman in 1977 and is known as the RSA algorithm.

It had long been claimed that government cryptographic agencies had discovered the RSA algorithm several years earlier, but secrecy rules prevented them from releasing any evidence. Finally, in 1997, documents released by CESG, a British cryptographic agency, showed that in 1970, James Ellis had discovered public key cryptography, and in 1973, Clifford Cocks had written an internal document describing a version of the RSA algorithm in which the encryption exponent e (see the discussion that follows) was the same as the modulus n.

Here is how the RSA algorithm works. Bob chooses two distinct large primes p and q and multiplies them together to form

n=pq.

He also chooses an encryption exponent e such that

gcd(e, (p1)(q1))=1.

He sends the pair (n, e) to Alice but keeps the values of p and q secret. In particular, Alice, who could possibly be an enemy of Bob, never needs to know p and q to send her message to Bob securely. Alice writes her message as a number m. If m is larger than n, she breaks the message into blocks, each of which is less than n. However, for simplicity, let’s assume for the moment that m < n. Alice computes

cme(mod n)

and sends c to Bob. Since Bob knows p and q, he can compute (p1)(q1) and therefore can find the decryption exponent d with

de1(mod (p1)(q1)).

As we’ll see later,

mcd(mod n), 

so Bob can read the message.

We summarize the algorithm in the following table.

Example

Bob chooses

p=885320963, q=238855417.

Then

n=pq=211463707796206571.

Let the encryption exponent be

e=9007.

The values of n and e are sent to Alice.

Alice’s message is cat. We will depart from our earlier practice of numbering the letters starting with a=0; instead, we start the numbering at a=01 and continue through z=26. We do this because, in the previous method, if the letter a appeared at the beginning of a message, it would yield a message number m starting with 00, so the a would disappear.

The message is therefore

m=30120.

Alice computes

cme301209007113535859035722866(modn).

She sends c to Bob.

Since Bob knows p and q, he knows (p1)(q1). He uses the extended Euclidean algorithm (see Section 3.2) to compute d such that

de1(mod (p1)(q1)).

The answer is

d=116402471153538991.

Bob computes

cd11353585903572286611640247115353899130120(mod n), 

so he obtains the original message.

For more examples, see Examples 24–30 in the Computer Appendices.

There are several aspects that need to be explained, but perhaps the most important is why mcd(mod n). Recall Euler’s theorem (Section 3.6): If gcd(a, n)=1, then aϕ(n)1(mod n). In our case, ϕ(n)=ϕ(pq)=(p1)(q1). Suppose gcd(m, n)=1. This is very likely the case; since p and q are large, m probably has neither as a factor. Since de1(mod ϕ(n)), we can write de=1+kϕ(n) for some integer k. Therefore,

cd(me)dm1+kϕ(n)m(mϕ(n))km1km(mod n).

We have shown that Bob can recover the message. If gcd(m, n)1, Bob still recovers the message. See Exercise 37.

What does Eve do? She intercepts n, e, c. She does not know p, q, d. We assume that Eve has no way of factoring n. The obvious way of computing d requires knowing ϕ(n). We show later that this is equivalent to knowing p and q. Is there another way? We will show that if Eve can find d, then she can probably factor n. Therefore, it is unlikely that Eve finds the decryption exponent d.

Since Eve knows cme(mod n), why doesn’t she simply take the eth root of c? This works well if we are not working mod n but is very difficult in our case. For example, if you know that m33(mod 85), you cannot calculate the cube root of 3, namely 1.4422 , on your calculator and then reduce mod 85. Of course, a case-by-case search would eventually yield m=7, but this method is not feasible for large n.

How does Bob choose p and q? They should be chosen at random, independently of each other. How large depends on the level of security needed, but it seems that they should have at least 300 digits. For reasons that we discuss later, it is perhaps best if they are of slightly different lengths. When we discuss primality testing, we’ll see that finding such primes can be done fairly quickly (see also Section 3.6). A few other tests should be done on p and q to make sure they are not bad. For example, if p1 has only small prime factors, then n is easy to factor by the p1 method (see Section 9.3), so p should be rejected and replaced with another prime.

Why does Bob require gcd(e, (p1)(q1))=1? Recall (see Section 3.3) that de1(mod (p1)(q1)) has a solution d if and only if gcd(e, (p1)(q1))=1. Therefore, this condition is needed in order for d to exist. The extended Euclidean algorithm can be used to compute d quickly. Since p1 is even, e=2 cannot be used; one might be tempted to use e=3. However, there are dangers in using small values of e (see Section 9.2, Computer Problem 14, and Section 23.3), so something larger is usually recommended. Also, if e is a moderately large prime, then there is no difficulty ensuring that gcd(e, (p1)(q1))=1. It is now generally recommended that e65537. Among the data collected for [Lenstra2012 et al.] is the distribution of RSA encryption exponents that is given in Table 9.1.

Table 9.1 RSA Encryption Exponents

In the encryption process, Alice calculates me(mod n). Recall that this can be done fairly quickly and without large memory, for example, by successive squaring. See Section 3.5. This is definitely an advantage of modular arithmetic: If Alice tried to calculate me first, then reduce mod n, it is possible that recording me would overflow her computer’s memory. Similarly, the decryption process of calculating cd(mod n) can be done efficiently. Therefore, all the operations needed for encryption and decryption can be done quickly (i.e., in time a power of logn). The security is provided by the assumption that n cannot be factored.

We made two claims. We justify them here. Recall that the point of these two claims was that finding ϕ(n) or finding the decryption exponent d is essentially as hard as factoring n. Therefore, if factoring is hard, then there should be no fast, clever way of finding d.

Claim 1: Suppose n=pq is the product of two distinct primes. If we know n and ϕ(n), then we can quickly find p and q.

Note that

nϕ(n)+1=pq(p1)(q1)+1=p+q.

Therefore, we know pq and p+q. The roots of the polynomial

X2(nϕ(n)+1)X+n=X2(p+q)X+pq=(Xp)(Xq)

are p and q, but they can also be calculated by the quadratic formula:

p, q=(nϕ(n)+1)±(nϕ(n)+1)24n2.

This yields p and q.

For example, suppose n=221 and we know that ϕ(n)=192. Consider the quadratic equation

X230X+221.

The roots are

p, q=30±30242212=13, 17.

For another example, see Example 31 in the Computer Appendices.

Claim 2: If we know d and e, then we can probably factor n.

In the discussion of factorization methods in Section 9.4, we show that if we have an exponent r > 0 such that ar1(mod n) for several a with gcd(a, n)=1, then we can probably factor n. Since de1 is a multiple of ϕ(n), say de1=kϕ(n), we have

ade1(aϕ(n))k1(mod n)

whenever gcd(a, n)=1. The ar1 factorization method can now be applied.

For an example, see Example 32 in the Computer Appendices.

Claim 2’: If e is small or medium-sized (for example, if e has several fewer digits than n) and we know d, then we can factor n.

We use the following procedure:

  1. Compute k=(de1)/n (that is, round up to the nearest integer).

  2. Compute

    ϕ(n)=de1k.
  3. Solve the quadratic equation

    X2(n+1ϕ(n))X+n=0.

    The solutions are p and q.

Example

Let n=670726081 and e=257. We discover that d=524523509. Compute

de1n=200.98.

Therefore, k=201. Then

ϕ(n)=de1201=670659412.

The roots of the equation

X2(n+1ϕ(n))X+n=X266670X+670726081=0

are 12347.00 and 54323.00, and we can check that n=12347×54323.

Why does this work? We know that de1(mod (p1)(q1)), so we write de=1+(p1)(q1)k. Since d<(p1)(q1), we know that

(p1)(q1)k<de<(p1)(q1)e, 

so k<e. We have

k=de1(p1)(q1) > de1n=(p1)(q1)kn=(pqpq+1)kn=k(p+q1)kn.

Usually, both p and q are approximately n. In practice, e and therefore k (which is less than e) are much smaller than n. Therefore, (p+q1)k is much smaller than n, which means that (p+q1)k/n<1 and it rounds off to 0.

Once we have k, we use de1=ϕ(n)k to solve for ϕ(n). As we have already seen, once we know n and ϕ(n), we can find p and q by solving the quadratic equation.

One way the RSA algorithm can be used is when there are several banks, for example, that want to be able to send financial data to each other. If there are several thousand banks, then it is impractical for each pair of banks to have a key for secret communication. A better way is the following. Each bank chooses integers n and e as before. These are then published in a public book. Suppose bank A wants to send data to bank B. Then A looks up B’s n and e and uses them to send the message. In practice, the RSA algorithm is not quite fast enough for sending massive amounts of data. Therefore, the RSA algorithm is often used to send a key for a faster encryption method such as AES.

PGP (= Pretty Good Privacy) used to be a standard method for encrypting email. When Alice sends an email message to Bob, she first signs the message using a digital signature algorithm such as those discussed in Chapter 13. She then encrypts the message using a block cipher such as triple DES or AES (other choices are IDEA or CAST-128) with a randomly chosen 128-bit key (a new random key is chosen for each transmission). She then encrypts this key using Bob’s public RSA key (other public key methods can also be used). When Bob receives the email, he uses his private RSA exponent to decrypt the random key. Then he uses this random key to decrypt the message, and he checks the signature to verify that the message is from Alice. For more discussion of PGP, see Section 15.6.