9.2 Attacks on RSA – Introduction to Cryptography with Coding Theory, 3rd Edition

9.2 Attacks on RSA

In practice, the RSA algorithm has proven to be effective, as long as it is implemented correctly. We give a few possible implementation mistakes in the Exercises. Here are a few other potential difficulties. For more about attacks on RSA, see [Boneh].

Theorem

Let n=pq have m digits. If we know the first m/4, or the last m/4, digits of p, we can efficiently factor n.

In other words, if p and q have 300 digits, and we know the first 150 digits, or the last 150 digits, of p, then we can factor n. Therefore, if we choose a random starting point to choose our prime p, the method should be such that a large amount of p is not predictable. For example, suppose we take a random 150-digit number N and test numbers of the form N10150+k, k=1, 3, 5, , for primality until we find a prime p (which should happen for k<10000). An attacker who knows that this method is used will know 147 of the last 150 digits (they will all be 0 except for the last three or four digits). Trying the method of the theorem for the various values of k<10000 will eventually lead to the factorization of n.

For details of the preceding result, see [Coppersmith2]. A related result is the following.

Theorem

Suppose (n, e) is an RSA public key and n has m digits. Let d be the decryption exponent. If we have at least the last m/4 digits of d, we can efficiently find d in time that is linear in elog2e.

This means that the time to find d is bounded as a function linear in elog2e. If e is small, it is therefore quite fast to find d when we know a large part of d. If e is large, perhaps around n, the theorem is no better than a case-by-case search for d. For details, see [Boneh et al.].

9.2.1 Low Exponent Attacks

Low encryption or decryption exponents are tempting because they speed up encryption or decryption. However, there are certain dangers that must be avoided. One pitfall of using e=3 is given in Computer Problem 14. Another difficulty is discussed in Chapter 23 (Lattice Methods). These problems can be avoided by using a somewhat higher exponent. One popular choice is e=65537=216+1. This is prime, so it is likely that it is relatively prime to (p1)(q1). Since it is one more than a power of 2, exponentiation to this power can be done quickly: To calculate x65537, square x sixteen times, then multiply the result by x.

The decryption exponent d should of course be chosen large enough that brute force will not find it. However, even more care is needed, as the following result shows. One way to obtain desired properties of d is to choose d first, then find e with de1(mod ϕ(n)).

Suppose Bob wants to be able to decrypt messages quickly, so he chooses a small value of d. The following theorem of M. Wiener [Wiener] shows that often Eve can then find d easily. In practice, if the inequalities in the hypotheses of the proposition are weakened, then Eve can still use the method to obtain d in many cases. Therefore, it is recommended that d be chosen fairly large.

Theorem

Suppose p, q are primes with q<p<2q. Let n=pq and let 1d, e<ϕ(n) satisfy de1(mod (p1)(q1)). If d<13n1/4, then d can be calculated quickly (that is, in time polynomial in logn).

Proof. Since q2<pq=n, we have q<n. Therefore, since p<2q,

nϕ(n)=pq(p1)(q1)=p+q1<3q<3n.

Write ed=1+ϕ(n)k for some integer k1. Since e<ϕ(n), we have

ϕ(n)k<ed<13ϕ(n)n1/4, 

so k<13n1/4. Therefore,

kned=k(nϕ(n))1<k(nϕ(n))<13n1/4(3n)=n3/4.

Also, since k(nϕ(n))1 > 0, we have kned > 0. Dividing by dn yields

0<kden<1dn1/4<13d2, 

since 3d<n1/4 by assumption.

We now need a result about continued fractions. Recall from Section 3.12 that if x is a positive real number and k and d are positive integers with

kdx<12d2, 

then k/d arises from the continued fraction expansion of x. Therefore, in our case, k/d arises from the continued fraction expansion of e/n. Therefore, Eve does the following:

  1. Computes the continued fraction of e/n. After each step, she obtains a fraction A/B.

  2. Eve uses k=A and d=B to compute C=(ed1)/k. (Since ed=1+ϕ(n)k, this value of C is a candidate for ϕ(n).)

  3. If C is not an integer, she proceeds to the next step of the continued fraction.

  4. If C is an integer, then she finds the roots r1, r2 of X2(nC+1)X+n. (Note that this is possibly the equation X2(nϕ(n)+1)X+n=(Xp)(Xq) from earlier.) If r1 and r2 are integers, then Eve has factored n. If not, then Eve proceeds to the next step of the continued fraction algorithm.

Since the number of steps in the continued fraction expansion of e/n is at most a constant times logn, and since the continued fraction algorithm stops when the fraction e/n is reached, the algorithm terminates quickly. Therefore, Eve finds the factorization of n quickly.

Remarks

Recall that the rational approximations to a number x arising from the continued fraction algorithm are alternately larger than x and smaller than x. Since 0<kden, we only need to consider every second fraction arising from the continued fraction.

What happens if Eve reaches e/n without finding the factorization of n? This means that the hypotheses of the proposition are not satisfied. However, it is possible that sometimes the method will yield the factorization of n even when the hypotheses fail.

Example

Let n=1966981193543797 and e=323815174542919. The continued fraction of e/n is

[0;6, 13, 2, 3, 1, 3, 1, 9, 1, 36, 5, 2, 1, 6, 1, 43, 13, 1, 10, 11, 2, 1, 9, 5]=16+113+12+13+11+13+11+.

The first fraction is 1/6, so we try k=1, d=6. Since d must be odd, we discard this possibility.

By the remark, we may jump to the third fraction:

16+113+12=27164.

Again, we discard this since d must be odd.

The fifth fraction is 121/735. This gives C=(e7351)/121, which is not an integer.

The seventh fraction is 578/3511 This gives C=1966981103495136 as the candidate for ϕ(n). The roots of

X2(nC+1)X+n

are 37264873 and 52783789, to several decimal places of accuracy. Since

n=37264873×52783789, 

we have factored n.

9.2.2 Short Plaintext

A common use of RSA is to transmit keys for use in DES, AES, or other symmetric cryptosystems. However, a naive implementation could lead to a loss of security. Suppose a 56-bit DES key is written as a number m1017. This is encrypted with RSA to obtain cme(mod n). Although m is small, the ciphertext c is probably a number of the same size as n, so perhaps around 200 digits. However, Eve attacks the system as follows. She makes two lists:

  1. cxe(mod n) for all x with 1x109.

  2. ye(mod n) for all y with 1y109.

She looks for a match between an element on the first list and an element on the second list. If she finds one, then she has cxeye for some x, y. This yields

c(xy)e(mod n), 

so mxy(mod n). Is this attack likely to succeed? Suppose m is the product of two integers x and y, both less than 109. Then these x, y will yield a match for Eve. Not every m will have this property, but many values of m are the product of two integers less than 109. For these, Eve will obtain m.

This attack is much more efficient than trying all 1017 possibilities for m, which is nearly impossible on one computer, and would take a long time even with several computers working in parallel. In the present attack, Eve needs to compute and store a list of length 109, then compute the elements on the other list and check each one against the first list. Therefore, Eve performs approximately 2×109 computations (and compares with the list up to 109 times). This is easily possible on a single computer. For more on this attack, see [Boneh-Joux-Nguyen].

It is easy to prevent this attack. Instead of using a small value of m, adjoin some random digits to the beginning and end of m so as to form a much longer plaintext. When Bob decrypts the ciphertext, he simply removes these random digits and obtains m.

A more sophisticated method of preprocessing the plaintext, namely Optimal Asymmetric Encryption Padding (OAEP), was introduced by Bellare and Rogaway [Bellare-Rogaway2] in 1994. Suppose Alice wants to send a message m to Bob, whose RSA public key is (n, e), where n has k bits. Two positive integers k0 and k1 are specified in advance, with k0+k1<k. Alice’s message is allowed to have kk0k1 bits. Typical values are k=1024, k0=k1=128, kk0k1=768. Let G be a function that inputs strings of k0 bits and outputs strings of kk0 bits. Let H be a function that inputs kk0 bits and outputs k0 bits. The functions G and H are usually constructed from hash functions (see Chapter 11 for a discussion of hash functions). To encrypt m, Alice first expands it to length kk0 by adjoining k1 zero bits. The result is denoted m0k1. She then chooses a random string r of k0 bits and computes

x1=m0k1G(r), x2=rH(x1).

If the concatenation x1||x2 is a binary number larger than n, Alice chooses a new random number r and computes new values for x1 and x2. As soon as she obtains x1||x2<n (this has a probability of at least 1/2 of happening for each r, as long as G(r) produces fairly random outputs), she forms the ciphertext

E(m)=(x1||x2)e(mod n).

To decrypt a ciphertext c, Bob uses his private RSA decryption exponent d to compute cd(mod n). The result is written in the form

cd(mod n)=y1||y2, 

where y1 has kk0 bits and y2 has k0 bits. Bob then computes

m0k1=y1G(H(y1)y2).

The correctness of this decryption can be justified as follows. If the ciphertext is the encryption of m, then

y1=x1=m0k1G(r)andy2=x2=rH(x1).

Therefore,

H(y1)y2=H(x1)rH(x1)=r

and

y1G(H(y1)y2)=x1G(r)=m0k1.

Bob removes the k1 zero bits from the end of m0k1 and obtains m. Also, Bob has check on the integrity of the ciphertext. If there are not k1 zeros at the end, then the ciphertext does not correspond to a valid encryption.

This method is sometimes called plaintext-aware encryption. Note that the padding with x2 depends on the message m and on the random parameter r. This makes chosen ciphertext attacks on the system more difficult. It also is used for ciphertext indistinguishability. See Section 4.5.

For discussion of the security of OAEP, see [Shoup].

9.2.3 Timing Attacks

Another type of attack on RSA and similar systems was discovered by Paul Kocher in 1995, while he was an undergraduate at Stanford. He showed that it is possible to discover the decryption exponent by carefully timing the computation times for a series of decryptions. Though there are ways to thwart the attack, this development was unsettling. There had been a general feeling of security since the mathematics was well understood. Kocher’s attack demonstrated that a system could still have unexpected weaknesses.

Here is how the timing attack works. Suppose Eve is able to observe Bob decrypt several ciphertexts y. She times how long this takes for each y. Knowing each y and the time required for it to be decrypted will allow her to find the decryption exponent d. But first, how could Eve obtain such information? There are several situations where encrypted messages are sent to Bob and his computer automatically decrypts and responds. Measuring the response times suffices for the present purposes.

We need to assume that we know the hardware being used to calculate yd(mod n). We can use this information to calculate the computation times for various steps that potentially occur in the process.

Let’s assume that yd(mod n) is computed by an algorithm given in Exercise 56 in Chapter 3, which is as follows:

Let d=b1b2bw be written in binary (for example, when x=1011, we have b1=1, b2=0, b3=1, b4=1). Let y and n be integers. Perform the following procedure:

  1. Start with k=1 and s1=1.

  2. If bk=1, let rksky(mod n). If bk=0, let rk=sk.

  3. Let sk+1rk2(mod n).

  4. If k=w, stop. If k<w, add 1 to k and go to (2).

Then rwyd(mod n).

Note that the multiplication sky occurs only when the bit bk=1. In many situations, there is a reasonably large variation in how long this multiplication takes. We assume this is the case here.

Before we continue, we need a few facts from probability. Suppose we have a random process that produces real numbers t as outputs. For us, t will be the time it takes for the computer to complete a calculation, given a random input y. The mean is the average value of these outputs. If we record outputs t1, , tn, the mean should be approximately m=(t1+tn)/n. The variance for the random process is approximated by

Var({ti})=(t1m)2++(tnm)2n.

The standard deviation is the square root of the variance and gives a measure of how much variation there is in the values of the ti’s.

The important fact we need is that when two random processes are independent, the variance for the sum of their outputs is the sum of the variances of the two processes. For example, we will break the computation done by the computer into two independent processes, which will take times t and t′′. The total time t will be t+t′′. Therefore, Var({ti}) should be approximately Var({ti})+Var({ti}).

Now assume Eve knows ciphertexts y1, , yn and the times that it took to compute each yid(mod n). Suppose she knows bits b1, , bk1 of the exponent d. Since she knows the hardware being used, she knows how much time was used in calculating r1, , rk1 in the preceding algorithm. Therefore, she knows, for each yi, the time ti that it takes to compute rk, , rw.

Eve wants to determine bk. If bk=1, a multiplication sky(mod n) will take place for each ciphertext yi that is processed. If bk=0, there is no such multiplication.

Let ti be the amount of time it takes the computer to perform the multiplication sky(mod n), though Eve does not yet know whether this multiplication actually occurs. Let ti=titi. Eve computes Var({ti}) and Var({ti}). If Var({ti}) > Var({ti}), then Eve concludes that bk=1. If not, bk=0. After determining bk, she proceeds in the same manner to find all the bits.

Why does this work? If the multiplication occurs, ti is the amount of time it takes the computer to complete the calculation after the multiplication. It is reasonable to assume ti and ti are outputs that are independent of each other. Therefore,

Var({ti})Var({ti})+Var({ti}) > Var({ti}).

If the multiplication does not occur, ti is the amount of time for an operation unrelated to the computation, so it is reasonable to assume ti and ti are independent. Therefore,

Var({ti})Var({ti})+Var({ti}) > Var({ti}).

Note that we couldn’t use the mean in place of the variance, since the mean of {ti} would be negative, so the last inequality would not hold. All that can be deduced from the mean is the total number of nonzero bits in the binary expansion of d.

The preceding gives a fairly simple version of the method. In practice, various modifications would be needed, depending on the specific situation. But the general strategy remains the same. For more details, see [Kocher]. For more on timing attacks, see [Crosby et al.].

A similar attack on RSA works by measuring the power consumed during the computations. See [Kocher et al.]. Another method, called acoustic cryptanalysis, obtains information from the high-pitched noises emitted by the electronic components of a computer during its computations. See [Genkin et al.]. Attacks such as these and the timing attack can be prevented by appropriate design features in the physical implementation.

Timing attacks, power analysis, and acoustic cryptanalysis are examples of side-channel attacks, where the attack is on the implementation rather than on the basic cryptographic algorithm.