# 9.2 Attacks on RSA

In practice, the RSA algorithm has proven to be effective, as long as it is implemented correctly. We give a few possible implementation mistakes in the Exercises. Here are a few other potential difficulties. For more about attacks on RSA, see [Boneh].

# Theorem

Let $n=pq$ have $m$ digits. If we know the first $m/4$, or the last $m/4$, digits of $p$, we can efficiently factor $n$.

In other words, if $p$ and $q$ have 300 digits, and we know the first 150 digits, or the last 150 digits, of $p$, then we can factor $n$. Therefore, if we choose a random starting point to choose our prime $p$, the method should be such that a large amount of $p$ is not predictable. For example, suppose we take a random 150-digit number $N$ and test numbers of the form $N\cdot {10}^{150}+k$, $k=1\text{,}\text{}3\text{,}\text{}5\text{,}\text{}\dots $, for primality until we find a prime $p$ (which should happen for $k<10000$). An attacker who knows that this method is used will know 147 of the last 150 digits (they will all be 0 except for the last three or four digits). Trying the method of the theorem for the various values of $k<10000$ will eventually lead to the factorization of $n$.

For details of the preceding result, see [Coppersmith2]. A related result is the following.

# Theorem

Suppose $(n\text{,}\text{}e)$ is an RSA public key and $n$ has $m$ digits. Let $d$ be the decryption exponent. If we have at least the last $m/4$ digits of $d$, we can efficiently find $d$ in time that is linear in $e{log}_{2}e$.

This means that the time to find $d$ is bounded as a function linear in $e{log}_{2}e$. If $e$ is small, it is therefore quite fast to find $d$ when we know a large part of $d$. If $e$ is large, perhaps around $n$, the theorem is no better than a case-by-case search for $d$. For details, see [Boneh et al.].

# 9.2.1 Low Exponent Attacks

Low encryption or decryption exponents are tempting because they speed up encryption or decryption. However, there are certain dangers that must be avoided. One pitfall of using $e=3$ is given in Computer Problem 14. Another difficulty is discussed in Chapter 23 (Lattice Methods). These problems can be avoided by using a somewhat higher exponent. One popular choice is $e=65537={2}^{16}+1$. This is prime, so it is likely that it is relatively prime to $(p-1)(q-1)$. Since it is one more than a power of 2, exponentiation to this power can be done quickly: To calculate ${x}^{65537}$, square $x$ sixteen times, then multiply the result by $x$.

The decryption exponent $d$ should of course be chosen large enough that brute force will not find it. However, even more care is needed, as the following result shows. One way to obtain desired properties of $d$ is to choose $d$ first, then find $e$ with $de\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}\varphi (n))$.

Suppose Bob wants to be able to decrypt messages quickly, so he chooses a small value of $d$. The following theorem of M. Wiener [Wiener] shows that often Eve can then find $d$ easily. In practice, if the inequalities in the hypotheses of the proposition are weakened, then Eve can still use the method to obtain $d$ in many cases. Therefore, it is recommended that $d$ be chosen fairly large.

# Theorem

Suppose $p\text{,}\text{}q$ are primes with $q<p<2q$. Let $n=pq$ and let $1\le d\text{,}\text{}e\varphi (n)$ satisfy $de\equiv 1\text{\hspace{0.17em}}(\text{mod}\text{}(p-1)(q-1))$. If $d<{\displaystyle \frac{1}{3}}{n}^{1/4}$, then $d$ can be calculated quickly (that is, in time polynomial in $logn$).

Proof. Since ${q}^{2}<pq=n$, we have $q<\sqrt{n}$. Therefore, since $p<2q$,

Write $ed=1+\varphi (n)k$ for some integer $k\ge 1$. Since $e<\varphi (n)$, we have

so $k<{\displaystyle \frac{1}{3}}{n}^{1/4}$. Therefore,

Also, since $k(n-\varphi (n))-1\text{}\text{}0$, we have $kn-ed\text{}\text{}0$. Dividing by $dn$ yields

since $3d<{n}^{1/4}$ by assumption.

We now need a result about continued fractions. Recall from Section 3.12 that if $x$ is a positive real number and $k$ and $d$ are positive integers with

then $k/d$ arises from the continued fraction expansion of $x$. Therefore, in our case, $k/d$ arises from the continued fraction expansion of $e/n$. Therefore, Eve does the following:

Computes the continued fraction of $e/n$. After each step, she obtains a fraction $A/B$.

Eve uses $k=A$ and $d=B$ to compute $C=(ed-1)/k$. (Since $ed=1+\varphi (n)k$, this value of $C$ is a candidate for $\varphi (n)$.)

If $C$ is not an integer, she proceeds to the next step of the continued fraction.

If $C$ is an integer, then she finds the roots ${r}_{1}\text{,}\text{}{r}_{2}$ of ${X}^{2}-(n-C+1)X+n$. (Note that this is possibly the equation ${X}^{2}-(n-\varphi (n)+1)X+n=(X-p)(X-q)$ from earlier.) If ${r}_{1}$ and ${r}_{2}$ are integers, then Eve has factored $n$. If not, then Eve proceeds to the next step of the continued fraction algorithm.

Since the number of steps in the continued fraction expansion of $e/n$ is at most a constant times $logn$, and since the continued fraction algorithm stops when the fraction $e/n$ is reached, the algorithm terminates quickly. Therefore, Eve finds the factorization of $n$ quickly.

# Remarks

Recall that the rational approximations to a number $x$ arising from the continued fraction algorithm are alternately larger than $x$ and smaller than $x$. Since $0<{\displaystyle \frac{k}{d}}-{\displaystyle \frac{e}{n}}$, we only need to consider every second fraction arising from the continued fraction.

What happens if Eve reaches $e/n$ without finding the factorization of $n$? This means that the hypotheses of the proposition are not satisfied. However, it is possible that sometimes the method will yield the factorization of $n$ even when the hypotheses fail.

# Example

Let $n=1966981193543797$ and $e=323815174542919$. The continued fraction of $e/n$ is

The first fraction is $1/6$, so we try $k=1\text{,}\text{}d=6$. Since $d$ must be odd, we discard this possibility.

By the remark, we may jump to the third fraction:

Again, we discard this since $d$ must be odd.

The fifth fraction is $121/735$. This gives $C=(e\cdot 735-1)/121$, which is not an integer.

The seventh fraction is $578/3511$ This gives $C=1966981103495136$ as the candidate for $\varphi (n)$. The roots of

are $37264873$ and $52783789$, to several decimal places of accuracy. Since

we have factored $n$.

# 9.2.2 Short Plaintext

A common use of RSA is to transmit keys for use in DES, AES, or other symmetric cryptosystems. However, a naive implementation could lead to a loss of security. Suppose a 56-bit DES key is written as a number $m\approx {10}^{17}$. This is encrypted with RSA to obtain $c\equiv {m}^{e}\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Although $m$ is small, the ciphertext $c$ is probably a number of the same size as $n$, so perhaps around 200 digits. However, Eve attacks the system as follows. She makes two lists:

$c{x}^{-e}\text{\hspace{0.17em}}(\text{mod}\text{}n)$ for all $x$ with $1\le x\le {10}^{9}$.

${y}^{e}\text{\hspace{0.17em}}(\text{mod}\text{}n)$ for all $y$ with $1\le y\le {10}^{9}$.

She looks for a match between an element on the first list and an element on the second list. If she finds one, then she has $c{x}^{-e}\equiv {y}^{e}$ for some $x\text{,}\text{}y$. This yields

so $m\equiv xy\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Is this attack likely to succeed? Suppose $m$ is the product of two integers $x$ and $y$, both less than ${10}^{9}$. Then these $x\text{,}\text{}y$ will yield a match for Eve. Not every $m$ will have this property, but many values of $m$ are the product of two integers less than ${10}^{9}$. For these, Eve will obtain $m$.

This attack is much more efficient than trying all ${10}^{17}$ possibilities for $m$, which is nearly impossible on one computer, and would take a long time even with several computers working in parallel. In the present attack, Eve needs to compute and store a list of length ${10}^{9}$, then compute the elements on the other list and check each one against the first list. Therefore, Eve performs approximately $2\times {10}^{9}$ computations (and compares with the list up to ${10}^{9}$ times). This is easily possible on a single computer. For more on this attack, see [Boneh-Joux-Nguyen].

It is easy to prevent this attack. Instead of using a small value of $m$, adjoin some random digits to the beginning and end of $m$ so as to form a much longer plaintext. When Bob decrypts the ciphertext, he simply removes these random digits and obtains $m$.

A more sophisticated method of preprocessing the plaintext, namely Optimal Asymmetric Encryption Padding (OAEP), was introduced by Bellare and Rogaway [Bellare-Rogaway2] in 1994. Suppose Alice wants to send a message $m$ to Bob, whose RSA public key is $(n\text{,}\text{}e)$, where $n$ has $k$ bits. Two positive integers ${k}_{0}$ and ${k}_{1}$ are specified in advance, with ${k}_{0}+{k}_{1}<k$. Alice’s message is allowed to have $k-{k}_{0}-{k}_{1}$ bits. Typical values are $k=1024$, ${k}_{0}={k}_{1}=128$, $k-{k}_{0}-{k}_{1}=768$. Let $G$ be a function that inputs strings of ${k}_{0}$ bits and outputs strings of $k-{k}_{0}$ bits. Let $H$ be a function that inputs $k-{k}_{0}$ bits and outputs ${k}_{0}$ bits. The functions $G$ and $H$ are usually constructed from hash functions (see Chapter 11 for a discussion of hash functions). To encrypt $m$, Alice first expands it to length $k-{k}_{0}$ by adjoining ${k}_{1}$ zero bits. The result is denoted $m{0}^{{k}_{1}}$. She then chooses a random string $r$ of ${k}_{0}$ bits and computes

If the concatenation ${x}_{1}||{x}_{2}$ is a binary number larger than $n$, Alice chooses a new random number $r$ and computes new values for ${x}_{1}$ and ${x}_{2}$. As soon as she obtains ${x}_{1}||{x}_{2}<n$ (this has a probability of at least 1/2 of happening for each $r$, as long as $G(r)$ produces fairly random outputs), she forms the ciphertext

To decrypt a ciphertext $c$, Bob uses his private RSA decryption exponent $d$ to compute ${c}^{d}\text{\hspace{0.17em}}(\text{mod}\text{}n)$. The result is written in the form

where ${y}_{1}$ has $k-{k}_{0}$ bits and ${y}_{2}$ has ${k}_{0}$ bits. Bob then computes

The correctness of this decryption can be justified as follows. If the ciphertext is the encryption of $m$, then

Therefore,

and

Bob removes the ${k}_{1}$ zero bits from the end of $m{0}^{{k}_{1}}$ and obtains $m$. Also, Bob has check on the integrity of the ciphertext. If there are not ${k}_{1}$ zeros at the end, then the ciphertext does not correspond to a valid encryption.

This method is sometimes called plaintext-aware encryption. Note that the padding with ${x}_{2}$ depends on the message $m$ and on the random parameter $r$. This makes chosen ciphertext attacks on the system more difficult. It also is used for ciphertext indistinguishability. See Section 4.5.

For discussion of the security of OAEP, see [Shoup].

# 9.2.3 Timing Attacks

Another type of attack on RSA and similar systems was discovered by Paul Kocher in 1995, while he was an undergraduate at Stanford. He showed that it is possible to discover the decryption exponent by carefully timing the computation times for a series of decryptions. Though there are ways to thwart the attack, this development was unsettling. There had been a general feeling of security since the mathematics was well understood. Kocher’s attack demonstrated that a system could still have unexpected weaknesses.

Here is how the timing attack works. Suppose Eve is able to observe Bob decrypt several ciphertexts $y$. She times how long this takes for each $y$. Knowing each $y$ and the time required for it to be decrypted will allow her to find the decryption exponent $d$. But first, how could Eve obtain such information? There are several situations where encrypted messages are sent to Bob and his computer automatically decrypts and responds. Measuring the response times suffices for the present purposes.

We need to assume that we know the hardware being used to calculate ${y}^{d}\text{\hspace{0.17em}}(\text{mod}\text{}n)$. We can use this information to calculate the computation times for various steps that potentially occur in the process.

Let’s assume that ${y}^{d}\text{\hspace{0.17em}}(\text{mod}\text{}n)$ is computed by an algorithm given in Exercise 56 in Chapter 3, which is as follows:

Let $d={b}_{1}{b}_{2}\dots {b}_{w}$ be written in binary (for example, when $x=1011$, we have ${b}_{1}=1\text{,}\text{}{b}_{2}=0\text{,}\text{}{b}_{3}=1\text{,}\text{}{b}_{4}=1$). Let $y$ and $n$ be integers. Perform the following procedure:

Start with $k=1$ and ${s}_{1}=1$.

If ${b}_{k}=1$, let ${r}_{k}\equiv {s}_{k}y\text{\hspace{0.17em}}(\text{mod}\text{}n)$. If ${b}_{k}=0$, let ${r}_{k}={s}_{k}$.

Let ${s}_{k+1}\equiv {r}_{k}^{2}\text{\hspace{0.17em}}(\text{mod}\text{}n)$.

If $k=w$, stop. If $k<w$, add 1 to $k$ and go to (2).

Then ${r}_{w}\equiv {y}^{d}\text{\hspace{0.17em}}(\text{mod}\text{}n)$.

Note that the multiplication ${s}_{k}y$ occurs only when the bit ${b}_{k}=1$. In many situations, there is a reasonably large variation in how long this multiplication takes. We assume this is the case here.

Before we continue, we need a few facts from probability. Suppose we have a random process that produces real numbers $t$ as outputs. For us, $t$ will be the time it takes for the computer to complete a calculation, given a random input $y$. The mean is the average value of these outputs. If we record outputs ${t}_{1}\text{,}\text{}\dots \text{,}\text{}{t}_{n}$, the mean should be approximately $m=({t}_{1}+\cdots {t}_{n})/n$. The variance for the random process is approximated by

The standard deviation is the square root of the variance and gives a measure of how much variation there is in the values of the ${t}_{i}$’s.

The important fact we need is that when two random processes are independent, the variance for the sum of their outputs is the sum of the variances of the two processes. For example, we will break the computation done by the computer into two independent processes, which will take times ${t}^{\prime}$ and ${t}^{\prime \prime}$. The total time $t$ will be ${t}^{\prime}+{t}^{\prime \prime}$. Therefore, $\text{Var}(\{{t}_{i}\})$ should be approximately $\text{Var}(\{{t}_{i}^{\prime}\})+\text{Var}(\{{t}_{i}^{\u2033}\})$.

Now assume Eve knows ciphertexts ${y}_{1}\text{,}\text{}\dots \text{,}\text{}{y}_{n}$ and the times that it took to compute each ${y}_{i}^{d}\text{\hspace{0.17em}}(\text{mod}\text{}n)$. Suppose she knows bits ${b}_{1}\text{,}\text{}\dots \text{,}\text{}{b}_{k-1}$ of the exponent $d$. Since she knows the hardware being used, she knows how much time was used in calculating ${r}_{1}\text{,}\text{}\dots \text{,}\text{}{r}_{k-1}$ in the preceding algorithm. Therefore, she knows, for each ${y}_{i}$, the time ${t}_{i}$ that it takes to compute ${r}_{k}\text{,}\text{}\dots \text{,}\text{}{r}_{w}$.

Eve wants to determine ${b}_{k}$. If ${b}_{k}=1$, a multiplication ${s}_{k}y\text{\hspace{0.17em}}(\text{mod}\text{}n)$ will take place for each ciphertext ${y}_{i}$ that is processed. If ${b}_{k}=0$, there is no such multiplication.

Let ${t}_{i}^{\prime}$ be the amount of time it takes the computer to perform the multiplication ${s}_{k}y\text{\hspace{0.17em}}(\text{mod}\text{}n)$, though Eve does not yet know whether this multiplication actually occurs. Let ${t}_{i}^{\u2033}={t}_{i}-{t}_{i}^{\prime}$. Eve computes $\text{Var}(\{{t}_{i}\})$ and $\text{Var}(\{{t}_{i}^{\u2033}\})$. If $\text{Var}(\{{t}_{i}\})\text{}\text{}\text{Var}(\{{t}_{i}^{\u2033}\})$, then Eve concludes that ${b}_{k}=1$. If not, ${b}_{k}=0$. After determining ${b}_{k}$, she proceeds in the same manner to find all the bits.

Why does this work? If the multiplication occurs, ${t}_{i}^{\u2033}$ is the amount of time it takes the computer to complete the calculation after the multiplication. It is reasonable to assume ${t}_{i}^{\prime}$ and ${t}_{i}^{\u2033}$ are outputs that are independent of each other. Therefore,

If the multiplication does not occur, ${t}_{i}^{\prime}$ is the amount of time for an operation unrelated to the computation, so it is reasonable to assume ${t}_{i}$ and ${t}_{i}^{\prime}$ are independent. Therefore,

Note that we couldn’t use the mean in place of the variance, since the mean of $\{-{t}_{i}\}$ would be negative, so the last inequality would not hold. All that can be deduced from the mean is the total number of nonzero bits in the binary expansion of $d$.

The preceding gives a fairly simple version of the method. In practice, various modifications would be needed, depending on the specific situation. But the general strategy remains the same. For more details, see [Kocher]. For more on timing attacks, see [Crosby et al.].

A similar attack on RSA works by measuring the power consumed during the computations. See [Kocher et al.]. Another method, called acoustic cryptanalysis, obtains information from the high-pitched noises emitted by the electronic components of a computer during its computations. See [Genkin et al.]. Attacks such as these and the timing attack can be prevented by appropriate design features in the physical implementation.

Timing attacks, power analysis, and acoustic cryptanalysis are examples of **side-channel attacks**, where the attack is on the implementation rather than on the basic cryptographic algorithm.