# 14.2 Choosing Primes for RSA

The security of the RSA cryptosystem (see Chapter 9) relies heavily on the inability of an attacker to factor the modulus $n=pq$. Therefore, it is very important that $p$ and $q$ be chosen in an unpredictable way. This usually requires a good pseudorandom number generator to give a starting point in a search for each prime, and a pseudorandom number generator needs a random seed, or some random data, as an input to start its computation.

In 1995, Ian Goldberg and David Wagner, two computer science graduate students at the University of California at Berkeley, were reading the documentation for the Netscape web browser, one of the early Internet browsers. When a secure transaction was needed, the browser generated an RSA key. They discovered that a time stamp was used to form the seed for the pseudorandom number generator that chose the RSA primes. Using this information, they were able to deduce the primes used for a transaction in just a few seconds on a desktop computer.

Needless to say, Netscape quickly put out a new version that repaired this flaw. As Jeff Bidzos, president of RSA Data Security pointed out, Netscape “declined to have us help them on the implementation of our software in their first version. But this time around, they’ve asked for our help” (*NY Times*, Sept. 19, 1995).

Another potential implementation flaw was averted several years ago. A mathematician at a large electronics company (which we’ll leave nameless) was talking with a colleague, who mentioned that they were about to put out an implementation of RSA where they saved time by choosing only one random starting point to look for primes, and then having $p$ and $q$ be the next two primes larger than the start. In horror, the mathematician pointed out that finding the next prime after $\sqrt{n}$, a very straightforward process, immediately yields the larger prime, thus breaking the system.

Perhaps the most worrisome problem was discovered in 2012. Two independent teams collected RSA moduli from the web, for example from X-509 certificates, and computed the gcd of each pair (see [Lenstra2012 et al.] and [Heninger et al.]). They found many cases where the gcd gave a nontrivial factor. For example, the team led by Arjen Lenstra collected $11.4\times {10}^{6}$ RSA moduli. They computed the gcd of each pair of moduli and found 26965 moduli where the gcd gave a nontrivial factor. Fortunately, most of these moduli were at the time no longer in use, but this is still a serious security problem. Unless the team told you, there is no way to know whether your modulus is one of the bad ones unless you want to compute the gcd of your modulus with the $6.4\times {10}^{6}$ other moduli (this is, of course, much faster than computing the gcd of all pairs, not just those that include your modulus). And if you find that your modulus is bad, you have factored someone else’s modulus and thus have broken their system.

How could this have happened? Probably some bad pseudorandom number generators were used. Let’s suppose that your pseudorandom number generator can produce only 1 million different primes (this could easily be the case if you don’t have a good source of random seeds; [Heninger et al.] gives a detailed analysis of this situation). You use it to generate 2000 primes, which you pair into 1000 RSA moduli. So what could go wrong?

Recall the Birthday Paradox (see Section 12.1). The number of “birthdays” is $N={10}^{6}$ and the number of “people” is $2000=2\sqrt{N}$. It is likely that two “people" have the same “birthday.” That is, two of the primes are equal (it is very unlikely that they are used for the same modulus, especially if the generating program is competently written). Therefore, two moduli will share a common prime, and this can be discovered by computing their gcd.

Of course, there are enough large primes that a good pseudorandom number generator will potentially produce so many primes that is is unlikely that a prime will be repeated. As often is the case in cryptography, we see that security ultimately relies on the quality of the pseudorandom number generator. Generating randomness is hard.