25.3 Shor’s AlgorithmShor’s Algorithm – Introduction to Cryptography with Coding Theory, 3rd Edition

25.3 Shor’s Algorithm

Quantum computers are not yet a reality. The current versions can only handle a few qubits. But, if the great technical problems can be overcome and large quantum computers are built, the effect on cryptography will be enormous. In this section we give a brief glimpse at how a quantum computer could factor large integers, using an algorithm developed by Peter Shor. We avoid discussing quantum mechanics and ask the reader to believe that a quantum computer should be able to do all the operations we describe, and do them quickly. For more details, see, for example, [Ekert-Josza] or [Rieffel-Polak].

What is a quantum computer and what does it do? First, let’s look at what a classical computer does. It takes a binary input, for example, 100010, and gives a binary output, perhaps 0101. If it has several inputs, it has to work on them individually. A quantum computer takes as input a certain number of qubits and outputs some qubits. The main difference is that the input and output qubits can be linear combinations of certain basic states. The quantum computer operates on all basic states in this linear combination simultaneously. In effect, a quantum computer is a massively parallel machine.

For example, think of the basic state |100 as representing three particles, the first in orientation 1 and the last two in orientation 0 (with respect to some basis that will implicitly be fixed throughout the discussion). The quantum computer can take |100 and produce some output. However, it can also take as input a normalized (that is, of length 1) linear combination of basic quantum states such as


and produce an output just as quickly as it did when working with a basic state. After all, the computer could not know whether a quantum state is one of the basic states, or a linear combination of them, without making a measurement. But such a measurement would alter the input. It is this ability to work with a linear combination of states simultaneously that makes a quantum computer potentially very powerful.

Suppose we have a function f(x) that can be evaluated for an input x by a classical computer. The classical computer asks for an input and produces an output. A quantum computer, on the other hand, can accept as input a sum


(C is a normalization factor) of all possible input states and produce the output

1Cx|x, f(x), 

where |x, f(x) is a longer sequence of qubits, representing both x and the value of f(x). (Technical point: It might be notationally better to input (1/C)|x, 00 in order to have some particles to change to f(x). For simplicity, we will not do this.) So we can obtain a list of all the values of f(x). This looks great, but there is a problem. If you make a measurement, you force the quantum state into the result of the measurement. You get |x0, f(x0) for some randomly chosen x0, and the other states in the output are destroyed. So, if you are going to look at the list of values of f(x), you’d better do it carefully, since you get only one chance. In particular, you probably want to apply some transformation to the output in order to put it into a more desirable form. The skill in programming a quantum computer is in designing the computation so that the outputs you want to examine appear with much higher probability than the others. This is what is done in Shor’s factorization algorithm.

25.3.1 Factoring

We want to factor n. The strategy is as follows. Recall that if we can find (nontrivial) a and r with ar1(modn), then we have a good chance of factoring n (see the factorization method in Subsection 9.4.1). Choose a random a and consider the sequence 1, a, a2, a3, (modn). If ar1(modn), then this sequence will repeat every r terms since aj+rajaraj(modn). If we can measure the period of this sequence (or a multiple of the period), we will have an r such that ar1(modn). We therefore want to design our quantum computer so that when we make a measurement on the output, we’ll have a high chance of obtaining the period.

25.3.2 The Discrete Fourier Transform

We need a technique for finding the period of a periodic sequence. Classically, Fourier transforms can be used for this purpose, and they can be used in the present situation, too. Suppose we have a sequence

a0, a1, , a2m1

of length 2m, for some integer m. Define the Fourier transform to be


where 0x<2m.

For example, consider the sequence

1, 3, 7, 2, 1, 3, 7, 2

of length 8 and period 4. The length divided by the period is the frequency, namely 2, which is how many times the sequence repeats. The Fourier transform takes the values

F(0)=26/8, F(2)=(12+2i)/8, F(4)=6/8, F(6)=(122i)/8, 

For example, letting ζ=e2πi/8, we find that


Since ζ4=1, the terms cancel and we obtain F(1)=0. The nonzero values of F occur at multiples of 2, which is the frequency.

Let’s consider another example: 2, 1, 2, 1, 2, 1, 2, 1. The Fourier transform is

F(0)=12/8, F(4)=4/8, 

Here the nonzero values of F are again at the multiples of the frequency.

In general, if the period is a divisor of 2m, then all the nonzero values of F will occur at multiples of the frequency (however, a multiple of the frequency could still yield 0). See Exercise 2.

Suppose now that the period isn’t a divisor of 2m. Let’s look at an example. Consider the sequence 1, 0, 0, 1, 0, 0, 1, 0. It has length 8 and almost has period 3 and frequency 3, but we stopped the sequence before it had a chance to complete the last period. In Figure 25.4, we graph the absolute value of its Fourier transform (these are real numbers, hence easier to graph than the complex values of the Fourier transform). Note that there are peaks at 0, 3, and 5. If we continued F(x) to larger values of x we would get peaks at 8, 11, 13, 16, . The peaks are spaced at an average distance of 8/3. Dividing the length of the sequence by the average distance yields a period of 8/(8/3)=3, which agrees with our intuition.

Figure 25.4 The Absolute Value of a Discrete Fourier Transform

The fact that there is a peak at 0 is not very surprising. The formula for the Fourier transform shows that the value at 0 is simply the sum of the elements in the sequence divided by the square root of the length of the sequence.

Let’s look at one more example: 1, 0, 0, 0, 0, 1, 0, 0, 0, 0 1, 0, 0, 0, 0, 1. This sequence has 16 terms. Our intuition might say that the period is around 5 and the frequency is slightly more than 3. Figure 25.5 shows the graph of the absolute value of its Fourier transform. Again, the peaks are spaced around 3 apart, so we can say that the frequency is around 3. The period of the original sequence is therefore around 5, which agrees with our intuition.

Figure 25.5 The Absolute Value of a Discrete Fourier Transform

In the first two examples, the period was a divisor of the length (namely, 8) of the sequence. We obtained nonzero values of the Fourier transform only at multiples of the frequency. In these last two examples, the period was not a divisor of the length (8 or 16) of the sequence. This introduced some “noise” into the situation. We had peaks at approximate multiples of the frequency and values close to 0 away from these peaks.

The conclusion is that the peaks of the Fourier transform occur approximately at multiples of the frequency, and the period is approximately the number of peaks. This will be useful in Shor’s algorithm.

25.3.3 Shor’s Algorithm

Choose m so that n22m<2n2. We start with m qubits, all in state 0:


As in the previous section, by changing axes, we can transform the first bit to a linear combination of |0 and |1, which gives us


We then successively do a similar transformation to the second bit, the third bit, up through the mth bit, to obtain the quantum state


Thus all possible states of the m qubits are superimposed in this sum. For simplicity of notation, we replace each string of 0s and 1s with its decimal equivalent, so we write


Choose a random number a with 1<a<n. We may assume gcd(a, n)=1; otherwise, we have a factor of n. The quantum computer computes the function f(x)=ax(modn) for this quantum state to obtain

12m(|0, a0+|1, a1+|2, a2++|2m1, a2m1)

(for ease of notation, ax is used to denote ax(modn)). This gives a list of all the values of ax. However, so far we are not any better off than with a classical computer. If we measure the state of the system, we obtain a basic state |x0, ax0 for some randomly chosen x0. We cannot even specify which x0 we want to use. Moreover, the system is forced into this state, obliterating all the other values of ax that have been computed. Therefore, we do not want to measure the whole system. Instead, we measure the value of the second half. Each basic piece of the system is of the form |x, ax, where x represents m bits and ax is represented by m/2 bits (since ax(modn)<n<2m/2). If we measure these last m/2 bits, we obtain some number u(modn), and the whole system is forced into a combination of those states of the form |x, u with axu(modn):

1C0x<2maxu(modn)|x, u, 

where C is whatever factor is needed to make the vector have length 1 (in fact, C is the square root of the number of terms in the sum).


At this point, it is probably worthwhile to have an example. Let n=21. (This example might seem simple, but it is the largest that quantum computers using Shor’s algorithm can currently handle. Other algorithms are being developed that can go somewhat farther.) Since 212<29<2212, we have m=9. Let’s choose a=11, so we compute the values of 11x(mod21) to obtain

1512(|0, 1+|1, 11+|2, 16+|3, 8+|4, 4+|5, 2+|6, 1+|7, 11+|8, 16+|9, 8+|10, 4+|11, 2+|12, 1+|13, 11+|14, 16+|15, 8+|16, 4+|17, 2+|18, 1+|19, 11+|20, 16++|508, 4+|509, 2+|510, 1+|511, 11).

Suppose we measure the second part and obtain 2. This means we have extracted all the terms of the form |x, 2 to obtain

185(|5, 2+|11, 2+|17, 2+|23, 2++|497, 2+|503, 2+|509, 2).

For notational convenience, and since it will no longer be needed, we drop the second part to obtain


If we now measured this system, we would simply obtain a number x such that 11x2(mod21). This would not be useful.

Suppose we could take two measurements. Then we would have two numbers x and y with 11x11y(mod21). This would yield 11xy1(mod21). By the factorization method of Subsection 9.4.1, this would give us a good chance of being able to factor 21. However, we cannot take two independent measurements. The first measurement puts the system into the output state, so the second measurement would simply give the same answer as the first.

Not all is lost. Note that in our example, the numbers in our state are periodic with period 6. In general, the values of ax(modn) are periodic with period r, with ar1(modn). So suppose we are able to make a measurement that yields the period. We then have a situation where ar1(modn), so we can hope to factor n by the method from Subsection 9.4.1 mentioned above.

The quantum Fourier transform is exactly the tool we need. It measures frequencies, which can be used to find the period. If r happens to be a divisor of 2m, then the frequencies we obtain are multiples of a fundamental frequency f0, and rf0=2m. In general, r is not a divisor of 2m, so there will be some dominant frequencies, and they will be approximate multiples of a fundamental frequency f0 with rf02m. This will be seen in the analysis of our example and in Figure 25.6.

Figure 25.6 The Absolute Value of g(c)

The quantum Fourier transform is defined on a basic state |x (with 0x<2m) by


It extends to a linear combination of states by linearity:


We can therefore apply QF< to our quantum state.

In our example, we compute


and obtain a sum


for some numbers g(c).

The number g(c) is given by


which is the discrete Fourier transform of the sequence

0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, , 0, 0, 0, 0, 0, 1, 0, 0.

Therefore, the peaks of the graph of the absolute value of g should correspond to the frequency of the sequence, which should be around 512/685. The graph in Figure 25.6 is a plot of |g|.

There are sharp peaks at c=0, 85, 171, 256, 341, 427 (the ones at 0 and 256 do not show up on the graph since they are centered at one value; see below). These are the dominant frequencies mentioned previously. The values of g near the peak at c=341 are


The behavior near c=85, 171, and 427 is similar. At c=0 and 256, we have g(0)=3.756, while all the nearby values of c have g(c)0.015.

The peaks are approximately at multiples of the fundamental frequency f0=85. Of course, we don’t really know this yet, since we haven’t made any measurements.

Now we measure the quantum state of this Fourier transform. Recall that if we start with a linear combination of states a1|x1++a<|x< normalized such that |aj|2=1, then the probability of of obtaining |xk is |ak|2. More generally, if we don’t assume |aj|2=1, the probability is


In our example,


so if we sample the Fourier transform, the probability is around 4×.114=.456 that we obtain one of c=85, 171, 341, 427. Let’s suppose this is the case; say we get c=427. We know, or at least expect, that 427 is approximately a multiple of the frequency f0 that we’re looking for:


for some j. Since rf02m=512, we divide to obtain


Note that 427/512.8345/6. Since we must have rϕ(21)<21, a reasonable guess is that r=6 (see the following discussion of continued fractions).

In general, Shor showed that there is a high chance of obtaining a value of c/2m with


for some j. The method of continued fractions will find the unique (see Exercise 3) value of j/r with r<n satisfying this inequality.

In our example, we take r=6 and check that ar=1161(mod21).

We want to use the factorization method of Subsection 9.4.1 to factor 21. Recall that this method writes r=2km with m odd, and then computes b0am(modn). We then successively square b0 to get b1, b2, , until we reach 1(modn). If bu is the last bi1(modn), we compute gcd(bu1, n) to get a factor (possibly trivial) of n.

In our example, we write 6=23 (a power of 2 times an odd number) and compute (in the notation of Subsection 9.4.1)

gcd(b01, 21)=gcd(7, 21)=7, 

so we obtain 21=73.

In general, once we have a candidate for r, we check that ar1(modn). If not, we were unlucky, so we start over with a new a and form a new sequence of quantum states. If ar1(modn), then we use the factorization method from Subsection 9.4.1. If this fails to factor n, start over with a new a. It is very likely that, in a few attempts, a factorization of n will be found.

We now say more about continued fractions. In Chapter 3, we outlined the method of continued fractions for finding rational numbers with small denominator that approximate real numbers. Let’s apply the procedure to the real number 427/512. We have


This yields the approximating rational numbers

0, 1, 56, 211253, 427512.

Since we know the period in our example is less than n=21, the best guess is the last denominator less than n, namely r=6.

In general, we compute the continued fraction expansion of c/2m, where c is the result of the measurement. Then we compute the approximations, as before. The last denominator less than n is the candidate for r.

25.3.4 Final Words

The capabilities of quantum computers and quantum algorithms are of significant importance to economic and government institutions. Many secrets are protected by cryptographic protocols. Quantum cryptography’s potential for breaking these secrets as well as its potential for protecting future secrets has caused this new research field to grow rapidly over the past few years.

Although the first full-scale quantum computer is probably many years off, and there are still many who are skeptical of its possibility, quantum cryptography has already succeeded in transmitting secure messages over a distances of more than 100 km, and quantum computers have been built that can handle a (very) small number of qubits. Quantum computation and cryptography have already changed the manner in which computer scientists and engineers perceive the capabilities and limits of the computer. Quantum computing has rapidly become a popular interdisciplinary research area and promises to offer many exciting new results in the future.