24.4 Linear Codes – Introduction to Cryptography with Coding Theory, 3rd Edition

24.4 Linear Codes

When you are having a conversation with a friend over a cellular phone, your voice is turned into digital data that has an error correcting code applied to it before it is sent. When your friend receives the data, the errors in transmission must be accounted for by decoding the error correcting code. Only after decoding are the data turned into sound that represents your voice.

The amount of delay it takes for a packet of data to be decoded is critical in such an application. If it took several seconds, then the delay would become aggravating and make holding a conversation difficult.

The problem of efficiently decoding a code is therefore of critical importance. In order to decode quickly, it is helpful to have some structure in the code rather than taking the code to be a random subset of An. This is one of the primary reasons for studying linear codes. For the remainder of this chapter, we restrict our attention to linear codes.

Henceforth, the alphabet A will be a finite field F. For an introduction to finite fields, see Section 3.11. For much of what we do, the reader can assume that F is Z2={0, 1} = the integers mod 2, in which case we are working with binary vectors. Another concrete example of a finite field is Zp = the integers mod a prime p. For other examples, see Section 3.11. In particular, as is pointed out there, F must be one of the finite fields GF(q); but the present notation is more compact. Since we are working with arbitrary finite fields, we’ll use “=" instead of “" in our equations. If you want to think of F as being Z2, just replace all equalities between elements of F with congruences mod 2.

The set of n-dimensional vectors with entries in F is denoted by Fn. They form a vector space over F. Recall that a subspace of Fn is a nonempty subset S that is closed under linear combinations, which means that if s1, s2 are in S and a1, a2 are in F, then a1s1+a2s2S. By taking a1=a2=0, for example, we see that (0, 0, , , 0)S.


A linear code of dimension k and length n over a field F is a k-dimensional subspace of Fn. Such a code is called an [n,k] code. If the minimum distance of the code is d, then the code is called an [n,k,d] code.

When F=Z2, the definition can be given more simply. A binary code of length n and dimension k is a set of 2k binary n-tuples (the codewords) such that the sum of any two codewords is always a codeword.

Many of the codes we have met are linear codes. For example, the binary repetition code {(0, 0, 0), (1, 1, 1)} is a one-dimensional subspace of Z23. The parity check code from Exercise 2 in Section 24.1 is a linear code of dimension 7 and length 8. It consists of those binary vectors of length 8 such that the sum of the entries is 0 mod 2. It is not hard to show that the set of such vectors forms a subspace. The vectors

(1, 0, 0, 0, 0, 0, 0, 1),  (0, 1, 0, 0, 0, 0, 0, 1),  , (0, 0, 0, 0, 0, 0, 1, 1)

form a basis of this subspace. Since there are seven basis vectors, the subspace is seven-dimensional.

The Hamming [7, 4] code from Example 4 of Section 24.1 is a linear code of dimension 4 and length 7. Every codeword is a linear combination of the four rows of the matrix G. Since these four rows span the code and are linearly independent, they form a basis.

The ISBN code (Example 5 of Section 24.1) is not linear. It consists of a set of 10-dimensional vectors with entries in Z11. However, it is not closed under linear combinations since X is not allowed as one of the first nine entries.

Let C be a linear code of dimension k over a field F. If F has q elements, then C has qk elements. This may be seen as follows. There is a basis of C with k elements; call them v1, , vk. Every element of C can be written uniquely in the form a1v1++akvk, with a1, , akF. There are q choices for each ai and there are k numbers ai. This means there are qk elements of C, as claimed. Therefore, an [n, k, d] linear code is an (n, qk, d) code in the notation of Section 24.2.

For an arbitrary, possibly nonlinear, code, computing the minimum distance could require computing d(u, v) for every pair of codewords. For a linear code, the computation is much easier. Define the Hamming weight wt(u) of a vector u to be the number of nonzero places in u. It equals d(u, 0), where 0 denotes the vector (0, 0, , 0).


Let C be a linear code. Then d(C) equals the smallest Hamming weight of all nonzero code vectors: d(C)=min{wt(u) | 0uC}.

Proof. Since wt(u)=d(u, 0) is the distance between two codewords, we have wt(u)d(C) for all codewords u. It remains to show that there is a codeword with weight equal to d(C). Note that d(v, w)=wt(vw) for any two vectors v, w. This is because an entry of vw is nonzero, and hence gets counted in wt(vw), if and only if v and w differ in that entry. Choose v and w to be distinct codewords such that d(v, w)=d(C). Then wt(vw)=d(C), so the minimum weight of the nonzero codewords equals d(C).

To construct a linear [n, k] code, we have to construct a k-dimensional subspace of Fn. The easiest way to do this is to choose k linearly independent vectors and take their span. This can be done by choosing a k×n generating matrix G of rank k, with entries in F. The set of vectors of the form vG, where v runs through all row vectors in Fk, then gives the subspace.

For our purposes, we’ll usually take G=[Ik, P], where Ik is the k×k identity matrix and P is a k×(nk) matrix. The rows of G are the basis for a k-dimensional subspace of the space of all vectors of length n. This subspace is our linear code C. In other words, every codeword is uniquely expressible as a linear combination of rows of G. If we use a matrix G=[Ik, P] to construct a code, the first k columns determine the codewords. The remaining nk columns provide the redundancy.

The code in the second half of Example 1, Section 24.1, has


The codewords 101010 and 010101 appear as rows in the matrix and the codeword 111111 is the sum of these two rows. This is a [6, 2] code.

The code in Example 2 has


For example, the codeword 11001001 is the sum mod 2 of the first, second, and fifth rows, and hence is obtained by multiplying (1, 1, 0, 0, 1, 0, 0) times G. This is an [8, 7] code.

In Exercise 4, the matrix G is given in the description of the code. As you can guess from its name, it is a [7, 4] code.

As mentioned previously, we could start with any k×n matrix of rank k. Its rows would generate an [n, k] code. However, row and column operations can be used to transform the matrix to the form of G we are using, so we usually do not work with the more general situation. A code described by a matrix G=[Ik, P] as before is said to be systematic. In this case, the first k bits are the information symbols and the last nk symbols are the check symbols.

Suppose we have G=[Ik, P] as the generating matrix for a code C. Let

H=[PT, Ink], 

where PT is the transpose of P. In Exercise 4 of Section 24.1, this is the matrix that was used to correct errors. For Exercise 2, we have H=[1, 1, 1, 1, 1, 1, 1, 1]. Note that in this case a binary string v is a codeword if and only if the number of nonzero bits is even, which is the same as saying that its dot product with H is zero. This can be rewritten as vHT=0, where HT is the transpose of H.

More generally, suppose we have a linear code CFn. A matrix H is called a parity check matrix for C if H has the property that a vector vFn is in C if and only if vHT=0. We have the following useful result.


If G=[Ik, P] is the generating matrix for a code C, then H=[PT, Ink] is a parity check matrix for C.

Proof. Consider the ith row of G, which has the form

vi=(0, , 1, , 0, pi, 1, , pi, nk), 

where the 1 is in the ith position. This is a vector of the code C. The jth column of HT is the vector

(p1, j, , pnk, j, 0, , 1, , 0), 

where the 1 is in the (nk+j)th position. To obtain the jth element of viHT, take the dot product of these two vectors, which yields

1(pi, j)+pi, j1=0.

Therefore, HT annihilates every row vi of G. Since every element of C is a sum of rows of G, we find that vHT=0 for all vC.

Recall the following fact from linear algebra: The left null space of an m×n matrix of rank r has dimension nr. Since HT contains Ink as a submatrix, it has rank nk. Therefore, its left null space has dimension k. But we have just proved that C is contained in this null space. Since C also has dimension k, it must equal the null space, which is what the theorem claims.

We now have a way of detecting errors: If v is received during a transmission and vHT0, then there is an error. If vHT=0, we cannot conclude that there is no error, but we do know that v is a codeword. Since it is more likely that no errors occurred than enough errors occurred to change one codeword into another codeword, the best guess is that an error did not occur.

We can also use a parity check matrix to make the task of decoding easier. First, let’s look at an example.


Let C be the binary linear code with generator matrix


We are going to make a table of all binary vectors of length 4 according to the following procedure. First, list the four elements of the code in the first row, starting with (0, 0, 0, 0). Then, among the 12 remaining vectors, choose one of smallest weight (there might be several choices). Add this vector to the first row to obtain the second row. From the remaining eight vectors, again choose one with smallest weight and add it to the first row to obtain the third row. Finally, choose a vector with smallest weight from the remaining four vectors, add it to the first row, and obtain the fourth row. We obtain the following:

(0, 0, 0, 0)(1, 0, 1, 1)(0, 1, 1, 0)(1, 1, 0, 1)(1, 0, 0, 0)(0, 0, 1, 1)(1, 1, 1, 0)(0, 1, 0, 1)(0, 1, 0, 0)(1, 1, 1, 1)(0, 0, 1, 0)(1, 0, 0, 1)(0, 0, 0, 1)(1, 0, 1, 0)(0, 1, 1, 1)(1, 1, 0, 0).

This can be used as a decoding table. When we receive a vector, find it in the table. Decode by changing the vector to the one at the top of its column. The error that is removed is first element of its row. For example, suppose we receive (0, 1, 0, 1). It is the last element of the second row. Decode it to (1, 1, 0, 1), which means removing the error (1, 0, 0, 0). In this small example, this is not exactly the same as nearest neighbor decoding, since (0, 0, 1, 0) decodes as (0, 1, 1, 0) when it has an equally close neighbor (0, 0, 0, 0). The problem is that the minimum distance of the code is 2, so general error correction is not possible. However, if we use a code that can correct up to t errors, this procedure correctly decodes all vectors that are distance at most t from a codeword.

In a large example, finding the vector in the table can be tedious. In fact, writing the table can be rather difficult (that’s why we used such a small example). This is where a parity check matrix H comes to the rescue.

The first vector v in a row is called the coset leader. Let r be any vector in the same row as v. Then r=v+c for some codeword c, since this is how the table was constructed. Therefore,


since cHT=0 by the definition of a parity check matrix. The vector S(r)=rHT is called the syndrome of r. What we have shown is that two vectors in the same row have the same syndrome. Replace the preceding table with the following much smaller table.

Coset Leader Syndrome
(0,  0,  0,  0) (0,  0)
(1,  0,  0,  0) (1,  1)
(0,  1,  0,  0) (1,  0)
(0,  0,  0,  1) (0,  1)

This table may be used for decoding as follows. For a received vector r, calculate its syndrome S(r)=rHT. Find this syndrome on the list and subtract the corresponding coset leader from r. This gives the same decoding as above. For example, if r=(0, 1, 0, 1), then

S(r)=(0,  1,  0,  1)11101001=(1,  1).

This is the syndrome for the second row. Subtract the coset leader (1, 0, 0, 0) from r to obtain the codeword (1, 1, 0, 1).

We now consider the general situation. The method of the example leads us to two definitions.


Let C be a linear code and let u be an n-dimensional vector. The set u+C given by

u+C={u+c | cC}

is called a coset of C.

It is easy to see that if vu+C, then the sets v+C and u+C are the same (Exercise 9).


A vector having minimum Hamming weight in a coset is called a coset leader.

The syndrome of a vector u is defined to be S(u)=uHT. The following lemma allows us to determine the cosets easily.


Two vectors u and v belong to the same coset if and only if they have the same syndrome.

Proof. Two vectors u and v to belong to the same coset if and only if their difference belongs to the code C; that is, uvC. This happens if and only if (uv)HT=0, which is equivalent to S(u)=uHT=vHT=S(v).

Decoding can be achieved by building a syndrome lookup table, which consists of the coset leaders and their corresponding syndromes. With a syndrome lookup table, we can decode with the following steps:

  1. For a received vector r, calculate its syndrome S(r)=rHT.

  2. Next, find the coset leader with the same syndrome as S(r). Call the coset leader c0.

  3. Decode r as rc0.

Syndrome decoding requires significantly fewer steps than searching for the nearest codeword to a received vector. However, for large codes it is still too inefficient to be practical. In general, the problem of finding the nearest neighbor in a general linear code is hard; in fact, it is what is known as an NP-complete problem. However, for certain special types of codes, efficient decoding is possible. We treat some examples in the next few sections.

24.4.1 Dual Codes

The vector space Fn has a dot product, defined in the usual way:

(a1, , an)(b0, , bn)=a0b0++anbn.

For example, if F=Z2, then

(0, 1, 0, 1, 1, 1)(0, 1, 0, 1, 1, 1)=0, 

so we find the possibly surprising fact that the dot product of a nonzero vector with itself can sometimes be 0, in contrast to the situation with real numbers. Therefore, the dot product does not tell us the length of a vector. But it is still a useful concept.

If C is a linear [n, k] code, define the dual code

C={uFn | uc=0 for all cC}.


If C is a linear [n, k] code with generating matrix G=[Ik, P], then C is a linear [n, nk] code with generating matrix H=[PT, Ink]. Moreover, G is a parity check matrix for C.

Proof. Since every element of C is a linear combination of the rows of G, a vector u is in C if and only if uGT=0. This means that C is the left null space of GT. Also, we see that G is a parity check matrix for C. Since G has rank k, so does GT. The left null space of GT therefore has dimension nk, so C has dimension nk. Because H is a parity check matrix for C, and the rows of G are in C, we have GHT=0. Taking the transpose of this relation, and recalling that transpose reverses order ((AB)T=BTAT), we find HGT=0. This means that the rows of H are in the left null space of GT; therefore, in C. Since H has rank nk, the span of its rows has dimension nk, which is the same as the dimension of C. It follows that the rows of H span C, so H is a generating matrix for C.

A code C is called self-dual is C=C. The Golay code G24 of Section 24.6 is an important example of a self-dual code.


Let C={(0, 0, 0), (1, 1, 1)} be the binary repetition code. Since u(0, 0, 0)=0 for every u, a vector u is in C if and only if u(1, 1, 1)=0. This means that C is a parity check code: (a1, a2, a3)C if and only if a1+a2+a3=0.


Let C be the binary code with generating matrix


The proposition says that C has generating matrix


This is G with the rows switched, so the rows of G and the rows of H generate the same subspace. Therefore, C=C, which says that C is self-dual.