# 24.4 Linear Codes

When you are having a conversation with a friend over a cellular phone, your voice is turned into digital data that has an error correcting code applied to it before it is sent. When your friend receives the data, the errors in transmission must be accounted for by decoding the error correcting code. Only after decoding are the data turned into sound that represents your voice.

The delay required to decode a packet of data is critical in such an application. If decoding took several seconds, the delay would become aggravating and make holding a conversation difficult.

The problem of efficiently decoding a code is therefore of critical importance. In order to decode quickly, it is helpful to have some structure in the code rather than taking the code to be a random subset of ${A}^{n}$. This is one of the primary reasons for studying linear codes. For the remainder of this chapter, we restrict our attention to linear codes.

Henceforth, the alphabet $A$ will be a finite field $\mathbf{F}$. For an introduction to finite fields, see Section 3.11. For much of what we do, the reader can assume that $\mathbf{F}$ is ${\mathbf{Z}}_{2}=\{0\text{,}\text{}1\}$ = the integers mod 2, in which case we are working with binary vectors. Another concrete example of a finite field is ${\mathbf{Z}}_{p}$ = the integers mod a prime $p$. For other examples, see Section 3.11. In particular, as is pointed out there, $\mathbf{F}$ must be one of the finite fields $GF(q)$; but the present notation is more compact. Since we are working with arbitrary finite fields, we’ll use “=” instead of “$\equiv$” in our equations. If you want to think of $\mathbf{F}$ as being ${\mathbf{Z}}_{2}$, just replace all equalities between elements of $\mathbf{F}$ with congruences mod 2.

The set of $n$-dimensional vectors with entries in $\mathbf{F}$ is denoted by ${\mathbf{F}}^{n}$. They form a vector space over $\mathbf{F}$. Recall that a subspace of ${\mathbf{F}}^{n}$ is a nonempty subset $S$ that is closed under linear combinations, which means that if ${s}_{1}\text{,}\text{}{s}_{2}$ are in $S$ and ${a}_{1}\text{,}\text{}{a}_{2}$ are in $\mathbf{F}$, then ${a}_{1}{s}_{1}+{a}_{2}{s}_{2}\in S$. By taking ${a}_{1}={a}_{2}=0$, for example, we see that $(0\text{,}\text{}0\text{,}\text{}\dots \text{,}\text{}0)\in S$.

# Definition

A **linear code** of dimension $k$ and length $n$ over a field $\mathbf{F}$ is a $k$-dimensional subspace of ${\mathbf{F}}^{n}$. Such a code is called an $\mathbf{[}\mathit{n}\mathbf{,}\mathit{k}\mathbf{]}$ **code**. If the minimum distance of the code is $d$, then the code is called an $\mathbf{[}\mathit{n}\mathbf{,}\mathit{k}\mathbf{,}\mathit{d}\mathbf{]}$ **code**.

When $\mathbf{F}={\mathbf{Z}}_{2}$, the definition can be given more simply. A binary code of length $n$ and dimension $k$ is a set of ${2}^{k}$ binary $n$-tuples (the codewords) such that the sum of any two codewords is always a codeword.

Many of the codes we have met are linear codes. For example, the binary repetition code $\{(0\text{,}\text{}0\text{,}\text{}0)\text{,}\text{}(1\text{,}\text{}1\text{,}\text{}1)\}$ is a one-dimensional subspace of ${\mathbf{Z}}_{2}^{3}$. The parity check code from Exercise 2 in Section 24.1 is a linear code of dimension 7 and length 8. It consists of those binary vectors of length 8 such that the sum of the entries is 0 mod 2. It is not hard to show that the set of such vectors forms a subspace. The vectors

$$(1,0,0,0,0,0,0,1)\text{,}\quad (0,1,0,0,0,0,0,1)\text{,}\quad \dots \text{,}\quad (0,0,0,0,0,0,1,1)$$

form a basis of this subspace. Since there are seven basis vectors, the subspace is seven-dimensional.

The Hamming [7, 4] code from Example 4 of Section 24.1 is a linear code of dimension 4 and length 7. Every codeword is a linear combination of the four rows of the matrix $G$. Since these four rows span the code and are linearly independent, they form a basis.

The ISBN code (Example 5 of Section 24.1) is not linear. It consists of a set of 10-dimensional vectors with entries in ${\mathbf{Z}}_{11}$. However, it is not closed under linear combinations since $X$ is not allowed as one of the first nine entries.

Let $C$ be a linear code of dimension $k$ over a field $\mathbf{F}$. If $\mathbf{F}$ has $q$ elements, then $C$ has ${q}^{k}$ elements. This may be seen as follows. There is a basis of $C$ with $k$ elements; call them ${v}_{1}\text{,}\text{}\dots \text{,}\text{}{v}_{k}$. Every element of $C$ can be written uniquely in the form ${a}_{1}{v}_{1}+\cdots +{a}_{k}{v}_{k}$, with ${a}_{1}\text{,}\text{}\dots \text{,}\text{}{a}_{k}\in \mathbf{F}$. There are $q$ choices for each ${a}_{i}$ and there are $k$ numbers ${a}_{i}$. This means there are ${q}^{k}$ elements of $C$, as claimed. Therefore, an $[n\text{,}\text{}k\text{,}\text{}d]$ linear code is an $(n\text{,}\text{}{q}^{k}\text{,}\text{}d)$ code in the notation of Section 24.2.
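This counting argument is easy to check by machine. The following short Python sketch (the function `span` is our own illustration, not notation from the text) enumerates all linear combinations of a basis over ${\mathbf{Z}}_{2}$ and confirms that a $k$-dimensional binary code has ${2}^{k}$ codewords.

```python
from itertools import product

def span(basis):
    """All F_2-linear combinations a_1*v_1 + ... + a_k*v_k of the basis vectors."""
    n = len(basis[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(basis)):
        words.add(tuple(sum(a * v[i] for a, v in zip(coeffs, basis)) % 2
                        for i in range(n)))
    return words

# The [3, 1] binary repetition code has basis {(1, 1, 1)} and 2^1 codewords.
print(len(span([(1, 1, 1)])))             # 2
# A 2-dimensional basis gives 2^2 = 4 codewords.
print(len(span([(1, 0, 1), (0, 1, 1)])))  # 4
```

Since every codeword is a unique combination of the basis vectors, the size of the returned set is exactly ${q}^{k}$ (here $q=2$).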

For an arbitrary, possibly nonlinear, code, computing the minimum distance could require computing $d(u\text{,}\text{}v)$ for every pair of codewords. For a linear code, the computation is much easier. Define the **Hamming weight** $wt(u)$ of a vector $u$ to be the number of nonzero places in $u$. It equals $d(u\text{,}\text{}0)$, where 0 denotes the vector $(0\text{,}\text{}0\text{,}\text{}\dots \text{,}\text{}0)$.

# Proposition

Let $C$ be a linear code. Then $d(C)$ equals the smallest Hamming weight of all nonzero code vectors: $d(C)=\min \{wt(u)\text{}|\text{}0\ne u\in C\}$.

Proof. Since $wt(u)=d(u\text{,}\text{}0)$ is the distance between two codewords, we have $wt(u)\ge d(C)$ for all codewords $u$. It remains to show that there is a codeword with weight equal to $d(C)$. Note that $d(v\text{,}\text{}w)=wt(v-w)$ for any two vectors $v\text{,}\text{}w$. This is because an entry of $v-w$ is nonzero, and hence gets counted in $wt(v-w)$, if and only if $v$ and $w$ differ in that entry. Choose $v$ and $w$ to be distinct codewords such that $d(v\text{,}\text{}w)=d(C)$. Then $wt(v-w)=d(C)$, so the minimum weight of the nonzero codewords equals $d(C)$.
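The proposition gives a direct computation: to find $d(C)$, scan the nonzero codewords for the smallest weight instead of comparing all pairs. A minimal Python sketch (our own illustration, assuming the code is given as a set of tuples over ${\mathbf{Z}}_{2}$):

```python
def wt(u):
    """Hamming weight: the number of nonzero entries of u."""
    return sum(1 for x in u if x != 0)

def min_distance(code):
    """d(C) of a linear code: smallest weight among its nonzero codewords."""
    return min(wt(u) for u in code if any(u))

# The length-3 parity check code: vectors whose entries sum to 0 mod 2.
parity = {(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)
          if (a + b + c) % 2 == 0}
print(min_distance(parity))  # 2
```

For a code with $M$ codewords this takes $M-1$ weight computations rather than the roughly $M^2/2$ distance computations needed for a nonlinear code.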

To construct a linear $[n\text{,}\text{}k]$ code, we have to construct a $k$-dimensional subspace of ${\mathbf{F}}^{n}$. The easiest way to do this is to choose $k$ linearly independent vectors and take their span. This can be done by choosing a $k\times n$ **generating matrix** $G$ of rank $k$, with entries in $\mathbf{F}$. The set of vectors of the form $vG$, where $v$ runs through all row vectors in ${\mathbf{F}}^{k}$, then gives the subspace.

For our purposes, we’ll usually take $G=[{I}_{k}\text{,}\text{}P]$, where ${I}_{k}$ is the $k\times k$ identity matrix and $P$ is a $k\times (n-k)$ matrix. The rows of $G$ form a basis for a $k$-dimensional subspace of the space of all vectors of length $n$. This subspace is our linear code $C$. In other words, every codeword is uniquely expressible as a linear combination of rows of $G$. If we use a matrix $G=[{I}_{k}\text{,}\text{}P]$ to construct a code, each codeword $vG$ begins with the $k$ entries of $v$ itself, so the first $k$ columns carry the information. The remaining $n-k$ columns provide the redundancy.
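The construction of codewords as $vG$ can be carried out mechanically. Here is a short Python sketch (our own illustration) that computes $\{vG\text{}|\text{}v\in {\mathbf{Z}}_{2}^{k}\}$ for a generating matrix given as a list of rows, using the $[6\text{,}\text{}2]$ code whose rows are 101010 and 010101:

```python
from itertools import product

def codewords(G):
    """All vectors v*G over F_2, as v runs through the row vectors in F_2^k."""
    k, n = len(G), len(G[0])
    out = set()
    for v in product([0, 1], repeat=k):
        out.add(tuple(sum(v[i] * G[i][j] for i in range(k)) % 2
                      for j in range(n)))
    return out

# G = [I_2, P] for the [6, 2] code with rows 101010 and 010101.
G = [[1, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1]]
for c in sorted(codewords(G)):
    print(c)  # 000000, 010101, 101010, 111111
```

Note that the first two entries of each codeword are exactly the vector $v$ that produced it, as expected for a systematic generating matrix.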

The code in the second half of Example 1, Section 24.1, has

$$G=\left[\begin{matrix}1&0&1&0&1&0\\ 0&1&0&1&0&1\end{matrix}\right]\text{.}$$
The codewords 101010 and 010101 appear as rows in the matrix and the codeword 111111 is the sum of these two rows. This is a $[6\text{,}\text{}2]$ code.

The code in Example 2 has

$$G=\left[\begin{matrix}1&0&\cdots &0&1\\ 0&1&\cdots &0&1\\ \vdots & &\ddots & &\vdots \\ 0&0&\cdots &1&1\end{matrix}\right]=[{I}_{7}\text{,}\text{}P]\text{,}$$

where $P$ is the column vector of seven 1s.
For example, the codeword 11001001 is the sum mod 2 of the first, second, and fifth rows, and hence is obtained by multiplying $(1\text{,}\text{}1\text{,}\text{}0\text{,}\text{}0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}0)$ times $G$. This is an $[8\text{,}\text{}7]$ code.

In Exercise 4, the matrix $G$ is given in the description of the code. As you can guess from its name, it is a $[7\text{,}\text{}4]$ code.

As mentioned previously, we could start with any $k\times n$ matrix of rank $k$. Its rows would generate an $[n\text{,}\text{}k]$ code. However, row and column operations can be used to transform the matrix to the form of $G$ we are using, so we usually do not work with the more general situation. A code described by a matrix $G=[{I}_{k}\text{,}\text{}P]$ as before is said to be **systematic**. In this case, the first $k$ bits are the **information symbols** and the last $n-k$ symbols are the **check symbols**.

Suppose we have $G=[{I}_{k}\text{,}\text{}P]$ as the generating matrix for a code $C$. Let

$$H=[-{P}^{T}\text{,}\text{}{I}_{n-k}]\text{,}$$
where ${P}^{T}$ is the transpose of $P$. In Exercise 4 of Section 24.1, this is the matrix that was used to correct errors. For Exercise 2, we have $H=[1\text{,}\text{}1\text{,}\text{}1\text{,}\text{}1\text{,}\text{}1\text{,}\text{}1\text{,}\text{}1\text{,}\text{}1]$. Note that in this case a binary string $v$ is a codeword if and only if the number of nonzero bits is even, which is the same as saying that its dot product with $H$ is zero. This can be rewritten as $v{H}^{T}=0$, where ${H}^{T}$ is the transpose of $H$.

More generally, suppose we have a linear code $C\subset {\mathbf{F}}^{n}$. A matrix $H$ is called a **parity check matrix** for $C$ if $H$ has the property that a vector $v\in {\mathbf{F}}^{n}$ is in $C$ if and only if $v{H}^{T}=0$. We have the following useful result.

# Theorem

If $G=[{I}_{k}\text{,}\text{}P]$ is the generating matrix for a code $C$, then $H=[-{P}^{T}\text{,}\text{}{I}_{n-k}]$ is a parity check matrix for $C$.

Proof. Consider the $i$th row of $G$, which has the form

$${v}_{i}=(0\text{,}\text{}\dots \text{,}\text{}0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}\dots \text{,}\text{}0\text{,}\text{}{p}_{i1}\text{,}\text{}\dots \text{,}\text{}{p}_{i,n-k})\text{,}$$

where the 1 is in the $i$th position. This is a vector of the code $C$. The $j$th column of ${H}^{T}$ is the vector

$$(-{p}_{1j}\text{,}\text{}\dots \text{,}\text{}-{p}_{kj}\text{,}\text{}0\text{,}\text{}\dots \text{,}\text{}0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}\dots \text{,}\text{}0{)}^{T}\text{,}$$

where the 1 is in the $(k+j)$th position. To obtain the $j$th element of ${v}_{i}{H}^{T}$, take the dot product of these two vectors, which yields

$$1\cdot (-{p}_{ij})+{p}_{ij}\cdot 1=0\text{.}$$
Therefore, ${H}^{T}$ annihilates every row ${v}_{i}$ of $G$. Since every element of $C$ is a sum of rows of $G$, we find that $v{H}^{T}=0$ for all $v\in C$.

Recall the following fact from linear algebra: The left null space of an $m\times n$ matrix of rank $r$ has dimension $m-r$. Since ${H}^{T}$ is an $n\times (n-k)$ matrix containing ${I}_{n-k}$ as a submatrix, it has rank $n-k$. Therefore, its left null space has dimension $n-(n-k)=k$. But we have just proved that $C$ is contained in this null space. Since $C$ also has dimension $k$, it must equal the null space, which is what the theorem claims.
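The theorem translates directly into a few lines of code. The sketch below (our own illustration, assuming a binary code, so $-{P}^{T}={P}^{T}$) builds $H=[-{P}^{T}\text{,}\text{}{I}_{n-k}]$ from $G=[{I}_{k}\text{,}\text{}P]$ and checks that $G{H}^{T}=0$; the small $[4\text{,}\text{}2]$ matrix used here is a hypothetical example.

```python
def parity_check(G):
    """H = [-P^T, I_{n-k}] from G = [I_k, P]; over F_2, -P^T equals P^T."""
    k, n = len(G), len(G[0])
    P = [row[k:] for row in G]
    H = []
    for j in range(n - k):
        row = [P[i][j] for i in range(k)]                   # row j of P^T
        row += [1 if t == j else 0 for t in range(n - k)]   # row j of I_{n-k}
        H.append(row)
    return H

G = [[1, 0, 1, 1],
     [0, 1, 1, 0]]          # hypothetical [4, 2] generator matrix
H = parity_check(G)
# Verify G H^T = 0 (mod 2): H^T annihilates every row of G.
for g in G:
    assert all(sum(a * b for a, b in zip(g, h)) % 2 == 0 for h in H)
print(H)  # [[1, 1, 1, 0], [1, 0, 0, 1]]
```

Since every codeword is a sum of rows of $G$, the loop above suffices to confirm that $v{H}^{T}=0$ for all $v\in C$.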

We now have a way of detecting errors: If $v$ is received during a transmission and $v{H}^{T}\ne 0$, then there is an error. If $v{H}^{T}=0$, we cannot conclude that there is no error, but we do know that $v$ is a codeword. Since it is more likely that no errors occurred than enough errors occurred to change one codeword into another codeword, the best guess is that an error did not occur.

We can also use a parity check matrix to make the task of decoding easier. First, let’s look at an example.

# Example

Let $C$ be the binary linear code with generator matrix

$$G=\left[\begin{matrix}1&0&1&1\\ 0&1&1&0\end{matrix}\right]\text{.}$$
We are going to make a table of all binary vectors of length 4 according to the following procedure. First, list the four elements of the code in the first row, starting with $(0\text{,}\text{}0\text{,}\text{}0\text{,}\text{}0)$. Then, among the 12 remaining vectors, choose one of smallest weight (there might be several choices). Add this vector to the first row to obtain the second row. From the remaining eight vectors, again choose one with smallest weight and add it to the first row to obtain the third row. Finally, choose a vector with smallest weight from the remaining four vectors, add it to the first row, and obtain the fourth row. We obtain the following:

| $(0, 0, 0, 0)$ | $(1, 0, 1, 1)$ | $(0, 1, 1, 0)$ | $(1, 1, 0, 1)$ |
| --- | --- | --- | --- |
| $(1, 0, 0, 0)$ | $(0, 0, 1, 1)$ | $(1, 1, 1, 0)$ | $(0, 1, 0, 1)$ |
| $(0, 1, 0, 0)$ | $(1, 1, 1, 1)$ | $(0, 0, 1, 0)$ | $(1, 0, 0, 1)$ |
| $(0, 0, 0, 1)$ | $(1, 0, 1, 0)$ | $(0, 1, 1, 1)$ | $(1, 1, 0, 0)$ |
This can be used as a decoding table. When we receive a vector, find it in the table. Decode by changing the vector to the one at the top of its column. The error that is removed is the first element of its row. For example, suppose we receive $(0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1)$. It is the last element of the second row. Decode it to $(1\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1)$, which means removing the error $(1\text{,}\text{}0\text{,}\text{}0\text{,}\text{}0)$. In this small example, this is not exactly the same as nearest neighbor decoding, since $(0\text{,}\text{}0\text{,}\text{}1\text{,}\text{}0)$ decodes as $(0\text{,}\text{}1\text{,}\text{}1\text{,}\text{}0)$ when it has an equally close neighbor $(0\text{,}\text{}0\text{,}\text{}0\text{,}\text{}0)$. The problem is that the minimum distance of the code is 2, so general error correction is not possible. However, if we use a code that can correct up to $t$ errors, this procedure correctly decodes all vectors that are distance at most $t$ from a codeword.

In a large example, finding the vector in the table can be tedious. In fact, writing the table can be rather difficult (that’s why we used such a small example). This is where a parity check matrix $H$ comes to the rescue.

The first vector $v$ in a row is called the **coset leader**. Let $r$ be any vector in the same row as $v$. Then $r=v+c$ for some codeword $c$, since this is how the table was constructed. Therefore,

$$r{H}^{T}=(v+c){H}^{T}=v{H}^{T}+c{H}^{T}=v{H}^{T}\text{,}$$

since $c{H}^{T}=0$ by the definition of a parity check matrix. The vector $S(r)=r{H}^{T}$ is called the **syndrome** of $r$. What we have shown is that two vectors in the same row have the same syndrome. Replace the preceding table with the following much smaller table.

| Coset Leader | Syndrome |
| --- | --- |
| $(0, 0, 0, 0)$ | $(0, 0)$ |
| $(1, 0, 0, 0)$ | $(1, 1)$ |
| $(0, 1, 0, 0)$ | $(1, 0)$ |
| $(0, 0, 0, 1)$ | $(0, 1)$ |

This table may be used for decoding as follows. For a received vector $r$, calculate its syndrome $S(r)=r{H}^{T}$. Find this syndrome on the list and subtract the corresponding coset leader from $r$. This gives the same decoding as above. For example, if $r=(0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1)$, then

$$S(r)=r{H}^{T}=(1\text{,}\text{}1)\text{.}$$
This is the syndrome for the second row. Subtract the coset leader $(1\text{,}\text{}0\text{,}\text{}0\text{,}\text{}0)$ from $r$ to obtain the codeword $(1\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1)$.
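For readers who wish to check this arithmetic, here is a Python sketch. The matrix `H` below is our assumption, chosen to be consistent with the syndrome table above.

```python
H = [[1, 1, 1, 0],
     [1, 0, 0, 1]]   # assumed parity check matrix for this example

def syndrome(r, H):
    """S(r) = r H^T over F_2."""
    return tuple(sum(a * b for a, b in zip(r, row)) % 2 for row in H)

r = (0, 1, 0, 1)
print(syndrome(r, H))   # (1, 1), the syndrome of the coset leader (1, 0, 0, 0)

leader = (1, 0, 0, 0)
codeword = tuple((a - b) % 2 for a, b in zip(r, leader))
print(codeword)         # (1, 1, 0, 1)
```

The syndrome of the decoded vector is $(0\text{,}\text{}0)$, confirming that it is a codeword.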

We now consider the general situation. The method of the example leads us to two definitions.

# Definition

Let $C$ be a linear code and let $u$ be an $n$-dimensional vector. The set $u+C$ given by

$$u+C=\{u+c\text{}|\text{}c\in C\}$$

is called a **coset** of $C$.

It is easy to see that if $v\in u+C$, then the sets $v+C$ and $u+C$ are the same (Exercise 9).

# Definition

A vector having minimum Hamming weight in a coset is called a **coset leader**.

The **syndrome** of a vector $u$ is defined to be $S(u)=u{H}^{T}$. The following lemma allows us to determine the cosets easily.

# Lemma

Two vectors $u$ and $v$ belong to the same coset if and only if they have the same syndrome.

Proof. Two vectors $u$ and $v$ belong to the same coset if and only if their difference belongs to the code $C$; that is, $u-v\in C$. This happens if and only if $(u-v){H}^{T}=0$, which is equivalent to $S(u)=u{H}^{T}=v{H}^{T}=S(v)$.

Decoding can be achieved by building a syndrome lookup table, which consists of the coset leaders and their corresponding syndromes. With a syndrome lookup table, we can decode with the following steps:

For a received vector $r$, calculate its syndrome $S(r)={rH}^{T}$.

Next, find the coset leader with the same syndrome as $S(r)$. Call the coset leader ${c}_{0}$.

Decode $r$ as $r-{c}_{0}$.
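These three steps can be sketched as a small program. The following Python illustration (our own, not from the text) builds the syndrome lookup table by brute force, scanning all of ${\mathbf{Z}}_{2}^{n}$ and keeping a minimum-weight leader for each syndrome, so it is only feasible for small $n$; the parity check matrix is a hypothetical example.

```python
from itertools import product

def syndrome(v, H):
    """S(v) = v H^T over F_2."""
    return tuple(sum(a * b for a, b in zip(v, row)) % 2 for row in H)

def build_table(H, n):
    """Map each syndrome to a minimum-weight coset leader."""
    table = {}
    for v in product([0, 1], repeat=n):
        s = syndrome(v, H)
        if s not in table or sum(v) < sum(table[s]):
            table[s] = v
    return table

def decode(r, H, table):
    """Subtract the coset leader with the same syndrome as r."""
    leader = table[syndrome(r, H)]
    return tuple((a - b) % 2 for a, b in zip(r, leader))

H = [[1, 1, 1, 0], [1, 0, 0, 1]]   # hypothetical parity check matrix
table = build_table(H, 4)
print(decode((0, 1, 0, 1), H, table))  # (1, 1, 0, 1)
```

Once the table is built, each received vector costs only one syndrome computation and one subtraction, instead of a search over all codewords. When a coset has several minimum-weight vectors, any of them may serve as the leader.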

Syndrome decoding requires significantly fewer steps than searching for the nearest codeword to a received vector. However, for large codes it is still too inefficient to be practical. In general, the problem of finding the nearest neighbor in a general linear code is hard; in fact, it is what is known as an NP-complete problem. However, for certain special types of codes, efficient decoding is possible. We treat some examples in the next few sections.

# 24.4.1 Dual Codes

The vector space ${\mathbf{F}}^{n}$ has a dot product, defined in the usual way:

$$({a}_{1}\text{,}\text{}\dots \text{,}\text{}{a}_{n})\cdot ({b}_{1}\text{,}\text{}\dots \text{,}\text{}{b}_{n})={a}_{1}{b}_{1}+\cdots +{a}_{n}{b}_{n}\text{.}$$

For example, if $\mathbf{F}={\mathbf{Z}}_{2}$, then

$$(0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1)\cdot (0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1)=0+1+0+1=0\text{,}$$
so we find the possibly surprising fact that the dot product of a nonzero vector with itself can sometimes be 0, in contrast to the situation with real numbers. Therefore, the dot product does not tell us the length of a vector. But it is still a useful concept.

If $C$ is a linear $[n\text{,}\text{}k]$ code, define the **dual code**

${C}^{\perp}=\{u\in {\mathbf{F}}^{n}\text{}|\text{}u\cdot c=0\text{ for all }c\in C\}\text{.}$

# Proposition

If $C$ is a linear $[n\text{,}\text{}k]$ code with generating matrix $G=[{I}_{k}\text{,}\text{}P]$, then ${C}^{\perp}$ is a linear $[n\text{,}\text{}n-k]$ code with generating matrix $H=[-{P}^{T}\text{,}\text{}{I}_{n-k}]$. Moreover, $G$ is a parity check matrix for ${C}^{\perp}$.

Proof. Since every element of $C$ is a linear combination of the rows of $G$, a vector $u$ is in ${C}^{\perp}$ if and only if $u{G}^{T}=0$. This means that ${C}^{\perp}$ is the left null space of ${G}^{T}$. Also, we see that $G$ is a parity check matrix for ${C}^{\perp}$. Since $G$ has rank $k$, so does ${G}^{T}$. The left null space of ${G}^{T}$ therefore has dimension $n-k$, so ${C}^{\perp}$ has dimension $n-k$. Because $H$ is a parity check matrix for $C$, and the rows of $G$ are in $C$, we have $G{H}^{T}=0$. Taking the transpose of this relation, and recalling that transpose reverses order ($(AB{)}^{T}={B}^{T}{A}^{T}$), we find $H{G}^{T}=0$. This means that the rows of $H$ are in the left null space of ${G}^{T}$; therefore, in ${C}^{\perp}$. Since $H$ has rank $n-k$, the span of its rows has dimension $n-k$, which is the same as the dimension of ${C}^{\perp}$. It follows that the rows of $H$ span ${C}^{\perp}$, so $H$ is a generating matrix for ${C}^{\perp}$.

A code $C$ is called **self-dual** if $C={C}^{\perp}$. The Golay code ${G}_{24}$ of Section 24.6 is an important example of a self-dual code.

# Example

Let $C=\{(0\text{,}\text{}0\text{,}\text{}0)\text{,}\text{}(1\text{,}\text{}1\text{,}\text{}1)\}$ be the binary repetition code. Since $u\cdot (0\text{,}\text{}0\text{,}\text{}0)=0$ for every $u$, a vector $u$ is in ${C}^{\perp}$ if and only if $u\cdot (1\text{,}\text{}1\text{,}\text{}1)=0$. This means that ${C}^{\perp}$ is a parity check code: $({a}_{1}\text{,}\text{}{a}_{2}\text{,}\text{}{a}_{3})\in {C}^{\perp}$ if and only if ${a}_{1}+{a}_{2}+{a}_{3}=0$.
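The dual code in this example can be verified by brute force. A short Python sketch (the helper `dual` is our own illustration, not notation from the text) collects every vector orthogonal to all generators of $C$:

```python
from itertools import product

def dual(generators, n):
    """Brute-force C^perp: all u in F_2^n orthogonal to every generator of C."""
    return {u for u in product([0, 1], repeat=n)
            if all(sum(a * b for a, b in zip(u, g)) % 2 == 0 for g in generators)}

# C = {(0,0,0), (1,1,1)} is generated by the single vector (1, 1, 1).
D = dual([(1, 1, 1)], 3)
print(sorted(D))  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

As expected, ${C}^{\perp}$ consists exactly of the length-3 vectors whose entries sum to 0 mod 2, and it has ${2}^{3-1}=4$ elements, matching the $[n\text{,}\text{}n-k]$ dimension count of the proposition below.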

# Example

Let $C$ be the binary code with generating matrix

$$G=\left[\begin{matrix}1&0&0&1\\ 0&1&1&0\end{matrix}\right]\text{.}$$

The proposition says that ${C}^{\perp}$ has generating matrix

$$H=\left[\begin{matrix}0&1&1&0\\ 1&0&0&1\end{matrix}\right]\text{.}$$
This is $G$ with the rows switched, so the rows of $G$ and the rows of $H$ generate the same subspace. Therefore, $C={C}^{\perp}$, which says that $C$ is self-dual.