# 24.2 Error Correcting Codes

A sender starts with a message and **encodes** it to obtain codewords consisting of sequences of symbols. These are transmitted over a noisy channel, depicted in Figure 24.1, to the receiver. Often the sequences of symbols that are received contain errors and therefore might not be codewords. The receiver must **decode**, which means correct the errors in order to change what is received back to codewords and then recover the original message.

The symbols used to construct the codewords belong to an alphabet. When the alphabet consists of the binary bits 0 and 1, the code is called a **binary code**. A code that uses sequences of three symbols, often represented as integers mod 3, is called a **ternary code**. In general, a code that uses an alphabet consisting of $q$ symbols is called a **$\mathit{q}$-ary code**.

# Definition

Let $A$ be an alphabet and let ${A}^{n}$ denote the set of $n$-tuples of elements of $A$. A **code of length** $\mathit{n}$ is a nonempty subset of ${A}^{n}$.

The $n$-tuples that make up a code are called **codewords**, or code vectors. For example, in a binary repetition code where each symbol is repeated three times, the alphabet is the set $A=\{0\text{,}\text{}1\}$ and the code is the set $\{(0\text{,}\text{}0\text{,}\text{}0)\text{,}\text{}(1\text{,}\text{}1\text{,}\text{}1)\}\subset {A}^{3}$.

Strictly speaking, the codes in the definition are called **block codes**. Other codes exist where the codewords can have varying lengths. These will be mentioned briefly at the end of this chapter, but otherwise we focus only on block codes.

For a code that is a random subset of ${A}^{n}$, decoding could be a time-consuming procedure. Therefore, most useful codes are subsets of ${A}^{n}$ satisfying additional conditions. The most common is to require that $A$ be a finite field, so that ${A}^{n}$ is a vector space, and require that the code be a subspace of this vector space. Such codes are called *linear* and will be discussed in Section 24.4.

For the rest of this section, however, we work with arbitrary, possibly nonlinear, codes. We always assume that our codewords are $n$-dimensional vectors.

In order to decode, it will be useful to put a measure on how close two vectors are to each other. This is provided by the Hamming distance. Let $u\text{,}\text{}v$ be two vectors in ${A}^{n}$. The **Hamming distance** $d(u\text{,}\text{}v)$ is the number of places where the two vectors differ. For example, if we use binary vectors and have the vectors $u=(1\text{,}\text{}0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1\text{,}\text{}0\text{,}\text{}1\text{,}\text{}0)$ and $v=(1\text{,}\text{}0\text{,}\text{}1\text{,}\text{}1\text{,}\text{}1\text{,}\text{}0\text{,}\text{}0\text{,}\text{}0)$, then $u$ and $v$ differ in two places (the 4th and the 7th) so $d(u\text{,}\text{}v)=2$. As another example, suppose we are working with the usual English alphabet. Then $d(fourth\text{,}\text{}eighth)=4$ since the two strings differ in four places.

The importance of the Hamming distance $d(u\text{,}\text{}v)$ is that it measures the minimum number of “errors” needed for $u$ to be changed to $v$. The following gives some of its basic properties.

# Proposition

$d(u\text{,}\text{}v)$ is a metric on ${A}^{n}$, which means that it satisfies

$d(u\text{,}\text{}v)\ge 0$, and $d(u\text{,}\text{}v)=0$ if and only if $u=v$

$d(u\text{,}\text{}v)=d(v\text{,}\text{}u)$ for all $u\text{,}\text{}v$

$d(u\text{,}\text{}v)\le d(u\text{,}\text{}w)+d(w\text{,}\text{}v)$ for all $u\text{,}\text{}v\text{,}\text{}w$.

The third property is often called the triangle inequality.

Proof. (1) $d(u\text{,}\text{}v)=0$ is exactly the same as saying that $u$ and $v$ differ in no places, which means that $u=v$. Part (2) is obvious. For part (3), observe that if $u$ and $v$ differ in a place, then either $u$ and $w$ differ at that place, or $v$ and $w$ differ at that place, or both. Therefore, the number of places where $u$ and $v$ differ is less than or equal to the number of places where $u$ and $w$ differ, plus the number of places where $v$ and $w$ differ.

For a code $C$, one can calculate the Hamming distance between any two distinct codewords. Out of this table of distances, there is a minimum value $d(C)$, which is called the **minimum distance** of $C$. In other words,

The minimum distance of $C$ is very important number, since it gives the smallest number of errors needed to change one codeword into another.

When a codeword is transmitted over a noisy channel, errors are introduced into some of the entries of the vector. We correct these errors by finding the codeword whose Hamming distance from the received vector is as small as possible. In other words, we change the received vector to a codeword by changing the fewest places possible. This is called **nearest neighbor decoding**.

We say that a code can **detect** up to $s$ errors if changing a codeword in at most $s$ places cannot change it to another codeword. The code can **correct** up to $t$ errors if, whenever changes are made at $t$ or fewer places in a codeword $c$, then the closest codeword is still $c$. This definition says nothing about an efficient algorithm for correcting the errors. It simply requires that nearest neighbor decoding gives the correct answer when there are at most $t$ errors. An important result from the theory of error correcting codes is the following.

# Theorem

1. A code $C$ can detect up to $s$ errors if $d(C)\ge s+1$. 2. A code $C$ can correct up to $t$ errors if $d(C)\ge 2t+1$.

Proof.

Suppose that $d(C)\ge s+1$. If a codeword $c$ is sent and $s$ or fewer errors occur, then the received message $r$ cannot be a different codeword. Hence, an error is detected.

Suppose that $d(C)\ge 2t+1$. Assume that the codeword $c$ is sent and the received word $r$ has $t$ or fewer errors; that is, $d(c\text{,}\text{}r)\le t$. If ${c}_{1}$ is any other codeword besides $c$, we claim that $d({c}_{1}\text{,}\text{}r)\ge t+1$. To see this, suppose that $d({c}_{1}\text{,}\text{}r)\le t$. Then, by applying the triangle inequality, we have

$$2t+1\le d(C)\le d(c\text{,}\text{}{c}_{1})\le d(c\text{,}\text{}r)+d({c}_{1}\text{,}\text{}r)\le t+t=2t\text{.}$$This is a contradiction, so $d({c}_{1}\text{,}\text{}r)\ge t+1$. Since $r$ has $t$ or fewer errors, nearest neighbor decoding successfully decodes $r$ to $c$.

How does one find the nearest neighbor? One way is to calculate the distance between the received message $r$ and each of the codewords, then select the codeword with the smallest Hamming distance. In practice, this is impractical for large codes. In general, the problem of decoding is challenging, and considerable research effort is devoted to looking for fast decoding algorithms. In later sections, we’ll discuss a few decoding techniques that have been developed for special classes of codes.

Before continuing, it is convenient to introduce some notation.

# Notation

A code of length $n$, with $M$ codewords, and with minimum distance $d=d(C)$, is called an $\mathbf{(}\mathit{n}\mathbf{,}\mathit{M}\mathbf{,}\mathit{d}\mathbf{)}$ **code**.

When we discuss linear codes, we’ll have a similar notation, namely, an $[n\text{,}\text{}k\text{,}\text{}d]$ code. Note that this latter notation uses square brackets, while the present one uses curved parentheses. (These two similar notations cause less confusion than one might expect!) The relation is that an $[n\text{,}\text{}k\text{,}\text{}d]$ binary linear code is an $(n\text{,}\text{}{2}^{k}\text{,}\text{}d)$ code.

The binary repetition code $\{(0\text{,}\text{}0\text{,}\text{}0)\text{,}\text{}(1\text{,}\text{}1\text{,}\text{}1)\}$ is a $(3\text{,}\text{}2\text{,}\text{}3)$ code. The Hadamard code of Exercise 6, Section 24.1, is a $(32\text{,}\text{}64\text{,}\text{}16)$ code (it could correct up to 7 errors because $16\ge 2\cdot 7+1$).

If we have a $q$-ary $(n\text{,}\text{}M\text{,}\text{}d)$ code, then we define the **code rate**, or **information rate**, $R$ by

For example, for the Hadamard code, $R={\mathrm{log}}_{2}\text{\hspace{0.17em}}(64)/32=6/32$. The code rate represents the ratio of the number of input data symbols to the number of transmitted code symbols. It is an important parameter to consider when implementing real-world systems, as it represents what fraction of the bandwidth is being used to transmit actual data. The code rate was mentioned in Examples 4 and 6 in Section 24.1. A few limitations on the code rate will be discussed in Section 24.3.

Given a code, it is possible to construct other codes that are essentially the same. Suppose that we have a codeword $c$ that is expressed as $c=({c}_{1}\text{,}\text{}{c}_{1}\text{,}\text{}\cdots \text{,}\text{}{c}_{n})$. Then we may define a positional permutation of $c$ by permuting the order of the entries of $c$. For example, the new vector ${c}^{\prime}=({c}_{2}\text{,}\text{}{c}_{3}\text{,}\text{}{c}_{1})$ is a positional permutation of $c=({c}_{1}\text{,}\text{}{c}_{2}\text{,}\text{}{c}_{3})$. Another type of operation that can be done is a symbol permutation. Suppose that we have a permutation of the $q$-ary symbols. Then we may fix a position and apply this permutation of symbols to that fixed position for every codeword. For example, suppose that we have the following permutation of the ternary symbols $\{0\to 2\text{,}\text{}1\to 0\text{,}\text{}2\to 1\}$, and that we have the following codewords: $(0\text{,}\text{}1\text{,}\text{}2)$, $(0\text{,}\text{}2\text{,}\text{}1)$, and $(2\text{,}\text{}0\text{,}\text{}1)$. Then applying the permutation to the second position of all of the codewords gives the following vectors: $(0\text{,}\text{}0\text{,}\text{}2)$, $(0\text{,}\text{}1\text{,}\text{}1)$, and $(2\text{,}\text{}2\text{,}\text{}1)$.

Formally, we say that two codes are **equivalent** if one code can be obtained from the other by a series of the following operations:

Permuting the positions of the code

Permuting the symbols appearing in a fixed position of all codewords

It is easy to see that all codes equivalent to an $(n\text{,}\text{}M\text{,}\text{}d)$ code are also $(n\text{,}\text{}M\text{,}\text{}d)$ codes. However, for certain choices of $n\text{,}\text{}M\text{,}\text{}d$, there can be several inequivalent $(n\text{,}\text{}M\text{,}\text{}d)$ codes.