We now describe the steps in more detail. The 128 input bits are grouped into 16 bytes of eight bits each, call them
These are arranged into a matrix
In the following, we’ll occasionally need to work with the finite field . This is covered in Section 3.11. However, for the present purposes, we only need the following facts. The elements of are bytes, which consist of eight bits. They can be added by XOR. They can also be multiplied in a certain way (i.e., the product of two bytes is again a byte), but this process is more complicated. Each byte except the zero byte has a multiplicative inverse; that is, there is a byte such that . Since we can do arithmetic operations on bytes, we can work with matrices whose entries are bytes.
As a technical point, we note that the model of depends on a choice of irreducible polynomial of degree 8. The choice for Rijndael is . This is also the polynomial used in the examples in Section 3.11. Other choices for this polynomial would presumably give equally good algorithms.
In this step, each of the bytes in the matrix is changed to another byte by Table 8.1, called the S-box.
Write a byte as eight bits: abcdefgh. Look for the entry in the abcd row and efgh column (the rows and columns are numbered from 0 to 15). This entry, when converted to binary, is the output. For example, if the input byte is 10001011, we look in row 8 (the ninth row) and column 11 (the twelfth column). The entry is 61, which is 111101 in binary. This is the output of the S-box.
The output of SubBytes is again a matrix of bytes, let’s call it
The four rows of the matrix are shifted cyclically to the left by offsets of 0, 1, 2, and 3, to obtain
Regard a byte as an element of , as in Section 3.11. Then the output of the ShiftRows step is a matrix with entries in . Multiply this by a matrix, again with entries in , to produce the output , as follows:
The round key, derived from the key in a way we’ll describe later, consists of 128 bits, which are arranged in a matrix consisting of bytes. This is XORed with the output of the MixColumns step:
This is the final output of the round.
The original key consists of 128 bits, which are arranged into a matrix of bytes. This matrix is expanded by adjoining 40 more columns, as follows. Label the first four columns . The new columns are generated recursively. Suppose columns up through have been defined. If is not a multiple of 4, then
If is a multiple of 4, then
where is the transformation of obtained as follows. Let the elements of the column be . Shift these cyclically to obtain . Now replace each of these bytes with the corresponding element in the S-box from the SubBytes step, to get 4 bytes . Finally, compute the round constant
in (recall that we are in the case where is a multiple of 4). Then is the column vector
In this way, columns are generated from the initial four columns.
The round key for the th round consists of the columns
Although the S-box is implemented as a lookup table, it has a simple mathematical description. Start with a byte , where each is a binary bit. Compute its inverse in , as in Section 3.11. If the byte is 00000000, there is no inverse, so we use 00000000 in place of its inverse. The resulting byte represents an eight-dimensional column vector, with the rightmost bit in the top position. Multiply by a matrix and add the column vector to obtain a vector as follows:
The byte is the entry in the S-box.
For example, start with the byte . Its inverse in is , as we calculated in Section 3.11. We now calculate
This yields the byte 00011111. The first four bits 1100 represent 12 in binary and the last four bits 1011 represent 11 in binary. Add 1 to each of these numbers (since the first row and column are numbered 0) and look in the 13th row and 12th column of the S-box. The entry is 31, which in binary is 00011111.
Some of the considerations in the design of the S-box were the following. The map was used to achieve nonlinearity. However, the simplicity of this map could possibly allow certain attacks, so it was combined with multiplication by the matrix and adding the vector, as described previously. The matrix was chosen mostly because of its simple form (note how the rows are shifts of each other). The vector was chosen so that no input ever equals its S-box output or the complement of its S-box output (complementation means changing each 1 to 0 and each 0 to 1).