11.5 SHA3/Keccak
In 2006, NIST announced a competition to produce a new hash function to serve alongside SHA2. The new function was required to be at least as secure as SHA2 and to have the same four output possibilities. Fiftyone entries were submitted, and in 2012, Keccak was announced as the winner. It was certified as a standard by NIST in 2012 in FIPS202. (The name is pronounced “ketchak”. It has been suggested that the name is related to “Kecak,” a type of Balinese dance. Perhaps the movement of the dancers is analogous to the movement of the bits during the algorithm.) This algorithm became the hash function SHA3.
The SHA3 algorithm was developed by Guido Bertoni, Joan Daemen, and Gilles Van Assche from STMicroelectronics and Michaël Peeters from NXP Semiconductors. It differs from the MerkleDamgård construction and is based on the theory of Sponge Functions. The idea is that the first part of the algorithm absorbs the message, and then the hash value is squeezed out. Here is how it works. The state of the machine is a string of $b$ bits, which is fed to a function $f$ that takes an input of $b$ bits and outputs a string of $b$ bits, thus producing a new state of the machine. In contrast to the compression functions in the MerkleDamgård construction, the function $f$ is a onetoone function. Such a function could not be used in the MerkleDamgård situation since the number of input bits (from ${M}_{i}$ and the previous step) is greater than the number of output bits. But the different construction in the present case allows it.
Parameters $r$ (“the rate”) and $c$ (“the capacity”) are chosen so that $r+c=b$. The message (written in binary) is padded so that its length is a multiple of $r$, then is broken into $n$ blocks of length $r$:
To start, the state is initialized to all 0s. The absorption stage is the first part of Figure 11.2.
After the absorption is finished, the hash value is squeezed out: $r$ bits are output and truncated to the $d$ bits that are used as the hash value. This is the second part of Figure 11.2.
Producing a SHA3 hash value requires only one squeeze. However, the algorithm can also be used with multiple squeezes to produce arbitrarily long pseudorandom bitstrings. When it is used this way, it is often called SHAKE (= Secure Hash Algorithm with Keccak).
The “length extension” and collisionbased attacks of Section 11.3 are less likely to succeed. Suppose two messages yield the same hash value. This means that when the absorption has finished and the squeezing stage is starting, there are $d$ bits of the state that agree with these bits for the other message. But there are at least $c$ bits that are not output, and there is no reason that these $c$ bits match. If, instead of starting the squeezing, you do another round of absorption, the differing $c$ bits will cause the subsequent states and the outputted hash values to differ. In other words, there are at least ${2}^{c}$ possible internal states for any given $d$bit output $H$.
SHA3 has four different versions, named SHA3224, SHA3256, SHA3384, and SHA3512. For SHA3$m$, the $m$ denotes the security level, measured in bits. For example, SHA3256 is expected to require around ${2}^{256}$ operations to find a collision or a preimage. Since ${2}^{256}\approx {10}^{77}$, this should be impossible well into the future. The parameters are taken to be
For SHA3256, these are
The same function $f$ is used in all versions, which means that it is easy to change from one security level to another. Note that there is a tradeoff between speed and security. If the security parameter $m$ is increased, then $r$ decreases, so the message is read slower, since it is read in blocks of $r$ bits.
In the following, we concentrate on SHA3256. The other versions are obtained by suitably varying the parameters. For more details, see [FIPS 202].
The Padding. We start with a message $M$. The message is read in blocks of $r=1088$ bits, so we want the message to have length that is a multiple of 1088. But first the message is padded to $M01$. This is for “domain separation.” There are other uses of the Keccak algorithm such as SHAKE (mentioned above), and for these other purposes, $M$ is padded differently, for example with 1111
. This initial padding makes it very likely that using $M$ in different situations yields different outputs. Next, “10*1
padding” is used. This means that first a 1 is appended to $M$011 to yield $M$011 . Then sufficiently many 01 s are appended to make the total length one less than a multiple of 1088. Finally, a 1 is appended. We can now divide the result into blocks of length 1088.
Why are these choices made for the padding? Why not simply append enough 0
s s to get the desired length? Suppose that ${M}_{1}=1010111$. Then ${M}_{1}10=101011110$. Now append 1079 zeros to get the block to be hashed. If ${M}_{2}=1010$ is being used in SHAKE, then ${M}_{2}1111$ is padded with 1080 zeros to yield the same block. This means that the outputs for ${M}_{1}$ and ${M}_{2}$ are equal. The padding is designed to avoid all such situations.
Absorption and Squeezing. From now on, we assume that the padding has been done and we have $N$ blocks of length $1088$:
The absorption now proceeds as in Figure 11.2 (we describe the function $f$ later).
The initial state $S$ is a string of 0s s of length 1600.
For $j=0$ to $N1$, let $S=f(S\oplus ({M}_{j}{0}^{c}))$, where ${0}^{c}$ denotes a string of 0s of length $c=512$. What this does is XOR the message block ${M}_{j}$ with the first $r=1088$ bits of $S$, and then apply the function $f$. This yields an updated state $S$, which is modified during each iteration of the index $j$.
Return $S$.
The squeezing now proceeds as in Figure 11.2:
Input $S$ and let $Z$ be the empty string.
While $\text{Length}(\text{Z})<\text{td}$ (where $d=256$ is the output size)
Let $Z=ZTrun{c}_{r}(S)$, where ${\text{Trunc}}_{r}(\text{S})$ denotes the first $r=1088$ bits of $S$.
$S=f(S)$.
Return $Trun{c}_{d}(Z)$.
The bitstring ${\text{Trunc}}_{256}(Z)$ is the 256bit hash value SHA$3(M)$. For the hash value, we need only one squeeze to obtain $Z$. But the algorithm could also be used to produce a much longer pseudorandom bitstring, in which case several squeezes might be needed.
The function $\mathit{f}$. The main component of the algorithm is the function $f$, which we now describe. The input to $f$ is the 1600bit state of the machine
where each $S[j]$ is a bit. It’s easiest to think of these bits as forming a threedimensional $5\times 5\times 64$ array $A[x\text{,}\text{}y\text{,}\text{}z]$ with coordinates $(x\text{,}\text{}y\text{,}\text{}z)$ satisfying
A “column” consists of the five bits with fixed $x\text{,}\text{}z$. A “row” consists of the five bits with fixed $y\text{,}\text{}z$. A “lane” consists of the 64 bits with fixed $x\text{,}\text{}y$.
When we write “for all $x\text{,}\text{}z$” we mean for $0\le x\le 4$ and $0\le z\le 63$, and similarly for other combinations of $x\text{,}\text{}y\text{,}\text{}z$.
The correspondence between $S$ and $A$ is given by
for all $x\text{,}\text{}y\text{,}\text{}z$. For example, $A[1\text{,}\text{}2\text{,}\text{}3]=S[707]$. The ordering of the indices could be described as “lexicographic” order using $y\text{,}\text{}x\text{,}\text{}z$ (not $x\text{,}\text{}y\text{,}\text{}z$), since the index of $S$ corresponding to ${x}_{1}\text{,}\text{}{y}_{1}\text{,}\text{}{z}_{1}$ is smaller than the index for ${x}_{2}\text{,}\text{}{y}_{2}\text{,}\text{}{z}_{2}$ if ${y}_{1}{x}_{1}{z}_{1}$ precedes ${y}_{2}{x}_{2}{z}_{2}$ in “alphabetic order.”
The coordinates $x\text{,}\text{}y$ are taken to be numbers mod 5, and $z$ is mod 64. For example $A[7\text{,}\text{}1\text{,}\text{}86]$ is taken to be $A[2\text{,}\text{}4\text{,}\text{}22]$, since $7\equiv 2$ mod 5, $1\equiv 4$ mod 5, and $86\equiv 22$ mod 64.
The computation of $f$ proceeds in several steps. The steps receive the array $A$ as input and they output a modified array ${A}^{\prime}$ to replace $A$.
The following steps $I$ through $V$ are repeated for $i=0$ to $i=23$:

The first step XORs the bits in a column with the parities of two nearby columns.
For all $x\text{,}\text{}z$, let
$C[x\text{,}\text{}z]=A[x\text{,}\text{}0\text{,}\text{}z]\oplus A[x\text{,}\text{}1\text{,}\text{}z]\oplus A[x\text{,}\text{}2\text{,}\text{}z]\oplus A[x\text{,}\text{}3\text{,}\text{}z]\oplus A[x\text{,}\text{}4\text{,}\text{}z]\text{.}$
This gives the “parity” of the bitstring formed by the five bits in the $x\text{,}\text{}z$ column.
For all $x\text{,}\text{}z$, let $D[x\text{,}\text{}z]=C[x1\text{,}\text{}z]\oplus C[x+1\text{,}\text{}z1]$.
For all $x\text{,}\text{}y\text{,}\text{}z$, let ${A}^{\prime}[x\text{,}\text{}y\text{,}\text{}z]=A[x\text{,}\text{}y\text{,}\text{}z]\oplus D[x\text{,}\text{}z]$.
The second step rotates the 64 bits in each lane by an amount depending on the lane:
For all $z$, let ${A}^{\prime}[0\text{,}\text{}0\text{,}\text{}z]=A[0\text{,}\text{}0\text{,}\text{}z]$.
Let $(x\text{,}\text{}y)=(1\text{,}\text{}0)$.
For $t=0$ to $23$
For all $z$, let ${A}^{\prime}[x\text{,}\text{}y\text{,}\text{}z]=A[x\text{,}\text{}y\text{,}\text{}z(t+1)(t+2)/2]$.
Let $(x\text{,}\text{}y)=(y\text{,}\text{}2x+3y)$
Return ${A}^{\prime}$.
For example, consider the bits with coordinates of the form $(1\text{,}\text{}0\text{,}\text{}z)$. They are handled by the case $t=0$ and we have ${A}^{\prime}[1\text{,}\text{}0\text{,}\text{}z]=A[1\text{,}\text{}0\text{,}\text{}z1]$, so the bits in this lane are rotated by one position. Then, in Step 3(b), $(x\text{,}\text{}y)=(1\text{,}\text{}0)$ is changed to $(0\text{,}\text{}2)$ for the iteration with $t=1$. We have ${A}^{\prime}[0\text{,}\text{}2\text{,}\text{}z]=A[0\text{,}\text{}2\text{,}\text{}z3]$, so this lane is rotated by three positions. Then $(x\text{,}\text{}y)$ is changed to $(2\text{,}\text{}6)$, which is reduced mod 5 to $(2\text{,}\text{}1)$, and we pass to $t=2$, which gives a rotation by six (the rotations are by what are known as “triangular numbers”). After $t=23$, all of the lanes have been rotated.
The third step rearranges the positions of the lanes:
For all $x\text{,}\text{}y\text{,}\text{}z$, let ${A}^{\prime}[x\text{,}\text{}y\text{,}\text{}z]=A[x+3y\text{,}\text{}x\text{,}\text{}z]$.
Again, the coordinate $x+3y$ should be reduced mod 5.
The next step is the only nonlinear step in the algorithm. It XORs each bit with an expression formed from two other bits in its row.
For all $x\text{,}\text{}y\text{,}\text{}z$, let
${A}^{\prime}[x\text{,}\text{}y\text{,}\text{}z]=A[x\text{,}\text{}y\text{,}\text{}z]\oplus \left(A[x+1\text{,}\text{}y\text{,}\text{}z]\oplus 1\right)\left(A[x+2\text{,}\text{}y\text{,}\text{}z]\right)\text{.}$
The multiplication is multiplying two binary bits, hence is the same as the AND operator.

Finally, some bits in the $(0\text{,}\text{}0)$ lane are modified.
For all $x\text{,}\text{}y\text{,}\text{}z$, let ${A}^{\prime}[x\text{,}\text{}y\text{,}\text{}z]=A[x\text{,}\text{}y\text{,}\text{}z]$.
Set $RC={0}^{24}$.
For $j=0$ to 6, let $RC[{2}^{j}1]=rc(j+7i)$, where $rc$ is an auxiliary function defined below.
For all $z$, let ${A}^{\prime}[0\text{,}\text{}0\text{,}\text{}z]={A}^{\prime}[0\text{,}\text{}0\text{,}\text{}z]\oplus RC[z]$.
Return ${A}^{\prime}$.
After I through V are completed for one value of $i$, the next value of $i$ is used and I through V are repeated for the new $i$, through $i=23$. The final output is the new array $A$, which yields a new bitstring $S$ of length $1600$.
This completes the description of the function $f$, except that we still need to describe the auxiliary function $rc$.
The function $rc$ takes an integer $t$ mod 255 as input and outputs a bit according to the following algorithm:
If $t\equiv 0$ mod 255, return 1 . Else
$R=$10000000
For $k=1$ to $t$ mod 255
Let $R=0R$
Let $R[0]=R[0]\oplus R[8]$
Let $R[4]=R[4]\oplus R[8]$
Let $R[5]=R[5]\oplus R[8]$
Let $R[6]=R[6]\oplus R[8]$
Let $R={\text{Trunc}}_{8}(R)$.
Return $R[0]$.
The bit that is outputted is $rc(t)$.