11.5 SHA-3/Keccak – Introduction to Cryptography with Coding Theory, 3rd Edition

11.5 SHA-3/Keccak

In 2006, NIST announced a competition to produce a new hash function to serve alongside SHA-2. The new function was required to be at least as secure as SHA-2 and to have the same four output possibilities. Fifty-one entries were submitted, and in 2012, Keccak was announced as the winner. It was certified as a standard by NIST in 2012 in FIPS-202. (The name is pronounced “ketchak”. It has been suggested that the name is related to “Kecak,” a type of Balinese dance. Perhaps the movement of the dancers is analogous to the movement of the bits during the algorithm.) This algorithm became the hash function SHA-3.

The SHA-3 algorithm was developed by Guido Bertoni, Joan Daemen, and Gilles Van Assche from STMicroelectronics and Michaël Peeters from NXP Semiconductors. It differs from the Merkle-Damgård construction and is based on the theory of Sponge Functions. The idea is that the first part of the algorithm absorbs the message, and then the hash value is squeezed out. Here is how it works. The state of the machine is a string of b bits, which is fed to a function f that takes an input of b bits and outputs a string of b bits, thus producing a new state of the machine. In contrast to the compression functions in the Merkle-Damgård construction, the function f is a one-to-one function. Such a function could not be used in the Merkle-Damgård situation since the number of input bits (from Mi and the previous step) is greater than the number of output bits. But the different construction in the present case allows it.

Parameters r (“the rate”) and c (“the capacity”) are chosen so that r+c=b. The message (written in binary) is padded so that its length is a multiple of r, then is broken into n blocks of length r:

M=M0||M1||M2||||Mn1.

To start, the state is initialized to all 0s. The absorption stage is the first part of Figure 11.2.

Figure 11.2 A Sponge Function

After the absorption is finished, the hash value is squeezed out: r bits are output and truncated to the d bits that are used as the hash value. This is the second part of Figure 11.2.

Producing a SHA-3 hash value requires only one squeeze. However, the algorithm can also be used with multiple squeezes to produce arbitrarily long pseudorandom bitstrings. When it is used this way, it is often called SHAKE (= Secure Hash Algorithm with Keccak).

The “length extension” and collision-based attacks of Section 11.3 are less likely to succeed. Suppose two messages yield the same hash value. This means that when the absorption has finished and the squeezing stage is starting, there are d bits of the state that agree with these bits for the other message. But there are at least c bits that are not output, and there is no reason that these c bits match. If, instead of starting the squeezing, you do another round of absorption, the differing c bits will cause the subsequent states and the outputted hash values to differ. In other words, there are at least 2c possible internal states for any given d-bit output H.

SHA-3 has four different versions, named SHA3-224, SHA3-256, SHA3-384, and SHA3-512. For SHA3-m, the m denotes the security level, measured in bits. For example, SHA3-256 is expected to require around 2256 operations to find a collision or a preimage. Since 22561077, this should be impossible well into the future. The parameters are taken to be

b=1600, d=m, c=2m, r=1600c.

For SHA3-256, these are

b=1600, d=256, c=512, r=1088.

The same function f is used in all versions, which means that it is easy to change from one security level to another. Note that there is a trade-off between speed and security. If the security parameter m is increased, then r decreases, so the message is read slower, since it is read in blocks of r bits.

In the following, we concentrate on SHA3-256. The other versions are obtained by suitably varying the parameters. For more details, see [FIPS 202].

The Padding. We start with a message M. The message is read in blocks of r=1088 bits, so we want the message to have length that is a multiple of 1088. But first the message is padded to M||01. This is for “domain separation.” There are other uses of the Keccak algorithm such as SHAKE (mentioned above), and for these other purposes, M is padded differently, for example with 1111. This initial padding makes it very likely that using M in different situations yields different outputs. Next, “10*1 padding” is used. This means that first a 1 is appended to M011 to yield M||011 . Then sufficiently many 01 s are appended to make the total length one less than a multiple of 1088. Finally, a 1 is appended. We can now divide the result into blocks of length 1088.

Why are these choices made for the padding? Why not simply append enough 0s s to get the desired length? Suppose that M1=1010111. Then M1||10=101011110. Now append 1079 zeros to get the block to be hashed. If M2=1010 is being used in SHAKE, then M2||1111 is padded with 1080 zeros to yield the same block. This means that the outputs for M1 and M2 are equal. The padding is designed to avoid all such situations.

Absorption and Squeezing. From now on, we assume that the padding has been done and we have N blocks of length 1088:

M0||M1||||MN1.

The absorption now proceeds as in Figure 11.2 (we describe the function f later).

  1. The initial state S is a string of 0s s of length 1600.

  2. For j=0 to N1, let S=f(S(Mj||0c)), where 0c denotes a string of 0s of length c=512. What this does is XOR the message block Mj with the first r=1088 bits of S, and then apply the function f. This yields an updated state S, which is modified during each iteration of the index j.

  3. Return S.

The squeezing now proceeds as in Figure 11.2:

  1. Input S and let Z be the empty string.

  2. While Length(Z)<td (where d=256 is the output size)

    1. Let Z=Z||Truncr(S), where Truncr(S) denotes the first r=1088 bits of S.

    2. S=f(S).

  3. Return Truncd(Z).

The bitstring Trunc256(Z) is the 256-bit hash value SHA-3(M). For the hash value, we need only one squeeze to obtain Z. But the algorithm could also be used to produce a much longer pseudorandom bitstring, in which case several squeezes might be needed.

The function f. The main component of the algorithm is the function f, which we now describe. The input to f is the 1600-bit state of the machine

S=S[0]||S[1]||S[2]||||S[1599], 

where each S[j] is a bit. It’s easiest to think of these bits as forming a three-dimensional 5×5×64 array A[x, y, z] with coordinates (x, y, z) satisfying

0x4, 0y4, 0z63.

A “column” consists of the five bits with fixed x, z. A “row” consists of the five bits with fixed y, z. A “lane” consists of the 64 bits with fixed x, y.

When we write “for all x, z” we mean for 0x4 and 0z63, and similarly for other combinations of x, y, z.

The correspondence between S and A is given by

A[x, y, z]=S[64(5y+x)+z]

for all x, y, z. For example, A[1, 2, 3]=S[707]. The ordering of the indices could be described as “lexicographic” order using y, x, z (not x, y, z), since the index of S corresponding to x1, y1, z1 is smaller than the index for x2, y2, z2 if y1x1z1 precedes y2x2z2 in “alphabetic order.”

The coordinates x, y are taken to be numbers mod 5, and z is mod 64. For example A[7, 1, 86] is taken to be A[2, 4, 22], since 72 mod 5, 14 mod 5, and 8622 mod 64.

The computation of f proceeds in several steps. The steps receive the array A as input and they output a modified array A to replace A.

The following steps I through V are repeated for i=0 to i=23:

  1. The first step XORs the bits in a column with the parities of two nearby columns.

    1. For all x, z, let

      C[x, z]=A[x, 0, z]A[x, 1, z]A[x, 2, z]A[x, 3, z]A[x, 4, z].

      This gives the “parity” of the bitstring formed by the five bits in the x, z column.

    2. For all x, z, let D[x, z]=C[x1, z]C[x+1, z1].

    3. For all x, y, z, let A[x, y, z]=A[x, y, z]D[x, z].

  2. The second step rotates the 64 bits in each lane by an amount depending on the lane:

    1. For all z, let A[0, 0, z]=A[0, 0, z].

    2. Let (x, y)=(1, 0).

    3. For t=0 to 23

      1. For all z, let A[x, y, z]=A[x, y, z(t+1)(t+2)/2].

      2. Let (x, y)=(y, 2x+3y)

    4. Return A.

    For example, consider the bits with coordinates of the form (1, 0, z). They are handled by the case t=0 and we have A[1, 0, z]=A[1, 0, z1], so the bits in this lane are rotated by one position. Then, in Step 3(b), (x, y)=(1, 0) is changed to (0, 2) for the iteration with t=1. We have A[0, 2, z]=A[0, 2, z3], so this lane is rotated by three positions. Then (x, y) is changed to (2, 6), which is reduced mod 5 to (2, 1), and we pass to t=2, which gives a rotation by six (the rotations are by what are known as “triangular numbers”). After t=23, all of the lanes have been rotated.

  3. The third step rearranges the positions of the lanes:

    1. For all x, y, z, let A[x, y, z]=A[x+3y, x, z].

    Again, the coordinate x+3y should be reduced mod 5.

  4. The next step is the only nonlinear step in the algorithm. It XORs each bit with an expression formed from two other bits in its row.

    1. For all x, y, z, let

    2. A[x, y, z]=A[x, y, z]A[x+1, y, z]1A[x+2, y, z].

    3. The multiplication is multiplying two binary bits, hence is the same as the AND operator.

  5. Finally, some bits in the (0, 0) lane are modified.

    1. For all x, y, z, let A[x, y, z]=A[x, y, z].

    2. Set RC=024.

    3. For j=0 to 6, let RC[2j1]=rc(j+7i), where rc is an auxiliary function defined below.

    4. For all z, let A[0, 0, z]=A[0, 0, z]RC[z].

    5. Return A.

After I through V are completed for one value of i, the next value of i is used and I through V are repeated for the new i, through i=23. The final output is the new array A, which yields a new bitstring S of length 1600.

This completes the description of the function f, except that we still need to describe the auxiliary function rc.

The function rc takes an integer t mod 255 as input and outputs a bit according to the following algorithm:

  1. If t0 mod 255, return 1 . Else

  2. R=10000000

  3. For k=1 to t mod 255

    1. Let R=0||R

    2. Let R[0]=R[0]R[8]

    3. Let R[4]=R[4]R[8]

    4. Let R[5]=R[5]R[8]

    5. Let R[6]=R[6]R[8]

    6. Let R=Trunc8(R).

  4. Return R[0].

The bit that is outputted is rc(t).