# 11.4 SHA-2

In this section and the next, we look at what is involved in making a real cryptographic hash function. Unlike block ciphers, where there are many block ciphers to choose from, there are only a few hash functions that are used in practice. The most notable of these are the Secure Hash Algorithm (SHA) family, the Message Digest (MD) family, and the RIPEMD-160 message digest algorithm. The original MD algorithm was never published, and the first MD algorithm to be published was MD2, followed by MD4 and MD5. Weaknesses in MD2 and MD4 were found, and MD5 was proposed by Ron Rivest as an improvement upon MD4. Collisions have been found for MD5, and the strength of MD5 is now less certain.

The Secure Hash Algorithm was developed by the National Security Agency (NSA) and given to the National Institute of Standards and Technology (NIST). The original version, often referred to as SHA or SHA-0, was published in 1993 as a Federal Information Processing Standard (FIPS 180). SHA contained a weakness that was later uncovered by the NSA, which led to a revised standards document (FIPS 180-1) that was released in 1995. This revised document describes the improved version, SHA-1, which for several years was the hash algorithm recommended by NIST. However, weaknesses started to appear and in 2017, a collision was found (see the discussion in Section 11.1). SHA-1 is now being replaced by a series of more secure versions called SHA-2. They still use the Merkle-Damgård construction. In the next section, we’ll meet SHA-3, which uses a different construction.

The reader is warned that the discussion that follows is fairly technical and is provided in order to give the flavor of what happens inside a hash function.

The SHA-2 family consists of six algorithms: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. The last three digits indicate the number of bits in the output. We’ll describe SHA-256. The other five are very similar.

SHA-256 produces a 256-bit hash and is built upon the same design principles as MD4, MD5, and SHA-1. These hash functions use an iterative procedure. Just as we did earlier, the original message $M$ is broken into a set of fixed-size blocks, $M={M}^{(1)}||{M}^{(2)}||\cdots ||{M}^{(N)}$, where the last block is padded to fill out the block. The message blocks are then processed via a sequence of rounds that use a compression function ${h}^{\prime}$ that combines the current block and the result from the previous round. That is, we start with an initial value ${X}_{0}$, and define ${X}_{j}={h}^{\prime}({X}_{j-1}\text{,}\text{}{M}^{(j)})$. The final ${X}_{N}$ is the message digest.

The trick behind building a hash function is to devise a good compression function. This compression function should be built in such a way as to make each input bit affect as many output bits as possible. One main difference between the SHA family and the MD family is that for SHA the input bits are used more often during the course of the hash function than they are for MD4 and MD5. This more conservative approach makes the design of SHA-1 and SHA-2 more secure than either MD4 or MD5, but also makes it a little slower.

In the description of the hash algorithm, we need the following operations on strings of 32 bits:

$X\wedge Y=$ bitwise “and”, which is bitwise multiplication mod 2, or bitwise minimum.

$X\vee Y=$ bitwise “or”, which is bitwise maximum.

$X\oplus Y=$ bitwise addition mod 2.

$\mathrm{\neg}X$ changes 1s to 0s and 0s to 1s .

$X+Y=$ addition of $X$ and $Y$ mod ${2}^{32}$, where $X$ and $Y$ are regarded as integers mod ${2}^{32}$.

${R}^{n}(X)=$ rotation of $X$ to the right by $n$ positions (the end wraps around to the beginning).

${S}^{n}(X)=$ shift of $X$ to the right by $n$ positions, with the first $n$ bits becoming 0s (so the bits at the end disappear and do not wrap around).

We also need the following functions that operate on 32-bit strings:

Define initial hash values ${H}_{1}^{(0)}\text{,}\text{}{H}_{2}^{(0)}\text{,}\text{}\dots \text{,}\text{}{H}_{8}^{(0)}$ as follows:

The preceding are written in **hexadecimal notation**. Each digit or letter represents a string of four bits:

For example, BA1 equals $11\ast {16}^{2}+10\ast {16}^{1}+1=2977$.

These initial hash values are obtained by using the first eight digits of the fractional parts of the square roots of the first eight primes, expressed as “decimals” in base 16. See Exercise 7.

We also need sixty-four 32-bit words

They are the first eight hexadecimal digits of the fractional parts of the cube roots of the first 64 primes.

# Padding and Preprocessing

SHA-256 begins by taking the original message and padding it with the bit $1$ followed by a sequence of $0$ bits. Enough $0$ bits are appended to make the new message $64$ bits short of the next highest multiple of $512$ bits in length. Following the appending of $1$ and $0$s, we append the $64$-bit representation of the length $T$ of the message. (This restricts the messages to length less than ${2}^{64}\approx {10}^{19}$ bits, which is not a problem in practice.)

For example, if the original message has 2800 bits, we add a 1 and 207 0s to obtain a new message of length $3008=6\times 512-64$. Since $2800={101011110000}_{2}$ in binary, we append fifty-two 0s followed by 101011110000 to obtain a message of length 3072. This is broken into six blocks of length 512.

Break the message with padding into $N$ blocks of length 512:

The hash algorithm inputs these blocks one by one. In the algorithm, each 512-bit block ${M}^{(i)}$ is divided into sixteen 32-bit blocks:

# The Algorithm

There are eight 32-bit registers, labeled $a\text{,}\text{}b\text{,}\text{}c\text{,}\text{}d\text{,}\text{}e\text{,}\text{}f\text{,}\text{}g\text{,}\text{}h$. These contain the intermediate hash values. The algorithm inputs a block of 512 bits from the message in Step 11, and in Steps 12 through 24, it stirs the bits of this block into a mix with the bits from the current intermediate hash values. After 64 iterations of this stirring, the algorithm produces an output that is added (mod ${2}^{32}$) onto the previous intermediate hash values to yield the new hash values. After all of the blocks of the message have been processed, the final intermediate hash values give the hash of the message.

The basic building block of the algorithm is the set of operations that take place on the subregisters in Steps 15 through 24. They take the subregisters and operate on them using rotations, XORs, and other similar operations.

For more details on hash functions, and for some of the theory involved in their construction, see [Stinson], [Schneier], and [Menezes et al.].