11.3 The Merkle-Damgård ConstructionThe Merkle-Damgård Construction – Introduction to Cryptography with Coding Theory, 3rd Edition

11.3 The Merkle-Damgård Construction

Until recently, most hash functions used a form of the Merkle-Damgård construction. It was invented independently by Ralph Merkle in 1979 and Ivan Damgård in 1989. The main ingredient is a function f, usually called a compression function. It takes two bitstrings as inputs, call them H and M, and outputs a bitstring H=f(H, M) of the same length as H. For example, M could have length 512 and H could have length 256. These are the sizes that the hash function SHA-256 uses, and we’ll use them for concreteness. The message that is to be hashed is suitably padded so that its length is a multiple of 512, and then broken into n blocks of length 512:


An initial value IV is set. Then the blocks are fed one-by-one into f and the final output is the hash value:

This construction is very natural: The blocks are read from the message one at a time and stirred into the mix with the previous blocks. The final result is the hash value.

Over the years, some disadvantages of the method have been discovered. One is called the length extension attack. For example, suppose Alice wants to ensure that her message M to Bob has not been tampered with. They both have a secret key K, so Alice prepends M with K to get K||M. She sends both M and H(K||M) to Bob. Since Bob also knows K, he computes H(K||M) and checks that it agrees with the hash value sent by Alice. If so, Bob concludes that the message is authentic.

Since Eve does not know K, she cannot send her own message M along with H(K||M). But, because of the iterative form of the hash function, Eve can append blocks M′′ to M if she intercepts Alice’s communication. Then Eve sends

M||M′′ and H(K||M||M′′)=f(H(K||M), M′′)

to Bob. Since she knows H(K||M), she can produce a message that Bob will regard as authentic. Of course, this attack can be thwarted by using M||K instead of K||M, but it points to a weakness in the construction.

However, using H(M||K) might also cause problems if Eve discovers a way of producing collisions for H. Namely, Eve finds M1 and M2 with H(M1)=H(M2). If Eve can arrange this so that M1 is a good message and M2 is a bad message (see Section 12.1), then Eve arranges for Alice to authenticate M1 by computing H(M1||K), which equals H(M2||K). This means that Alice has also authenticated M2.

Another attack was given by Daum and Luks in 2005. Suppose Alice is using a high-level document language such as PostScript, which is really a program rather than just a text file. The file begins with a preamble that identifies the file as PostScript and gives some instructions. Then the content of the file follows.

Suppose Eve is able to find random strings R1 and R2 such that

Hpreamble; put(R1)=Hpreamble; put(R2), 

where put(Ri) instructs the PostScript program to put the string Ri in a certain register. In other words, we are assuming that Eve has found a collision of this form. If any string S is appended to these messages, there is still a collision

Hpreamble; put(R1)||S=Hpreamble; put(R2)||S

because of the iterative nature of the hash algorithm (we are ignoring the effects of padding).

Of course, Eve has an evil document T1, perhaps saying that Alice (who is a bank president) gives Eve access to the bank’s vault. Eve also produces a document T2 that Alice will be willing to sign, for example, a petition to give bank presidents tax breaks. Eve then produces two messages:

Y1=preamble; put(R1);put(R1);if (=) then T1 else T2Y2=preamble; put(R2);put(R1);if (=) then T1 else T2.

For example, Y2 puts R1 into a stack, then puts in R2. They are not equal, so T2 is produced. Eve now has two Postscript files, Y1 and Y2, with H(Y1)=H(Y2). As we’ll see in Chapter 13, it is standard for Alice to sign the hash of a message rather than the message itself. Eve shows Y2 to Alice, who compiles it. The output is the petition that Alice is happy to sign. So Alice signs H(Y2). But this means Alice has also signed H(Y1). Eve takes Y1 to the bank, along with Alice’s signature on its hash value. The security officer at the bank checks that the signature is valid, then opens the document, which says that Alice grants Eve access to the bank’s vault. This potentially costly forgery relies on Eve being able to find a collision, but again it shows a weakness in the construction if there is a possibility of finding collisions.