Until recently, most hash functions used a form of the Merkle-Damgård construction. It was invented independently by Ralph Merkle in 1979 and Ivan Damgård in 1989. The main ingredient is a function , usually called a compression function. It takes two bitstrings as inputs, call them and , and outputs a bitstring of the same length as . For example, could have length 512 and could have length . These are the sizes that the hash function SHA-256 uses, and we’ll use them for concreteness. The message that is to be hashed is suitably padded so that its length is a multiple of 512, and then broken into blocks of length 512:
An initial value is set. Then the blocks are fed one-by-one into and the final output is the hash value:
This construction is very natural: The blocks are read from the message one at a time and stirred into the mix with the previous blocks. The final result is the hash value.
Over the years, some disadvantages of the method have been discovered. One is called the length extension attack. For example, suppose Alice wants to ensure that her message to Bob has not been tampered with. They both have a secret key , so Alice prepends with to get . She sends both and to Bob. Since Bob also knows , he computes and checks that it agrees with the hash value sent by Alice. If so, Bob concludes that the message is authentic.
Since Eve does not know , she cannot send her own message along with . But, because of the iterative form of the hash function, Eve can append blocks to if she intercepts Alice’s communication. Then Eve sends
to Bob. Since she knows , she can produce a message that Bob will regard as authentic. Of course, this attack can be thwarted by using instead of , but it points to a weakness in the construction.
However, using might also cause problems if Eve discovers a way of producing collisions for . Namely, Eve finds and with . If Eve can arrange this so that is a good message and is a bad message (see Section 12.1), then Eve arranges for Alice to authenticate by computing , which equals . This means that Alice has also authenticated .
Another attack was given by Daum and Luks in 2005. Suppose Alice is using a high-level document language such as PostScript, which is really a program rather than just a text file. The file begins with a preamble that identifies the file as PostScript and gives some instructions. Then the content of the file follows.
Suppose Eve is able to find random strings and such that
where instructs the PostScript program to put the string in a certain register. In other words, we are assuming that Eve has found a collision of this form. If any string is appended to these messages, there is still a collision
because of the iterative nature of the hash algorithm (we are ignoring the effects of padding).
Of course, Eve has an evil document , perhaps saying that Alice (who is a bank president) gives Eve access to the bank’s vault. Eve also produces a document that Alice will be willing to sign, for example, a petition to give bank presidents tax breaks. Eve then produces two messages:
For example, puts into a stack, then puts in . They are not equal, so is produced. Eve now has two Postscript files, and , with . As we’ll see in Chapter 13, it is standard for Alice to sign the hash of a message rather than the message itself. Eve shows to Alice, who compiles it. The output is the petition that Alice is happy to sign. So Alice signs . But this means Alice has also signed . Eve takes to the bank, along with Alice’s signature on its hash value. The security officer at the bank checks that the signature is valid, then opens the document, which says that Alice grants Eve access to the bank’s vault. This potentially costly forgery relies on Eve being able to find a collision, but again it shows a weakness in the construction if there is a possibility of finding collisions.