# 12.4 Using Hash Functions to Encrypt

Cryptographic hash functions are some of the most widely used cryptographic tools, perhaps second only to block ciphers. They find applications in many different areas of information security. Later, in Chapter 13, we shall see an application of hash functions to digital signatures, where the fact that they shrink the representation of data makes the operation of creating a digital signature more efficient. We now look at how they may be used to serve the role of a cipher by providing data confidentiality.

A cryptographic hash function takes an input of arbitrary length and provides a fixed-size output that appears random. In particular, if we have two distinct inputs, then their hashes should be different. Generally, their hashes are very different. This is a property that hash functions share with good ciphers and is a property that allows us to use a hash function to perform encryption.

Using a hash function to perform encryption is very similar to a stream cipher in which the output of a pseudorandom number generator is XORed with the plaintext. We saw such an example when we studied the output feedback mode (OFB) of a block cipher. Much like the block cipher did for OFB, the hash function creates a pseudorandom bit stream that is XORed with the plaintext to create a ciphertext.

In order to make a cryptographic hash function operate as a stream cipher, we need two components: a key shared between Alice and Bob, and an initialization vector. We shall soon address the issue of the initialization vector, but for now let us begin by assuming that Alice and Bob have established a shared secret key ${K}_{AB}$.

Now, Alice could create a pseudorandom byte ${x}_{1}$ by taking the leftmost byte of the hash of ${K}_{AB}\text{;}$ that is, ${x}_{1}={L}_{8}\left(h({K}_{AB})\right)$. She could then encrypt a byte of plaintext ${p}_{1}$ by XORing with the random byte ${x}_{1}$ to produce a byte of ciphertext

But if she has more than one byte of plaintext, then how should continue? We use feedback, much like we did in OFB mode. The next pseudorandom byte should be created by ${x}_{2}={L}_{8}\left(h({K}_{AB}\parallel {x}_{1})\right)$. Then the next ciphertext byte can be created by

In general, the pseudorandom byte ${x}_{j}$ is created by ${x}_{j}={L}_{8}\left(h({K}_{AB}\parallel {x}_{j-1})\right)$, and encryption is simply XORing ${x}_{j}$ with the plaintext ${p}_{j}$. Decryption is a simple matter, as Bob must merely recreate the bytes ${x}_{j}$ and XOR with the ciphertext ${c}_{j}$ to get out the plaintext ${p}_{j}$.

There is a simple problem with this procedure for encryption and decryption. What if Alice wants to encrypt a message on Monday, and a different message on Wednesday? How should she create the pseudorandom bytes? If she starts all over, then the pseudorandom sequence ${x}_{j}$ on Monday and Wednesday will be the same. This is not desirable.

Instead, we must introduce some randomness to make certain the two bit streams are different. Thus, each time Alice sends a message, she should choose a random initialization vector, which we denote by ${x}_{0}$. She then starts by creating ${x}_{1}={L}_{8}\left(h({K}_{AB}\parallel {x}_{0})\right)$ and proceeding as before. But now she must send ${x}_{0}$ to Bob, which she can do when she sends ${c}_{1}$. If Eve intercepts ${x}_{0}$, she is still not able to compute ${x}_{1}$ since she doesn’t know ${K}_{AB}$. In fact, if $h$ is a good hash function, then ${x}_{0}$ should give no information about ${x}_{1}$.

The idea of using a hash function to create an encryption procedure can be modified to create an encryption procedure that incorporates the plaintext, much in the same way as the CFB mode does.