February 22, 2021

What is a hash function?

Generally speaking, a hash function is a way of transforming data of arbitrary size into a fixed-size representation. A hash function should always produce the same result for the same given data. However, even slight changes in the data should drastically change the resulting hash value.

The result of the hash function always has a fixed size, it does not matter what is the size of the initial data. A good analogy would be a glass of water. If you pour a little water into a glass it would still be full because part of it would be water that you poured in and another part would be filled with air. On the other hand, if you pour too much water, the glass would be full of it, and the rest would be spilled all over the table. Hash function works similarly.

Hashing collisions

There are cases when data size is less than the hash function size, however, in the majority of cases the size of the data is much bigger than the hash function size. Therefore, there can be a case when different data produces exactly the same hash value. This is called a collision. The smaller the size of the hash function, the more probable it would be to get a collision. Thus, in a lot of cases hash functions of bigger sizes are recommended.

Hash functions applications

Hash functions have various applications. However, in most cases, they either ensure only data integrity (that the data has not been tampered with) or additionally data authenticity (that data belongs to a certain person or organization which can be verified).

They can also be split into cryptographic and non-cryptographic hash functions. It needs to be said that hash functions are considered to be ‘one-way’ functions, which means that it is quite easy to calculate the hash of data, however, it should be very difficult to get initial data from its hash. Cryptographic hash functions try to ensure this rule because it is essential for its operation. Non-cryptographic hash functions usually have completely different purposes and are not applied to any type of sensitive data, so they don’t necessarily ensure the rule and are not considered to be secure.

Checksums

A checksum is one of the simplest examples of a hash function. It only ensures data integrity and is not considered to be a cryptographic hash function as it does not support enough security. Usually, it is used to send data over the network and verify on the receiving side whether data was corrupted during transmission.

CRC

A good example of a checksum is a Cyclic Redundancy Check (CRC in short). In simple words, this function requires input data and a divisor. The remainder of a polynomial division is added to the message and transferred over the network. The receiver calculates the remainder in the same way as the sender and, if the remainders match, the data is considered to be solid.

Non-cryptographic hash functions for data lookup

As was mentioned previously, non-cryptographic hash functions are considered to be not secure, so they are not usually used for sensitive data processing and transmission. However, they have various other applications where data sensitivity is not important. One of the examples is a data lookup.

Large size data can be stored in a database or a file. When an application needs to verify whether some data already exists, comparing data bit by bit can be somewhat tricky. Thus, in such cases, a hash-based key is introduced, and the data is stored under this key. Since it is quite easy to calculate data hash and compare it with another hash value, this approach introduced a lot of performance optimizations.

MurmurHash

One of the examples of a data lookup hash function is a MurmurHash. It got its name from two basic operations that are used in its algorithm – multiply (MU) and rotate (R). The algorithm itself has an inner loop and an outer loop of similar operations, thus the ‘MurMur’ part of the name.

Cryptographic hash functions

Cryptographic hash functions are initially designed to be secure and process sensitive data. However, over the years people discover vulnerabilities in a lot of such functions. If there are multiple severe vulnerabilities, a function can be ‘downgraded’ to a non-cryptographic one and used for other purposes later on. One of such functions is an MD5 hash function. It initially was considered to be quite secure, however, after significant vulnerabilities were discovered, it was reduced to usage as a checksum.

SHA-2

A group of cryptographic hash functions which are widely adopted nowadays (although several vulnerabilities were discovered) is called Secure Hash Algorithm 2 (or SHA-2 in short). It contains similarly designed functions that mostly differ in output size – from 224 to 512 bits. SHA-2, as most of the other cryptographic hash functions, is quite complex. In short, it has several loops of combinations of boolean operations.

Conclusion

This article was a short overview of hash functions, how they work and what applications they have. I tried to explain it as easily as possible considering the complexity behind cryptography in general. I hope you enjoyed it and see you in the next one.