TOPICS
Search

Hash Function


A hash function H projects a value from a set with many (or even an infinite number of) members to a value from a set with a fixed number of (fewer) members. Hash functions are not reversible. A hash function H might, for instance, be defined as y=H(x)=|_10x (mod 1)_|, where x in R, y in [0,9], and |_x_| is the floor function.

Hash functions can be used to determine if two objects are equal (possibly with a fixed average number of mistakes). Other common uses of hash functions are checksums over a large amount of data (e.g., the cyclic redundancy check [CRC]) and finding an entry in a database by a key value. The UNIX c-shell (csh) uses a hash table to store the location of executable programs. As a result adding new executables in a user's search path requires regeneration of the hash table using the rehash command before these programs can be executed without specifying the complete path.

To illustrate the use of hash functions in database lookups, consider a database consisting of an array containing an index n, a name, and a telephone number, with names listed in arbitrary order.

nnamenumber
0Parker12345
1(empty)
2Davis43534
3Harris32452
4Corea46532
5Hancock96562
6Brecker37811
7(empty)
N-1Marsalis54323

To look up Hancock from this array, you would start at the beginning of the array, compare the names, then try the next until the names match. This very simple algorithm finds any entry in 1 to N steps, giving an average seek time of N/2. The seek time is therefore proportional to N. A much faster result can generally be achieved, if the database is sorted.

nname
0Brecker
1Corea
2Davis
3Hancock
4Harris
5Marsalis
6Parker
7(empty)
N-1(empty)

An efficient algorithm on this sorted array first checks entry N/2, and then recursively uses bisection to check entries in intervals [0,N/2-1] or [N/2+1,N-1], depending wether the most recently looked-up name precedes or succeeds the name sought. The average seek time of this procedure this is proportional to lnN.

The idea behind using a hash function here is that although the possible number of combinations of characters in a name is quite large, only a subset of them is usually found in practice (i.e., names such as "Kwqrst" are much less common than names like "Jones.") Therefore, when you insert an entry into the database at an index that can somehow be calculated using a key (which is also available at the time you search for it), you might be able to find it later at the first location you check.

Consider the following simple example in which the hash function H is simply the sum of ASCII codes of characters in a name (considered to be all in lowercase) computed mod N=13.

nameH
Brecker6
Corea2
Davis2
Hancock12
Harris12
Marsalis2
Parker8

The above example illustrates that the hash function can give the same results for different keys. This difficulty is typically circumvented by introducing a second hash function H_2 whose results are designed to be completely different from that of H. For illustrative purposes, let H_2 be one plus the bitwise exclusive or of all codes in a name (again taken as all lowercase) mod N-1. This gives the following table.

nameH_2
Brecker11
Corea3
Davis10
Hancock4
Harris8
Marsalis3
Parker8

A new index can then be calculated as the sum of the first index and H_2 (mod N) until an empty slot is found where new data can be stored. Note that when using H_2 as an offset to walk through the database, it is not, in general, guaranteed that any key will eventually reach any slot. However, for certain values of N, namely N a prime number, such behavior is guaranteed, so N is always chosen to be prime. After computing H_2 with N=13 (a prime), the above phone list would look like this for names added in alphabetic order.

indexkeycompares to find
0(empty)
1(empty)
2Corea1
3Hancock2
4(empty)
5Marsalis2
6Brecker1
7Harris2
8Parker1
9(empty)
10(empty)
11(empty)
12Davis2

The average seek time for locating a name in this table depends on the kind of data, N, and the quality of the hash functions used. However, for reasonable choices of hash functions, it will be much smaller than lnN.


See also

Collision-Free Hash Function, Cryptographic Hash Function, Cyclic Redundancy Check, One-Way Hash Function, Hash Table, Universal Hash Function

Explore with Wolfram|Alpha

References

Partow, A. "General Purpose Hash Function Algorithms." http://www.partow.net/programming/hashfunctions/.

Referenced on Wolfram|Alpha

Hash Function

Cite this as:

Weisstein, Eric W. "Hash Function." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/HashFunction.html

Subject classifications