What is Hash Function?

  A hash function H is a transformation that takes a variable-size input m and returns a fixed-size string, which is called the hash value h (that is, h = H(m)). Hash functions with just this property have a variety of general computational uses, but when employed in cryptography the hash functions are usually chosen to have some additional properties.

  The basic requirements for a cryptographic hash function are:

  • the input can be of any length,
  • the output has a fixed length,
  • H(x) is relatively easy to compute for any given x,
  • H(x) is one-way,
  • H(x) is collision-free.

  A hash function H is said to be one-way if it is hard to invert, where “hard to invert” means that given a hash value h, it is computationally infeasible to find some input x such that H(x) = h.

  If, given a message x, it is computationally infeasible to find a message y i x such that H(x) = H(y) then H is said to be a weakly collision-free hash function.

  A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y).

  The hash value represents concisely the longer message or document from which it was computed; one can think of a message digest as a “digital fingerprint” of the larger document. Examples of well-known hash functions are MD2 and MD5 and SHA .

  Perhaps the main role of a cryptographic hash function is in the provision of digital signatures. Since hash functions are generally faster than digital signature algorithms, it is typical to compute the digital signature to some document by computing the signature on the document’s hash value, which is small compared to the document itself. Additionally, a digest can be made public without revealing the contents of the document from which it is derived. This is important in digital time stamping where, using hash functions, one can get a document time stamped without revealing its contents to the time stamping service.

MD5 (Message Digest Algorithm 5)
Is a secure hash algorithm developed at RSA Data Security, Inc. It can be used to hash an arbitrary length byte string into a 128 bit value. MD5 is in wide use, and is considered reasonable secure.

However, some people have reported potential weaknesses in it, and "keyed MD5" (typically used for authentication by having a shared secret, and computing an authentication value by hashing first the secret (as a key), and then the data to be hashed) has been reported to be broken. It is also reported that one could build a special-purpose machine costing a few million dollars to find a plaintext matching given hash value in a few weeks.

MD5 is available from ftp.funet.fi:/pub/crypt/hash/mds/md5 . It is also included in PGP source code, SSLeay, RSAREF, Crypto++, and Ssh source code. MD5 is described e.g. in Bruce Schneier: Applied Cryptography, John Wiley & Sons, 1994.

MD2, MD4
These are older hash algorithms from RSA Data Security. They have known flaws, and their use is not recommended. Source code for MD2 is included at least in SSLeay and RSAREF on the software page.

SHA (Secure Hash Algorithm) (also SHS, Secure Hash Standard)
This is a cryptographic hash algorithm published by the United States Government. It produces an 160 bit hash value from an arbitrary length string. Many people consider it quite good. It is a fairly new algorithm.

SHA is available in many cryptographic libraries, such as Crypto++.

Tiger
is a new hash algorithm developed by Anderson and Biham. It is available from ftp.funet.fi:/pub/crypt/hash/tiger.