3

I'm trying to understand how typical Linux distributions generate the password field for entries in /etc/shadow. I can't figure out what encryption algorithm is being used to produce the encrypted password string.

For example:

$1$CQoPk7Zh$370xDLmeGD9m4aF/ciIlC.

From what I understand, the first value 1 signifies that an MD5 hashing method was used with the second value CQoPk7Zh as the salt. However, what was the encryption algorithm used to produce the final encrypted password string 370xDLmeGD9m4aF/ciIlC.?

The Linux man page for crypt(3) has a note for glibc2:

If salt is a character string starting with the characters "$id$" followed by a string terminated by "$":

$id$salt$encrypted

then instead of using the DES machine, id identifies the encryption method used and this then determines how the rest of the password string is interpreted.

I think the semantics are throwing me off: the manual says id specifies the encryption method, but it's specifying the hashing method.

I believe the process is working like this:

password ==> MD5(password+salt) ==> hash ==> encryption-algorithm(hash) ==> encrypted-pass

So how do I determine the encryption algorithm? Specifically, what encryption algorithm is used when the id is 6 (SHA512)?

Vilhelm Gray

2 Answers

2

I think they really mean hashing method for the hash in use. The format is:

"$id$salt$hashed", where "$id" is the algorithm used (On GNU/Linux, "$1$" stands for MD5, "$2a$" is Blowfish, "$5$" is SHA-256 and "$6$" is SHA-512 ...)

The source for this list is available on Wikipedia.

I've just looked at the manual and it does indeed say "encryption method". These are definitely not encryption methods like DES, but rather "one way" hash functions.
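To make the field layout concrete, here is a minimal sketch in Python that splits such a string into its three parts. The names `parse_shadow_hash`, `ShadowHash` and `ID_NAMES` are made up for illustration, and the sketch assumes the plain `$id$salt$hashed` layout with no optional parameters between the id and the salt.

```python
from typing import NamedTuple

# Scheme ids as listed above; this table is illustrative, not exhaustive.
ID_NAMES = {"1": "MD5", "2a": "Blowfish", "5": "SHA-256", "6": "SHA-512"}


class ShadowHash(NamedTuple):
    scheme_id: str
    salt: str
    digest: str


def parse_shadow_hash(field: str) -> ShadowHash:
    """Split a '$id$salt$hashed' string into its three parts."""
    parts = field.split("$")
    # A leading '$' produces an empty first element, so we expect 4 pieces.
    if len(parts) != 4 or parts[0] != "":
        raise ValueError("not in $id$salt$hashed form")
    return ShadowHash(parts[1], parts[2], parts[3])


if __name__ == "__main__":
    h = parse_shadow_hash("$1$CQoPk7Zh$370xDLmeGD9m4aF/ciIlC.")
    print(ID_NAMES.get(h.scheme_id, "unknown"), h.salt, h.digest)
```

Run on the string from the question, this reports id `1` (MD5), salt `CQoPk7Zh`, and the encoded hash `370xDLmeGD9m4aF/ciIlC.`.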

  • Why is the resulting string `370xDLmeGD9m4aF/ciIlC.`? I would expect an **MD5** hash to consist of a series of hexadecimal digits, but this string does not fit that format. – Vilhelm Gray May 16 '13 at 14:47
  • @VilhelmGray I imagine they encoded the hash value in a range of printable characters. It looks like Base64, but I could be wrong. Either way, encoding with just 0-9 and A-F takes two characters per byte of actual data and is therefore quite expensive to use. –  May 16 '13 at 14:49
  • Oh I see, so the process would probably be similar to: `pass ==> MD5(pass+salt) ==> encoding-func(hash) ==> encoded-hash` – Vilhelm Gray May 16 '13 at 14:54
  • @VilhelmGray yep. MD5(pass+salt) will just be an array of bytes, e.g. `uint8_t arr[16];` for MD5's 128 bits (in general, the digest size in bits divided by 8), and the encoded version will be too, just modified so that each byte is printable. Technically, you could store the raw, unprintable bytes in the file (what you might call a binary file), but Unix configuration has traditionally used readable text files. –  May 16 '13 at 14:56
1

There is no encryption algorithm involved. The use of “encryption” is a misnomer, due to the historical password hashing algorithm being based on DES, which is primarily used for encryption and known as such. id is in fact the hashing method, and all documentation should properly use the word “hash” throughout instead of “encrypt”.

For id = 1, the algorithm isn't MD5(password+salt), but a variation on MD5^1000(salt+password), that is, MD5 iterated 1000 times. The fixed iteration count (large at the time it was introduced, but now tiny) is the main reason to deprecate this scheme; the custom construction is another.
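To show what "a variation" means in practice, here is a sketch of the md5-crypt steps in Python using only `hashlib`. It follows the scheme as I understand it from the original FreeBSD implementation; the function names `md5_crypt` and `to64` are my own, and this is an illustration rather than the glibc source, so check its output against your system's `crypt(3)` before relying on it.

```python
import hashlib

# The 64-character alphabet used by crypt (not the standard Base64 one).
B64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to64(value: int, count: int) -> str:
    """Emit `count` characters, taking the low 6 bits of `value` first."""
    out = []
    for _ in range(count):
        out.append(B64[value & 0x3F])
        value >>= 6
    return "".join(out)


def md5_crypt(password: bytes, salt: bytes) -> str:
    magic = b"$1$"
    salt = salt[:8]  # the scheme uses at most 8 salt characters

    # "Alternate" digest that gets mixed into the main one.
    alt = hashlib.md5(password + salt + password).digest()

    ctx = hashlib.md5(password + magic + salt)
    remaining = len(password)
    while remaining > 0:
        ctx.update(alt[:min(16, remaining)])
        remaining -= 16

    # One byte per bit of the password length: NUL for a 1 bit,
    # the first password byte for a 0 bit.
    n = len(password)
    while n:
        ctx.update(b"\x00" if n & 1 else password[:1])
        n >>= 1
    final = ctx.digest()

    # The fixed 1000 rounds mentioned above.
    for i in range(1000):
        c = hashlib.md5()
        c.update(password if i & 1 else final)
        if i % 3:
            c.update(salt)
        if i % 7:
            c.update(password)
        c.update(final if i & 1 else password)
        final = c.digest()

    # Encode the 16 digest bytes in a permuted order, 3 bytes at a time.
    encoded = ""
    for a, b, c3 in ((0, 6, 12), (1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 5)):
        encoded += to64((final[a] << 16) | (final[b] << 8) | final[c3], 4)
    encoded += to64(final[11], 2)
    return "$1$" + salt.decode("ascii") + "$" + encoded


if __name__ == "__main__":
    print(md5_crypt(b"password", b"CQoPk7Zh"))
```

The password in the usage line is a placeholder; the password behind the hash in the question is not known, so the sketch makes no claim to reproduce that exact string.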

With id = 6, SHA-512 is used, again with many iterations. Ulrich Drepper, GNU libc maintainer, has written a precise specification of SHA-2-crypt, with sample code.
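Unlike the id = 1 scheme, the SHA-2-crypt specification lets the iteration count be tuned through an optional `rounds=N` field placed between the id and the salt; the default is 5000 and values are clamped to the range 1000 to 999,999,999. A small Python sketch of reading that field follows. The function name `sha_crypt_rounds` is hypothetical and the hash portions in the usage lines are placeholders, not real digests.

```python
DEFAULT_ROUNDS = 5000
MIN_ROUNDS = 1000
MAX_ROUNDS = 999_999_999


def sha_crypt_rounds(entry: str) -> int:
    """Return the iteration count implied by a $5$ or $6$ crypt string."""
    parts = entry.split("$")
    # parts[0] is the empty string before the first '$', parts[1] is the id.
    if len(parts) < 4 or parts[0] != "" or parts[1] not in ("5", "6"):
        raise ValueError("not a SHA-256/SHA-512 crypt string")
    field = parts[2]
    if not field.startswith("rounds="):
        return DEFAULT_ROUNDS  # no explicit count: parts[2] is the salt
    try:
        requested = int(field[len("rounds="):])
    except ValueError:
        return DEFAULT_ROUNDS  # assumption: treat a malformed count as absent
    return max(MIN_ROUNDS, min(MAX_ROUNDS, requested))


if __name__ == "__main__":
    print(sha_crypt_rounds("$6$saltsalt$PLACEHOLDER"))
    print(sha_crypt_rounds("$6$rounds=100000$saltsalt$PLACEHOLDER"))
```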

I think all common crypt variants encode the hash in a base64 variant (not “the” base 64, but a slightly different set of 64 printable characters).
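As a rough illustration of that encoding, the Python sketch below packs bits six at a time into the crypt alphabet and computes how many characters a digest of a given size needs. The helper names `to64` and `encoded_length` are my own; real crypt implementations also permute the digest bytes before encoding, which this sketch leaves out.

```python
# crypt's alphabet: '.' and '/' come first, then digits, upper case, lower case.
B64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to64(value: int, count: int) -> str:
    """Encode `value` as `count` characters, lowest 6 bits first."""
    chars = []
    for _ in range(count):
        chars.append(B64[value & 0x3F])
        value >>= 6
    return "".join(chars)


def encoded_length(digest_bytes: int) -> int:
    """Characters needed to hold `digest_bytes` bytes at 6 bits per character."""
    return (digest_bytes * 8 + 5) // 6


if __name__ == "__main__":
    for name, size in (("MD5", 16), ("SHA-256", 32), ("SHA-512", 64)):
        print(name, "hex:", size * 2, "crypt:", encoded_length(size))
```

This accounts for the string in the question: a 16-byte MD5 digest takes 22 characters in this encoding, which is the length of `370xDLmeGD9m4aF/ciIlC.`, versus 32 characters in hexadecimal.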

Gilles 'SO- stop being evil'
  • `crypt` encodes using a [variant of Base64](http://serverfault.com/a/499293). This can be verified in code: `static const char b64t[64] = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";` – Vilhelm Gray May 16 '13 at 14:57