12

We are currently using HMACSHA512 in .NET, with a 128-character (64-byte) validation key. The salt is a 64-character randomly generated string. We allocated 2048 characters in the database for the Base64-encoded hash result. It will be a public-facing website. Is this approach reasonable, or should it be changed to another approach such as Rfc2898DeriveBytes?

    public string HashEncode(string password, string salt, string validationKey) {
        byte[] hashKey = BosUtilites.HexStringToByteArray(validationKey);
        byte[] hashInput = BosUtilites.StringToByteArray(password + salt);
        using (var sha512 = new HMACSHA512(hashKey)) {
            byte[] hash = sha512.ComputeHash(hashInput);
            return Convert.ToBase64String(hash);
        }
    }

    public string GenerateSimpleSalt(int Size = 64) {
        // 64-character set (letters, digits, '#' and '='); reduce to 62 if strictly
        // alphanumeric output is required. A set whose length divides 256 evenly
        // avoids modulo bias in GenerateRandomString.
        char[] alphaSet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890#=".ToCharArray();
        var tempSB = GenerateRandomString(Size, alphaSet);
        return tempSB.ToString();
    }

    public StringBuilder GenerateRandomString(int Size, char[] alphaSet) {
        using (var crypto = new RNGCryptoServiceProvider()) {
            var bytes = new byte[Size];
            crypto.GetBytes(bytes); // fill the buffer with cryptographically strong random bytes
            var tempSB = new StringBuilder(Size);
            foreach (var b in bytes) {
                // map each random byte (0-255) onto the character set by taking it modulo
                // the set length; this is only bias-free when the set length divides 256
                tempSB.Append(alphaSet[b % alphaSet.Length]);
            }
            return tempSB;
        }
    }

EDIT2: In case someone finds this via Google, I have included the lessons learned. The average sample in tests on workstations was around 300 ms, which should not be too noticeable during logon. And there is no more need for a validation key, which is a relief :-)

The SCrypt package was installed via NuGet, and the Rfc2898 PBKDF2 call was changed to a large number of iterations but only 20 bytes of output, for the same CPU time.

New passwords are encoded with scrypt by default:

  <package id="CryptSharpOfficial" version="2.0.0.0" targetFramework="net451" />
  // save salt, hash algorithm used and resulting encoding on user record
  public string PasswordEncode(string password, byte[] salt, HashAlgorithm hashAlgorithm ) {
        switch (hashAlgorithm) {
            case HashAlgorithm.PBKDF2:
                    var deriver2898 = new Rfc2898DeriveBytes(password, salt,<Use a number around 50K>); // approx 300msecs on workstation
                    byte[] hash = deriver2898.GetBytes(20); // 
                    return Convert.ToBase64String(hash);
            case HashAlgorithm.Scrypt:
                var key = Encoding.UTF8.GetBytes(password);
                byte[] hashScrypt =  SCrypt.ComputeDerivedKey(key: key, salt: salt, 
                                    cost: 65536, // must be a power of 2 !, on PC, singlethread this is approx 1 sec
                                    blockSize: 8, 
                                    parallel: 1,
                                    maxThreads: 1, 
                                    derivedKeyLength: 128);

                    return Convert.ToBase64String(hashScrypt);
            default:
                throw new ArgumentOutOfRangeException("hashAlgorithm");
        }
    }
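
For illustration, a call site might look like the sketch below (the salt size, variable names, and user-record fields are hypothetical and not part of the original code):

    // Hypothetical usage sketch: generate a fresh random salt per user, hash with the
    // chosen algorithm, and persist salt + algorithm + encoded hash on the user record.
    byte[] salt = new byte[32]; // raw bytes rather than a Base64/character string
    using (var rng = new RNGCryptoServiceProvider()) {
        rng.GetBytes(salt);
    }
    string encoded = PasswordEncode(password, salt, HashAlgorithm.Scrypt);
    // user.Salt = salt; user.HashAlgorithmUsed = HashAlgorithm.Scrypt; user.PasswordHash = encoded;
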
soadyp
  • Just for the sake of clarity, is this approach reasonable for what? Password hashing yes, but for what kind of user, for what kind of site, that needs what level of assurance? – Steve May 02 '13 at 17:14
  • And as a general rule, HMACSHA512 is not good enough for password hashes. – Steve May 02 '13 at 17:15
  • By reasonable I meant that if hacked, it should not be "easy" to brute-force the passwords. So SLOW is now such an obvious requirement. The other factor in "acceptable" is that by law, a reasonable effort to protect customer info and privacy must be made. I think we would have passed "reasonable", but there was way more room for improvement than I anticipated. It explains why I was seeing RFC 2898 mentioned on Stack Exchange, and why I decided to ask the question. Glad I did. THANKS – soadyp May 03 '13 at 01:07

4 Answers

15

Rfc2898DeriveBytes implements PBKDF2: a function which turns a password (with a salt) into an arbitrary-length sequence of bytes. PBKDF2 is often used for password hashing (i.e. to compute and store a value which is sufficient to verify a password) because it has the needed characteristics for password hashing functions: a salt and configurable slowness.
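
As a concrete illustration (a minimal sketch added here, not part of the answer; the iteration count is an assumption to be tuned for acceptable login latency, and the 20-byte output matches the native SHA-1 size discussed in another answer below), this is roughly how Rfc2898DeriveBytes is used to store and later verify a password:

    public static byte[] DerivePasswordHash(string password, byte[] salt, int iterations) {
        // .NET's Rfc2898DeriveBytes is PBKDF2 with HMAC-SHA-1, whose native output is 20 bytes
        using (var pbkdf2 = new Rfc2898DeriveBytes(password, salt, iterations)) {
            return pbkdf2.GetBytes(20);
        }
    }

    public static bool VerifyPassword(string attempt, byte[] salt, int iterations, byte[] storedHash) {
        byte[] candidate = DerivePasswordHash(attempt, salt, iterations);
        // compare in constant time so the comparison itself leaks nothing about the stored hash
        int diff = candidate.Length ^ storedHash.Length;
        for (int i = 0; i < candidate.Length && i < storedHash.Length; i++) {
            diff |= candidate[i] ^ storedHash[i];
        }
        return diff == 0;
    }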

These characteristics are needed because passwords are weak: they fit in a human brain. As such, they are vulnerable to exhaustive search: it is feasible, on a general basis, to enumerate most passwords that human users will come up with and remember. The attack assumes that the attacker got a copy of the salt and the hashed password, and then will "try passwords" on his own machine. That's called an offline dictionary attack.

In your case, you have a third element: a validation key. It is a key, i.e. supposedly secret. If the attacker could grab the salts and hashed passwords but not the validation key, then he cannot perform the dictionary attack on his own machines; under these conditions (the validation key remains secret, and the validation algorithm is robust -- HMAC/SHA-512 is fine for that), the configurable slowness of PBKDF2 is not needed. This kind of validation with a secret key is sometimes called "peppering".

Note, though, that when we assume that the attacker could grab a copy of the hashed passwords, then it becomes a matter of delicacy to suppose that the key remained unsullied by his vile glances. This depends on the context. Most SQL injection attacks will be able to read part or all of the database, but not the rest of the files on the machine. Nevertheless, your server must somehow be able to boot up and start without human intervention, so the validation key is somewhere on the disk. An attacker stealing the whole disk (or a backup tape...) will get the validation key as well -- at which point you are back to the need for configurable slowness.

Generally speaking, I would recommend PBKDF2 (aka Rfc2898DeriveBytes in .NET) over a custom construction, although I must say that you appear to use HMAC properly (homemade constructions rarely achieve that level of correctness). If you insist on having a "validation key" (and you are ready to assume the procedural overhead of key management, e.g. special backups for that key), then I suggest using PBKDF2 and then applying HMAC on the PBKDF2 output.
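
A minimal sketch of that "PBKDF2, then HMAC" construction (parameter values and names are illustrative assumptions, not prescribed by the answer):

    public static string HashWithPepper(string password, byte[] salt, byte[] validationKey) {
        byte[] stretched;
        using (var pbkdf2 = new Rfc2898DeriveBytes(password, salt, 50000)) {
            stretched = pbkdf2.GetBytes(20); // 20 bytes = native output of the underlying HMAC-SHA-1
        }
        // the secret validation key ("pepper") is applied on top of the slow PBKDF2 output
        using (var hmac = new HMACSHA512(validationKey)) {
            return Convert.ToBase64String(hmac.ComputeHash(stretched));
        }
    }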

See this answer for a detailed discussion on password hashing.

Tom Leek
  • I'm going to mark this as the answer. The explanation was very useful to a layman in cryptography. I'm left thinking we have a very solid but not yet optimal solution. The salt will be improved; see the point from Rook. I will investigate how the algorithm can be switched to PBKDF2. The point on the validation key and its weakness is clear. I have seen the results of a LOST key. I will read the link and see if my next question is answered there. Thank you for the thoughtful reply. – soadyp May 02 '13 at 23:24
  • Having read that excellent linked article, I now understand the importance of slowness. I spend my life trying to make code faster, so this was an ironic learning point. I'm adding a field to the database now to track the hash method, so I can transition users over to one of the suggested approaches: as users log on, they will be converted to the safer approach. I'm now looking into scrypt examples to see if we can use this. Thanks all for very useful posts – soadyp May 03 '13 at 01:01
10

I'm responding in specific to the EDIT of lessons learned in the original question.

Rfc2898DeriveBytes is called with 1000 iterations, using a 1024-byte output. The password size in the DB was fortunately designed as 2K. Average sample in tests on workstations was around 300 ms.

Quick summary: If you like your current CPU load and Rfc2898DeriveBytes, then change from 1000 iterations and a 1024 byte output to 52000 iterations and a 20 byte output (20 bytes is the native output of SHA-1, which is what .NET 4.5 Rfc2898DeriveBytes is based on).

The explanation of why, including references:

To avoid giving attackers an advantage over you, do NOT use a PBKDF2/RFC2898/PKCS#5 (or an HMAC, which is used internally in PBKDF2 et al.) with an output size larger than the native output of the hash function used. Since .NET's implementation (as of even 4.5) hardcodes SHA-1, you should use a maximum of 160 bits of output, not 8192 bits of output!

The reason for this comes from the RFC 2898 spec itself: when the output size (dkLen, i.e. derived key length) is greater than the native hash output size (hLen, i.e. hash length), the output is built by concatenating multiple independently computed blocks. On page 9, we see

Step 2: "Let l be the number of hLen-octet blocks in the derived key, rounding up"

Step 3: " T_1 = F (P, S, c, 1) , T_2 = F (P, S, c, 2) , ... T_l = F (P, S, c, l) , "

And on page 10: Step 4: "DK = T_1 || T_2 || ... || T_l<0..r-1>" where DK is the derived key (PBKDF2 output) and || is the concatenation operator.

Therefore, we can see that for your 8192-bit key and .NET's HMAC-SHA-1, we have 8192/160 = 51.2 blocks, and CEIL(51.2) = 52 blocks required for PBKDF2, i.e. T_1 through T_52 (l = 52). What the spec says about handling the final partial block (the .2) differently from a full block is outside the scope of this discussion (hint: truncation after the full result is calculated).

Thus, you're running a set of 1000 iterations on your password a total of 52 times, and concatenating the output. So for one password, you're actually running 52,000 iterations!

A smart attacker is going to run only 1000 iterations, and compare their 160 bit result to the first 160 bits of your 8192 bits of output - if it fails, it's a wrong guess, move on. If it succeeds, it's almost certainly a successful guess (attack).

Thus, you're running 52,000 iterations on a CPU, and an attacker is running 1,000 iterations on whatever they have (probably a few GPU's, which are each massively outpacing your CPU for SHA-1 in the first place); you've given attackers a 52:1 advantage over and above hardware advantages.

Thankfully, a good PBKDF2 function is easy to adjust; simply change your output length to 160 bits and your number of iterations to 52,000, and you'll use the same amount of CPU time, store smaller keys, and make it 52 times more expensive for any attackers at no runtime cost to yourself!

If you want to further nerf attackers with GPU's, you may wish to change over to PBKDF2-HMAC-SHA-512 (or scrypt or bcrypt) and an output size of 64 bytes or less (the native output size of SHA-512), which significantly reduces the amount of advantage current (early 2014) GPU's have over CPU's due to 64-bit instructions being in CPU's but not GPU's. This is not natively available in .NET 4.5, however.

  • However, @Jither created a nice example of adding PBKDF2-HMAC-SHA256, PBKDF2-HMAC-SHA384, PBKDF2-HMAC-SHA512, etc. capabilities to .NET, and I've included a variant with a reasonable set of test vectors in my Github repository for reference; a minimal sketch of the idea is shown below.
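
For orientation only, here is a minimal single-block PBKDF2-HMAC-SHA-512 sketch (an assumption of how such an implementation can look, not the Jither/Github code referenced above); it only handles derived key lengths up to the native 64-byte SHA-512 output, in line with the advice in this answer:

    public static byte[] Pbkdf2HmacSha512(byte[] password, byte[] salt, int iterations, int dkLen) {
        if (dkLen > 64) throw new ArgumentOutOfRangeException("dkLen", "single-block sketch: dkLen must be <= 64");
        using (var hmac = new HMACSHA512(password)) {
            // U_1 = HMAC(password, salt || INT(1)), with the block index as a big-endian 32-bit integer
            byte[] firstInput = new byte[salt.Length + 4];
            Buffer.BlockCopy(salt, 0, firstInput, 0, salt.Length);
            firstInput[firstInput.Length - 1] = 1;
            byte[] u = hmac.ComputeHash(firstInput);
            byte[] t = (byte[])u.Clone();
            // U_j = HMAC(password, U_{j-1}); T_1 = U_1 xor U_2 xor ... xor U_c
            for (int j = 1; j < iterations; j++) {
                u = hmac.ComputeHash(u);
                for (int k = 0; k < t.Length; k++) {
                    t[k] ^= u[k];
                }
            }
            byte[] dk = new byte[dkLen];
            Buffer.BlockCopy(t, 0, dk, 0, dkLen);
            return dk;
        }
    }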

For another reference related to an actual design flaw in 1Password, see this Hashcat forum thread - "For each iteration of PBKDF2-HMAC-SHA1 you call 4 times the SHA1 transform. But this is only to produce a 160 bit key. To produce the required 320 bit key, you call it 8 times."

P.S. If you so choose, for a small design change you can save a little more database space by storing the output in a VARBINARY or BINARY column instead of doing the Base64 encoding.

P.P.S. That is, change the test code in your edit as follows (2000 * 52 = 104000); note that your text said 1000 iterations while your test listed 2000, so the text matches the text and the code matches the code, so mote it be.

//var hash = libCrypto.HashEncode2(password, salt, 2000);
var hash = libCrypto.HashEncode2(password, salt, 104000);
// skip down into HashEncode2
//byte[] hash = deriver.GetBytes(1024);
byte[] hash = deriver.GetBytes(20);
Anti-weakpasswords
4

The SHA-2 family is not a good choice for password storage. It is significantly better than MD5, but really you should be using bcrypt (or scrypt!).

RNGCryptoServiceProvider is a good source of entropy. Ideally a salt is not base-64 characters but base-256, i.e. full bytes. To understand this better, you need to know how rainbow tables are generated. The input to rainbow table generation is a keyspace: for example, a table could be generated to look up alphanumeric-plus-symbol strings 7 to 12 characters long. To crack this proposed salt scheme, the attacker would have to generate tables covering alphanumeric-plus-symbol strings 71 to 76 characters long to compensate for the 64-character salt (which is large). Making each salt position a full byte would greatly increase the keyspace the rainbow table would have to exhaust.
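
To put rough numbers on that difference (an illustration added here, not part of the original answer):

    // 64 salt characters drawn from a 64-symbol alphabet vs. 64 raw salt bytes
    double charSaltBits = 64 * Math.Log(64, 2);   // 64 x 6 bits = 384 bits of salt space
    double byteSaltBits = 64 * Math.Log(256, 2);  // 64 x 8 bits = 512 bits of salt space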

rook
  • Thanks Rook. We are looking at scrypt now: nuget "install-package CryptSharp". With Rfc2898DeriveBytes using 1000 iterations we are pretty solid, if I have understood the various links provided. However, you now have me very interested in testing this further... http://www.zer7.com/software.php?page=cryptsharp which has an ISC license :-) – soadyp May 03 '13 at 03:07
0

There is, I think, a bug in the question's GenerateRandomString function (besides a couple of minor syntax issues). It has a bias whenever the length of the character set is not a factor of 256. If the set were reduced to 62 characters, as the code suggests as an option, the distribution would no longer be an even 1/62 per character: since 256 mod 62 = 8, the first eight characters of the set would each be produced 5 times out of 256, while the remaining 54 characters would each be produced only 4 times out of 256.

Because there's a bias, there's an increased chance of generating identical salts. It's not very likely, but it's there. As it stands, though, if you use all 64 characters it's fine, since 64 divides 256 evenly and every character is equally likely.
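
A minimal sketch of a bias-free variant (hypothetical, using rejection sampling; not code from the question or from this answer):

    public static string GenerateUnbiasedString(int size, char[] alphaSet) {
        // reject random bytes beyond the largest multiple of the set length,
        // so every character in the set is selected with equal probability
        int limit = 256 - (256 % alphaSet.Length); // e.g. 248 for a 62-character set
        var sb = new StringBuilder(size);
        using (var rng = new RNGCryptoServiceProvider()) {
            var buffer = new byte[1];
            while (sb.Length < size) {
                rng.GetBytes(buffer);
                if (buffer[0] < limit) {
                    sb.Append(alphaSet[buffer[0] % alphaSet.Length]);
                }
            }
        }
        return sb.ToString();
    }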

stevieg
  • The default is 64. The 62 option is used for generating short strings sent via SMS where strict alpha is required; a slight bias is of no concern in that use case. – soadyp Jul 27 '15 at 08:47