How many bytes for password reset token? Should one take steps to hash or conceal raw CSPRNG bytes?

Question

I'm trying to follow the OWASP 'Forgot Password Cheat Sheet' recommendations for password reset functionality via email. This requires my server to generate a token. OWASP says that PHP's random_bytes() and openssl_random_pseudo_bytes() functions are adequate for such a token. My questions:

Is it safe to simply append a hex or base64 representation of these bytes to a url and email it to a user? Or does exposing the raw, unmodified bytes expose my system's CSPRNG behavior to unwanted scrutiny by bad guys?
If it is unsafe, would a SHA1 hash of the raw random bytes suffice to conceal my server's CSPRNG behavior while still serving its purpose as a password reset token?
How many random bytes should such a token have if I want it to be valid for an hour? For 24 hours?

score 1 · Answer 1 · answered Mar 24 '21 at 22:36

You asked:

Is it safe to simply append a hex or base64 representation of these bytes to a url and email it to a user? Or does exposing the raw, unmodified bytes expose my system's CSPRNG behavior to unwanted scrutiny by bad guys?

This question has been asked and answered many times and I won't attempt to give a full answer, but basically, yes: the output of any API that advertises itself as a CSPRNG is safe to send directly to users and will not leak the internal state of your system RNG.

Related question:

Is a rand from /dev/urandom secure for a login key?

You asked:

How many random bytes should such a token have if I want it to be valid for an hour? For 24 hours?

First off, go watch this 5-minute YouTube video:

How secure is 256 bit security? - YouTube

Based on that you should be good forever with 128 bits of randomness. You can get away with less but for exactly how much less you'll have to do your own math:

Do you have a rate-limit on your API? If so, how many guesses will it allow in 24 hours?
If an attacker guesses at the max rate, what chance of a correct guess would you be comfortable with: 1/2 (i.e. 2^-1)? 1/1,000,000 (~ 2^-20)? For "cryptographic strength" you'd want something in the range of 2^-80 to 2^-128.

Multiplying the number of guesses by the inverse of that probability gives the size of the token space you need; its base-2 logarithm is the number of bits of randomness the token needs.
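As a rough sketch of that arithmetic (in Python, purely for illustration): the function name `required_token_bits` and the rate limit of 1,000 guesses per hour are hypothetical, and the attacker's success chance is approximated as guesses divided by the size of the token space.

```python
import math


def required_token_bits(max_guesses: int, acceptable_probability: float) -> int:
    """Smallest whole number of bits such that max_guesses attempts succeed
    with probability at most acceptable_probability.

    Approximation: success chance ~= max_guesses / 2**bits.
    """
    if max_guesses < 1:
        raise ValueError("max_guesses must be at least 1")
    if not 0 < acceptable_probability <= 1:
        raise ValueError("acceptable_probability must be in (0, 1]")
    # log2(guesses * 1/p) = log2(guesses) + log2(1/p); adding the logs
    # avoids building a huge intermediate number.
    return math.ceil(math.log2(max_guesses) + math.log2(1 / acceptable_probability))


# Hypothetical rate limit: 1,000 guesses per hour, token valid for 24 hours.
guesses_in_window = 1_000 * 24

bits_casual = required_token_bits(guesses_in_window, 2 ** -20)  # 35
bits_crypto = required_token_bits(guesses_in_window, 2 ** -80)  # 95

print(bits_casual, bits_crypto)
```

Under these assumed numbers, even the "cryptographic strength" target stays below 128 bits, which is consistent with the advice that 128 bits is comfortably enough.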

Thank you for your answer. To clarify regarding first question, it is not whether the random bytes are 'secure' or 'random' enough, but rather *is it safe to expose the raw random number generator's results to the internet*. This is a subtly different question. I'm wondering whether I should try to obfuscate the raw random bytes returned directly from `random_bytes` lest a bad guy is able to perform some [statistical analysis to recognize weakness](https://en.wikipedia.org/wiki/Random_number_generator_attack#Debian_OpenSSL) or whether a hash function as prophylactic might provide some cloak. — S. Imp, Mar 24 '21 at 22:46
@S.Imp I believe that is what I answered ... The CSPRNG will internally be made from hash functions (apparently [Linux's /dev/urandom now uses ChaCha20](https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN.pdf?__blob=publicationFile&v=20), huh, neat) so applying another hash function seems unnecessary. — Mike Ounsworth, Mar 25 '21 at 01:44

score 1 · Answer 2 · answered Mar 24 '21 at 22:40

It's secure to use a token generated by your system's CSPRNG without further processing (other than encoding, if you like). All CSPRNGs pass the next-bit test, which means that given a stream of output, it is computationally infeasible to predict the next bit with a probability meaningfully better than chance (1/2). It is therefore always secure to expose CSPRNG output to an attacker without worrying about compromising other output from that CSPRNG. (Note that the attacker must not be able to see the internal state of your CSPRNG, but assuming your server is secure, that's the case.)

You can safely use a standard encoding, such as base64, base64url, base32, or hex; whatever you like.

Since you're generating random tokens, I'd recommend 32 bytes (256 bits). By the birthday bound, you would have to generate on the order of 2^128 tokens before two of them are likely to be the same by accident, which gives you the 128-bit security level you're going for. That should be acceptable for any length of time within the next couple of decades or so.
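A minimal sketch of generating and encoding such a token, written in Python for illustration: `secrets.token_bytes` plays the role that `random_bytes()` plays in PHP, and the constant name `TOKEN_BYTES` is an assumption of this example, not something from the question.

```python
import base64
import secrets

TOKEN_BYTES = 32  # 256 bits, the size recommended above

# Raw bytes straight from the operating system's CSPRNG, no hashing applied.
raw = secrets.token_bytes(TOKEN_BYTES)

# Any standard encoding is fine; these are three of the options mentioned.
hex_token = raw.hex()                                                  # 64 characters
b64url_token = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")  # 43 characters
b32_token = base64.b32encode(raw).decode("ascii")                     # 56 characters, incl. padding

print(hex_token)
print(b64url_token)
print(b32_token)
```

The encoding only changes how the same 256 bits are written down; it neither adds nor removes randomness.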

score 0 · Answer 3 · answered Mar 24 '21 at 23:14

Is it safe to simply append a hex or base64 representation of these bytes to a url and email it to a user? Or does exposing the raw, unmodified bytes expose my system's CSPRNG behavior to unwanted scrutiny by bad guys?

According to Kerckhoffs's principle, you should assume that an attacker knows which algorithm was used for encoding. That's why it doesn't matter whether you encode the bytes as hex or as base-64. Use the encoding that you feel more comfortable with.

By the way, you cannot put arbitrary raw bytes into a URL. For instance, you cannot use a byte with value 32, because it represents a space and needs to be encoded as "%20". There are also other bytes that are not allowed in a URL. That's why I'd suggest using an encoding that converts an array of bytes into a valid URL string. The standard base-64 mapping represents the bit sequence 111111 as "/". Having that in the URL changes the meaning of the URL. To prevent that, you would have to escape it as "%2F", and that escaping may get lost somewhere along the way. Thus it may be better to use the hex representation of the generated random bytes.
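The point about "/" can be seen with a small Python sketch. The input bytes are hand-picked for this example so that every 6-bit group maps to index 63 ("/") or 62 ("+") in the standard base-64 alphabet; a real token would of course be random.

```python
import base64
from urllib.parse import quote

# Hand-picked bytes: 0xff 0xff 0xff is four groups of 111111 (index 63),
# 0xfb 0xef 0xbe is four groups of 111110 (index 62).
raw = b"\xff\xff\xff\xfb\xef\xbe"

standard = base64.b64encode(raw).decode("ascii")          # "////++++"
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # "____----"
hex_form = raw.hex()                                      # "fffffffbefbe"

# Standard base-64 needs percent-escaping before it can go into a URL;
# hex and the URL-safe base-64 alphabet pass through unchanged.
print(quote(standard, safe=""))  # %2F%2F%2F%2F%2B%2B%2B%2B
print(quote(url_safe, safe=""))  # ____----
print(quote(hex_form, safe=""))  # fffffffbefbe
```

Hex and the URL-safe base-64 variant both avoid the escaping problem, so either works for a token embedded in a link.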

If it is unsafe, would a SHA1 hash of the raw random bytes suffice to conceal my server's CSPRNG behavior while still serving its purpose as a password reset token?

A SHA1 hash consists of 20 bytes (160 bits). If your token is longer than 20 bytes, then applying SHA1 effectively reduces its entropy to at most 160 bits.
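A short Python sketch of that size reduction, assuming a 32-byte token as the starting point (the 32-byte figure is only an example input):

```python
import hashlib
import secrets

raw = secrets.token_bytes(32)        # 256 bits from the CSPRNG
digest = hashlib.sha1(raw).digest()  # always 20 bytes, whatever the input size

# The digest cannot carry more than 160 bits, so hashing a longer token
# caps the number of distinct values an attacker has to search.
print(len(raw) * 8, "bits in,", len(digest) * 8, "bits out")
```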

How many random bytes should such a token have if I want it to be valid for an hour? For 24 hours?

Limit the number of password reset requests from a single IP per hour or per second. For instance, allow no more than 1 000 password reset requests per hour per IP.

Then it depends on which threats you consider. If you expect that an attacker can use only a single IP, then at most 1 000 tokens can be tested within an hour. This is ~2^10. Suppose you want the probability of guessing a token to be 1 in 1 000 000, which is approx. 1 in 2^20. Thus the token needs 10 + 20 = 30 bits, which rounds up to 4 bytes. If the token is valid for 24 hours, then 24 times more tokens can be tested, in our case 24 000, which is ~2^15. Thus for the same probability you would need a token of 15 + 20 = 35 bits, which rounds up to 5 bytes.

If you expect that your attacker may be a bot network that consists of 10 000 000 computers, which is ~2^23, then 1-hour tokens should consist of 10 + 20 + 23 = 53 bits, which rounds up to 7 bytes.

Depending on what probability and what number of requests per hour per IP you consider as acceptable, you will get other numbers.
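As a sanity check on the byte counts above, here is a small Python sketch that goes the other way: given a token length and the number of guesses the rate limit allows, it computes the attacker's approximate chance of success. The function name `guess_probability` is made up for this example, and the chance is approximated as guesses divided by the number of possible tokens, for a single valid token.

```python
from fractions import Fraction


def guess_probability(token_bytes: int, total_guesses: int) -> Fraction:
    """Approximate chance that one of total_guesses distinct guesses
    hits a single valid token of token_bytes random bytes."""
    return Fraction(total_guesses, 2 ** (8 * token_bytes))


TARGET = Fraction(1, 1_000_000)  # the 1-in-a-million goal used above

# (token length in bytes, guesses allowed by 1 000 requests/hour/IP)
scenarios = {
    "single IP, 1 hour": (4, 1_000),
    "single IP, 24 hours": (5, 24_000),
    "10 000 000 IPs, 1 hour": (7, 1_000 * 10_000_000),
}

results = {name: guess_probability(size, guesses)
           for name, (size, guesses) in scenarios.items()}

for name, p in results.items():
    print(f"{name}: about 1 in {round(1 / p):,} (meets target: {p <= TARGET})")
```

With these inputs all three byte counts land at or below the 1-in-1 000 000 target; changing the rate limit or the target probability changes the result accordingly.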

What else to consider?

If you send tokens as text that users need to type in manually, it makes sense to think about user experience and try to keep tokens short. But if you send a link that includes a token, then don't hesitate to make tokens longer.
