3

A fellow developer and I have been having a discussion about how vulnerable a few different methods of developing a hash are, and I've come here to see if smarter people than I (us?) can shed some light.

In PHP, I feel the below is secure ENOUGH to generate as 32 character value that could not be reasonably broken via online attack. There are some other mitigating circumstances (such as in our specific case it would also require the attacker to already have some compromised credentials), but I'd like to just look at the "attackability" of the hash.

str_shuffle(MD5(microtime()))

The suggested more secure way of generating a 32 character hash is:

bin2hex(random_bytes(16))

I acknowledge the first hash generation method is not ABSOLUTELY SECURE, but for an online attack I think being able to guess the microtime (or try a low number of guesses), and know the MD5 was shuffled and/or find a vulnerability in MT which str_shuffle is based on is so low as to make it practically secure.

But I would love to hear why I'm a fool. Seriously.

EDIT --- This is being used as a password reset token, and does not have an expiry (although it is cleared once used, and is only set when requested).

  • @schroeder It's an account recovery key if a user loses their 2FA token generator, not generated off of any user input. Basically the user could be emailed this key to use it as a one-time login... however it is only generated if a user requests it, and in order to request it an attacker would have to have already broken their username/password. Online attack means over the internet (not considering the scenario of having the DB stolen and cracked offline) – Brian Brownton Jun 22 '20 at 15:59

2 Answers2

4

Microtime() is an incremental value, not a source of random values (quite the opposite). Using this as a source of randomness causes the output to be... More predictable. It drastically reduces the key space of your hash method to a range that can be guessed reasonably well.

Your colleague generates their hash using md5(random()), you md5(microtime()). To predict one of your valid tokens the attacker needs to work out an approximate time and generate hashes using microtime() of around that time frame, more or less seconds or minutes. To attack your colleague's hash you need to test the entire key space of random(), that is significantly more expensive computationally.

Argument: "the attacker does not know how the hash is generated". Maybe, but now you are relying on obscurity to keep your method secure.

Argument: "nobody would ever bother to do this". Well, given enough intention they would.

Argument: "did you spot the str_shuffle()?". Now I have. It does somewhat increase complexity and thus effort to break, but the first thing that's written on the function page is 'this function is not cryptographically secure'.

Argument: "I have other controls around how the token is used". That is a good one, you could have relatively simple hashes for this and a short expiry time for the token as well as ensuring it is used only once.

My own argument: "Wy bother reinventing the wheel and not just use a decent randomness generator to produce good random strings for hashing?"

Pedro
  • 3,911
  • 11
  • 25
1

First estimate what would it cost you or your users if smb. has really has got access to the data of some of your users. Will this user lose 1 000 000 USD or will it just lose a few emails inviting to the local bowling club? Then estimate of somebody will really want to invest time and other resources to break your system. If there is no sense for an attacker, then your approach may be quite sufficient.

But if you really want to prevent attacks, let's consider following. Here is just one possible attack scenario.

An attacker registers as a user on your web site or finds somebody who is already registered and wants to help, this doesn't matter.

The attacker determines the time shift on your server with high precision, like several seconds.

The attacker chooses some attack time. For a time period around this time the attacker calculates the possible reset tokens for every microsecond. In the period of plus-minus 1 minute there are 120 000 000 microseconds. For every value the attacker calculates the password reset tokens. The MD5 is very fast, so it doesn't take much time. Also for storing of 120 000 000 hashes not much space is needed.

Then at the chosen time the attacker sends consequently multiple password reset requests to your server. For simplicity, let say 5 requests with the attacker user name, then one requests with the target user name, then again 5 requests with the attacker user name. Then the attacker receives 10 reset emails from your server. Based on these tokens and on the per-calculated hash list, the attacker knows exactly the time stamps used by your server for every reset token.

Of course the requests can be processed asynchronously. A request sent later can be processed a few milliseconds earlier. But with a big number of requests this gives a high precision.

It can be even worse, if your server has some patterns in timestamps. For instance, your server might have resource bottlenecks in token generation, and the real time stamps might be generated not with a step of 0.000001 s, but might be a multiplier of let say 0.020 s. This might greatly reduce the number of possible timestamps to test.

Now the attacker program tries to reset password for every token, starting from the most probable ones. The total number of attempts depends on your server. In the worst case the attacker may need to test 1 000 - 10 000 tokens. In the good case even a few requests may be sufficient.

It depends also on how you implement password reset. If you lock the user after 3-5 failed password reset attempts, this may prevent successful attack. But there are still 2 potential problems:

  1. The precision of determining of the target user timestamp may be very high. Then even 1-2 attempts might be sufficient.
  2. The attacker can have the database of all user names and can request password reset for all of them or for many of them at the same time. Then even if you lock users after 3-5 failed attempts, the attacker has better chance to find a user or several users for whom the timestamp will be determined precisely and thus the reset will succeed.

TLDR

The security of hashes depends essentially on entropy, or simply put, on randomness. If the values are predictable, this makes such hashes vulnerable to attacks. The function random_bytes uses a cryptographic random number generator, which uses much more entropy and makes generated values practically unpredictable.

mentallurg
  • 8,536
  • 4
  • 26
  • 41
  • One of the two key points here is glossed over: "For every value the attacker calculates the password reset tokens" - just because it is a 32 character hash doesn't mean the attacker knows how to calculate it. Requesting enough reset keys to plausibly figure this out would alert us. – Brian Brownton Jun 22 '20 at 20:54
  • 1
    @BrianBrownton: Based on Kirkhoff's principle we should assume that the attacker knows the hash generation algorithm exactly. What is unknown, is `microtime()`. If this function is used in the mode with microsecond precision, you know exactly what are the possible values. One second contains 1 000 000 microseconds. An interval of time T plus-minus 1 min contains exactly 120 000 000 microseconds = 120 000 000 of exactly known values. Generating of hashes for each of them is deterministic and trivial. – mentallurg Jun 22 '20 at 20:58
  • 1
    @BrianBrownton: For instance, if the attacker plans to launch attack at 5:00:00, following hashes will be calculated: hash(4:59:59.000000), hash(4:59:59.000001), hash(4:59:59.000002), hash(4:59:59.000003), ...,hash(5:00:00.000000), hash(5:00:00.000001), hash(5:00:00.000002), hash(5:00:00.000003), ..., hash(5:00:00.999998), hash(5:00:00.999999), hash(5:00:01.000000). Around 5:00:00 the attacker requests and receives an email for password reset and just looks up the hash. It is easy, because the attacker stores pairs as a dictionary or as a map: (timestamp1, hash1), (timestamp2, hash2), ... – mentallurg Jun 22 '20 at 21:06
  • I understand your methodology, I'm just saying it is not *just* an MD5 (kerckhoff not withstanding) so they would be generating incorrect hashes. Further, this is an over the wire attack so they can't just try 120k hashes. They can try maybe a few a second, but not at a sustained rate or we would know something is up. – Brian Brownton Jun 23 '20 at 13:12
  • To "so they can't just try 120k hashes". Not 120K, but 120M :) They don't need all of them. May be I explained not good. They send 5 reset requests for the own user, then 1 request for the target user, then again 5 requests for the own user. When they see that their 5 requests correspond to timestamps from 05:00:00.020 to 05:00:00.040 and the next 5 correspond to 05:00:00.050 to 05:00:00.070, then very likely that the timestamp in the middle - for the target user - was 05:00:00.045. – mentallurg Jun 23 '20 at 16:06
  • ... If no lick with 05:00:00.045, they try tokens for timestamps with smaller step, 0.001, between 05:00:00.040 and 05:00:00.050. So they will not need 120M requests. a few requests may be sufficient. – mentallurg Jun 23 '20 at 16:09
  • I simplify. May be delays between different tokens are not equal. Or may be the requests are executed asynchronously and the 6th request (with the victim user) was processed bot between the 5th and 7th, but between 3th and 4th. But you get the idea. The time of the user token can be narrowed with a high precision. And this can increase their chances greatly. *May be* 2-3 requests will be sufficient. – mentallurg Jun 23 '20 at 16:14
  • To "we would know something is up" - sure. See what I wrote about it: "If you lock the user after 3-5 failed password reset attempts..." But the problem is, that because they know the possible interval of a victim timestamp, it *might be* that just 2-3 requests will be sufficient. So your system will not be even alerted yet. – mentallurg Jun 23 '20 at 16:18
  • Sure, here there are some simplifications. I just wanted to show that it may be not as safe is it looks at first glance and even simple attack may be successful. And of course evaluate the risk: Does it make sense for an attacker to break your system or not? Does it makes sense to invest resources or not? Will the possible success give anything valuable or not? – mentallurg Jun 23 '20 at 16:22
  • Perhaps there is ambiguity in the question ("secure ENOUGH") but to me, the steps you have laid out here would fall under secure enough. Getting lucky guessing microtime hashes, assuming the attacker somehow figured out the hash algo (Kerckhoff/obscurity or not), TO ME would still be considered secure enough considering this is only one layer in the security system. – Brian Brownton Jun 23 '20 at 17:11
  • If compromising someones account an attacker doesn't get access to millions of USD, then this approach *may be* sufficient. You decide. – mentallurg Jun 23 '20 at 17:15