2

I'm finding myself on the inexperienced end of a discussion about where password hashing takes place. My coworker drew a diagram going from less to more secure, where the public Internet was the least secure and an internal API for user data was the most secure. The internal API can only be reached from within the network; the web server and its code sat between those two layers. His position was that passwords should be hashed and/or encrypted at the API level and not at the web-server level, because doing it on the web server is inherently insecure.

Over the years I have always found the hashing algorithm, keys and salts at the web-server level, within the code. He, however, has experience with military-grade systems and big enterprise solutions, while I have worked more with smaller web applications and have fewer years of experience to rely on.

Where should password hashing take place? Is it best to forward plain-text passwords along the whole chain until they reach the most secure level, or can they be hashed at the web-server level without creating security loopholes?

Spork

2 Answers

5

"Layers" are an appropriate way of thinking when your job is digging up old dinosaur bones. This is not so for information security.

It is tempting to think of "defence perimeters", "demilitarized zones" and "defence layers" because it evokes analogies with bunkers, artillery, heroic charges, and other colourful expressions of the inherent abilities of humans at exercising creative violence. However, all these metaphors rely on a conception of attackers as if they were commandos, stealthy and deadly but constrained by laws of physics, in particular the need to be at one single place at any one time. These laws don't apply directly to the abstract world maintained by computers, and there the analogy breaks down. Thinking in terms of layers thus runs the risk of completely misunderstanding the situation.

Moreover, there is no such thing as a single, comprehensive measure of "security": claiming that some part of a system is "more secure" than another means that you implicitly put figures on the concept of security for both systems, along a linear scale which furthermore happens to be the same for both systems. Such claims are common but verge on the preposterous.

What you must think about is threats and vulnerabilities. In your case, an external client connects to your "network", and authentication must take place by presenting a password. Somewhere in this system, the password must be verified against some stored value. It is unavoidable that such a stored value allows for offline dictionary attacks (meaning: if the attacker finds the hash of the password, then he can "try passwords at home", as fast as his own computers allow for it). It so happens that the connection from client C first goes through a "Web server" W (which is reachable from the Internet) and then to an inner server S where the sensitive data/service actually resides. The questions are then:

  • When the password is entered by the user, should it travel all the way to S, or stop at W?
  • When hashing occurs, should it be done on W or on S?
  • Should the password verification token (the hashed password) be stored on W or on S?
  • What communication mechanisms should be employed between C and W, and between W and S?

The ultimate goal of the attacker is to gain access to the services for which authentication is requested. If he gets the password itself, he wins. If he obtains the hashed password, then he wins or semi-wins, depending on the system details ("semi-win" meaning that the attacker does not obtain immediate access, but can run an offline dictionary attack, which usually succeeds after some efforts because average human users are bad at choosing strong passwords).
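
To make the "semi-win" concrete, here is a minimal sketch of an offline dictionary attack, assuming a salted PBKDF2-SHA256 verifier; the stolen record and the candidate list are fabricated for illustration.

```python
import hashlib

# Fabricated "stolen" record: in a real breach this would come from the
# compromised password store. It is derived here from a weak password so
# that the loop below actually terminates.
salt = bytes.fromhex("8f3a1c0e5b2d4a6f0011223344556677")
iterations = 100_000
stolen_hash = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, iterations)

def matches(candidate: str) -> bool:
    """Recompute the salted hash for a guessed password and compare."""
    guess = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, iterations)
    return guess == stolen_hash

# The attacker runs this "at home", as fast as his own hardware allows;
# the only brake is the per-guess cost set by the iteration count.
for candidate in ["letmein", "password", "hunter2", "qwerty"]:
    if matches(candidate):
        print("password recovered:", candidate)
        break
```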

The attacker may spy on the line between C and W. To avoid that, use SSL. In fact, W being a Web server, it may be surmised that C runs a Web browser, which implies that the user password necessarily travels "as is" to W, under the protection of HTTPS. However, W then gets the cleartext password. Correspondingly, if the attacker can hijack W completely, then he wins. From this point of view, overall security cannot be "higher" (if you want to think in such terms) than the security of W. Therefore, in a layer-based analysis, if the Web server W is not in the "most secure layer" then you are doing it wrong. This alone shows the shortcomings of the way your colleague thinks about security.

Let's assume that the attacker cannot hijack W. Then you have the choice between three main systems:

  1. The password is hashed on W. W stores the hashed passwords and performs the verification. No password, hashed or not, reaches S.

  2. The password is hashed on W. The hashed password is sent to S. S stores the hashed passwords.

  3. The password is sent "as is" to S. W stores nothing. S does the hashing, and stores the hashed passwords.
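
As an illustration of the first option (a minimal sketch, not part of the original design; the in-memory store, the PBKDF2-SHA256 choice and the iteration count are placeholders): W keeps a salted, slow hash and does the comparison itself, so nothing password-related ever reaches S.

```python
import hashlib, hmac, os

# Placeholder store on W; in practice this would live in W's own database.
_users = {}              # username -> (salt, hash)
_ITERATIONS = 200_000    # tune to what W can afford per login

def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    _users[username] = (salt, digest)

def verify(username: str, password: str) -> bool:
    """Runs entirely on W; S never sees the password or its hash."""
    salt, expected = _users.get(username, (b"", b""))
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return hmac.compare_digest(digest, expected)
```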

In the three cases, you need some protection for the connections between W and S. Indeed, this whole question is moot if the attacker is not assumed to be able to enter the local network in some way. So we assume that the attacker has access to a third machine on the local network, from which he can spy and try to do extra connections.

  • In the first case, S trusts W, so S must make sure that it talks to the "true W", and the communications must be free from hostile hijack by the attacker.
  • In the second case, the password hash sent by W is "password equivalent", in that showing it to S grants access. It is thus a very valuable target, that the attacker will want to see.
  • In the third case, the unhashed password goes from W to S, and the attacker would be very interested in seeing it.

Therefore, SSL between W and S.
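
For a rough idea of what "SSL between W and S" can look like on W's side, here is a sketch using Python's standard ssl module; the host name internal-api.example, the port and the certificate file names are made up, and S would have to be configured to require the client certificate for the mutual authentication to mean anything.

```python
import socket, ssl

# Verify S against the internal CA, and present W's client certificate so
# that S can tell it is talking to the "true W".
ctx = ssl.create_default_context(cafile="internal-ca.pem")
ctx.load_cert_chain(certfile="w-client.pem", keyfile="w-client.key")

def send_to_s(payload: bytes) -> bytes:
    """Authenticated, encrypted request from W to the inner server S."""
    with socket.create_connection(("internal-api.example", 8443)) as raw:
        with ctx.wrap_socket(raw, server_hostname="internal-api.example") as tls:
            tls.sendall(payload)
            return tls.recv(4096)
```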

Each of the three systems has pros and cons:

  1. If W does the storage then W contains the hashed passwords, which may be recovered en masse from a partial read-only breach on W, as regularly happens with Web servers (SQL injection attacks...).

  2. If the hashing is done on W but the storage on S, then the salt value for the hashing must be first retrieved from S and sent to W, increasing protocol complexity and latency.

  3. If the hashing is done on S then S pays for the bulk of the hashing cost. We want the hashing to be expensive (see this answer); if S is busy (because it operates the actual service), then this limits the number of iterations that can be applied to hashing, consequently lowering security (the hashed passwords will be less robust against offline dictionary attacks, and the server S, being more loaded, will be weaker against DoS attacks).
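
To make the cost trade-off in point 3 tangible, here is a small timing sketch (the test password, salt and iteration counts are purely illustrative): whichever machine does the hashing, the iteration count is the knob, and its upper bound is how much CPU that machine can spare per login.

```python
import hashlib, os, time

def cost_of(iterations: int) -> float:
    """Wall-clock time of a single PBKDF2-SHA256 computation."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"benchmark-password", salt, iterations)
    return time.perf_counter() - start

# Pick the largest count the hashing machine (W or S) can afford per login.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} iterations: {cost_of(n):.3f} s")
```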

Which choice is "the right one" depends on operational details of your specific situation: amount of free CPU, inner network performance, backup procedures, estimated average and peak loads... Moreover, the actual security differences rely on predictions of "how much" the Web server and the inner network are breached into; this is not a solid foundation for reasoning, and it would be bold to claim that the resultant risk analysis is more accurate than what scapulomancy may offer.

Trying to apply the "layers mindset" here looks like a knee-jerk dogmatic reaction.

Tom Leek
  • Thank you for your extremely detailed response, very helpful indeed! We do have SSL connections set up between C and W, and between W and S. There are also different entities in the network around W that could (theoretically) intrude on the connection, so SSL is necessary, as you say. Thanks for giving an analysis of which solution is useful in which case. – Spork Jan 30 '14 at 15:09
2

I agree with your colleague; it is best to do it at the internal API level.

I presume he is proposing a three-tier architecture: database <-> app server <-> web server? That is the norm for the kind of environments you mention he has worked in.

If you've worked on smaller apps you probably used a two-tier architecture: database <-> web server. And you probably tried to keep logic out of the database, so the web tier is the only place to do hashing.

The change in mindset for you is from two-tier to three-tier architectures.
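
A rough sketch of that division of labour (placeholder names, with a direct function call standing in for the TLS-protected link between the tiers): the web tier only relays the credential, while the app-server tier owns the salt, the hashing and the stored verifier.

```python
import hashlib, hmac, os

# --- app-server tier: owns the password store and the hashing ---
_store = {}            # username -> (salt, hash); in reality, the database
_ITERATIONS = 200_000

def app_register(username: str, password: str) -> None:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    _store[username] = (salt, digest)

def app_verify(username: str, password: str) -> bool:
    salt, expected = _store.get(username, (b"", b""))
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return hmac.compare_digest(digest, expected)

# --- web tier: never hashes or stores, only forwards ---
def web_login(username: str, password: str) -> bool:
    # In reality this would be a call over the protected internal channel.
    return app_verify(username, password)
```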

paj28
  • You are right in me having worked on database <-> webserver applications, meaning all logic was outside of the database (theoretically one could use the hashing functions in MySQL, but that never looked solid to me). However, the setup we have now is Client -> Webserver <-> API <-> database... So we will be forwarding the password in plaintext from Client to Web and Web to API. – Spork Jan 30 '14 at 15:13
  • @Spork - I think by "API" he means "App Server". You should use the standard name for this, which is "App Server". When you say plaintext, do you actually mean that (i.e. no SSL) or do you mean that the password is encrypted but not hashed? Again, it is important to use the normal terminology to avoid confusion. – paj28 Jan 30 '14 at 15:37
  • Sorry about that. I mean no hashing or encryption, other than SSL, specifically on the passwords. By "API" in this case I mean a 3rd-party application interface that we interact with to retrieve information from and push information to. – Spork Jan 30 '14 at 15:58
  • @Spork - In that case, I'd say your setup is fine. Just don't describe SSL as plaintext. Your original question was about an "internal API"; is this now a different API you're talking about? – paj28 Jan 30 '14 at 16:01
  • Sorry about the confusion, it stems from not being quite sure what to call it. It's a 3rd party API on our client's network that we get data from. So it's internal and... external. Sorry. – Spork Jan 30 '14 at 18:07
  • @Spork - Whether the API is the best place to do the hashing depends on its function. If it's a transactional application, it does. Just a data store - maybe not. At this point, giving you precise guidance is beyond the scope of this site, but I would tend to agree with your colleague. – paj28 Jan 30 '14 at 19:06