Secure Implementation of Password Database

Question

Disclaimer: I am not a security professional, just a programmer trying to do my best. Additionally, this is my first post to this community, so I apologise if this question is too broad.

The Situation: My local network (managed and secured by a professional IT team) has many devices on it running simple web servers that serve pages used to control data loggers, PID controllers, etc. These web pages currently have no form of authentication at all, which will be a big problem if a bad actor ever does compromise the network.

My Goal is to add authentication to these web pages by adding a login page to the app the server is running. (Possibly relevant, I am using Python to run the server.)

My Implementation: I have a proof-of-concept design in which I do the following:

The user is prompted for username and password on registration page
The server hashes the password using SHA256 to create hashes of equal-length
A random salt is generated using a reasonable work-factor.
The hashed password is hashed again, this time with the salt. (The salt generation and hash uses python's bcrypt)
The username, work-factor, salt, and hash are stored in the password database.
When a user attempts to log in, the entered password is hashed in the same way and compared to the record in the database. (The salt used for comparison is fetched from the password database using the entered username as the key.)
If the hashes match, a cookie is given to the user which grants them access to the restricted web pages.

(Note that all communication between the web page and server are using POST requests.)

Questions and Concerns:

Saving the hash, work-factor, and salt in the same database seems like bad practice.
Does this method of hashing (reasonably) ensure that no one can determine a user's password, including myself?
Everything currently uses http. Is https necessary if all access is within the local network?
Is using a cookie a good way to validate that a user has logged in successfully?

"A random salt is generated using a reasonable work-factor." What does this mean? -- I mean, I know all of those words, but that's like saying you only use synthetic oil in your car's wiper fluid reservoir. — Ghedipunk, Dec 12 '19 at 16:35
Not exactly sure what the confusion is. Perhaps work-factor has a different meaning in a different context? The bcrypt documentation talks about how it uses work-factor in the salt generation https://pypi.org/project/bcrypt/ — Rekamanon, Dec 12 '19 at 17:06
I see the confusion, now. Those function calls are grossly misnamed, and that library is combining two separate parameters into one object, and calling that resulting object a salt. (Extending the car maintenance analogy, your mechanic is "filling the fluids" and calling all fluids oil, even the fluids that obviously aren't meant for lubrication.) I'm already typing up an answer; I'll be certain to cover the concepts and jargon. — Ghedipunk, Dec 12 '19 at 17:13
Does this answer your question? [How to securely hash passwords?](https://security.stackexchange.com/questions/211/how-to-securely-hash-passwords) — Conor Mancone, Dec 12 '19 at 19:30
You say your network is "managed and secured by a professional IT team", have you thought about using a different authentication system for users? Depending on what your company does you may be able to use something like Active Directory to have users login without needing to store any password information on your own system. Then you'd just need an ACL to give authorized users permissions to different pages — GammaGames, Dec 12 '19 at 23:48
I highly recommend using `PyJWT` for your cookie. Store the user id, their roles, and all other data you usually need from the user in that token, so you don't have to run to the database every time the user queries your server. — data, Dec 13 '19 at 10:38
@data I wouldn't unilaterally suggest JWTs. A JWT has a very specific purpose: to store and verify data without having to check a data store. They come with their own baggage and hiccups, so if you don't need their particular use case, then a simple session id is preferable. — Conor Mancone, Dec 13 '19 at 13:04
@ConorMancone In part, yes, but it lacks any comment about http/https and cookies for validation. Additionally, it is quite an old answer (in terms of internet years) and I don't know enough to tease apart what was then good security practice but is now not, so I was hoping for fresh advice. — Rekamanon, Dec 13 '19 at 13:25
@GammaGames Isn't that still at the network level? Also, all the web-servers are on Linux machines, while the computers people use to access them are Windows (if that matters). — Rekamanon, Dec 13 '19 at 13:40
@Rekamanon I would expect the servers to be linux, and using AD would let users sign in with the same account they use for their windows computer or outlook email. It is on the network level but it allows you to rely on MS's password storage and only means you have to set up authenticating [with their servers](https://stackoverflow.com/questions/41684463/flask-active-directory-authentication). — GammaGames, Dec 13 '19 at 15:30
Rule #0 is always _don't invent it yourself when there are libraries or applications to do it for you_. — chrylis -cautiouslyoptimistic-, Dec 13 '19 at 20:36
*" I am not a security professional"* ... So you shouldn't be inventing your own login system. There are proven, off-the-shelf solutions to this problem in most web-frameworks. It would be dangerously foolhardy to use your handrolled system in preference to one of these. Added bonus: Now you've got a stack of time to do something more productive. — spender, Dec 14 '19 at 03:52
@ConorMancone in what cases do you think a plain session ID would be better than a JWT? — data, Dec 14 '19 at 21:36
@data in any case when you are going to be checking the database anyway — Conor Mancone, Dec 14 '19 at 22:53
A usual suspect for password hashing in python is passlib https://pypi.org/project/passlib/ it offers a lot of the sane choices you might want. — schlenk, Dec 14 '19 at 23:11

Ghedipunk · Accepted Answer · 2019-12-13T19:56:59.273

The current general best practices for authentication are in the NIST SP 800-63-3 Digital Identity Guidelines standards, especially in SP 800-63B Authentication and Lifecycle Management.

These NIST standards are an easy read for developers, and besides telling you what to do, they also talk about why you want to do certain things. (If you want more details than the NIST standards give, we're happy to help.)

That said, let's look over your current authentication system and address your concerns:

The user is prompted for username and password on registration page

The registration page?

This is probably mistyped and you meant a login page, but if there is a registration page that anyone on your network can access, then you need to explicitly define a separate authorization system. For example, anyone can register for an account on any online store, but this account doesn't give you access to the part of their site that lets admins change the prices on their products.

The server hashes the password using SHA256 to create hashes of equal-length

This step doesn't add any security, or improve system performance in any meaningful way, especially since you're using bcrypt, which will discard any part of the password past the first 72 characters anyways.

It is unlikely, but this step could reduce entropy. The reduction in entropy is insignificant in the grand scheme of things, and requires users to already be using long and randomly generated passwords, but since this is an extra step that doesn't improve security, I suggest leaving it out.

A random salt is generated using a reasonable work-factor.

I'll admit that I was very confused seeing this step at first. If this were about car maintenance, it would make as much sense to me as "Added synthetic oil with a reasonable octane rating." However, in the comments it was revealed that the specific bcrypt library being used is the bcrypt library hosted by PyPI.

I can't find the source code (and am allergic to languages that use whitespace to delineate scope), but based on the documentation, it appears that the library's function call to generate the bcrypt's parameters is named bcrypt.gensalt(workfactor), and this method itself takes a work factor as its parameter... Extending the car maintenance metaphor, it would be as if there were a function named vehicle.refuel(viscosity), or vehicle.changeoil(octane).

A salt, in authentication and cryptography jargon, is a random value that is added to a plaintext, to make it infeasible to pre-calculate the output of a cryptographic hash function or key derivation function. The salt itself is not a secret; its only strength is that it's not known beforehand.

A salt's value is measured in how long it is, also called its entropy. (Entropy in cryptography is a nuanced topic, that depends on more than just length, but for any reasonable library that creates a salt, the longer the salt, the more entropy it has.)
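To make the role of the salt concrete, here is a minimal sketch using only Python's standard library. It uses `hashlib.pbkdf2_hmac` as a stand-in for bcrypt (an assumption, chosen so the example runs without third-party packages); the function name `derive`, the sample password, and the iteration count are all illustrative.

```python
import hashlib
import secrets

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def derive(password: str, salt: bytes) -> bytes:
    """Stretch a password with PBKDF2-HMAC-SHA256 (stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Two users pick the same password but receive different random salts.
salt_alice = secrets.token_bytes(16)
salt_bob = secrets.token_bytes(16)

key_alice = derive("hunter2", salt_alice)
key_bob = derive("hunter2", salt_bob)

# The stored values differ, so one precomputed table cannot cover both users.
print(key_alice != key_bob)

# The same password with the same salt always reproduces the same output,
# which is what makes verification at login possible.
print(derive("hunter2", salt_alice) == key_alice)
```

The salts are stored in the clear next to the derived keys; nothing in the sketch depends on them being secret.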

A work-factor, on the other hand, is jargon that is specific to key-stretching algorithms, such as this bcrypt KDF. The work factor defines how much work the CPU (or GPU, FPGA, or ASIC when you're the attacker) needs to do, and its main feature is that it takes longer to calculate the output, using steps that can't be bypassed, guessed, or ignored. This may seem counterintuitive; you want your application to run as fast as possible, right?

Well, the general risk is that your database of passwords will be leaked. Top retailers leak passwords all the time, and you don't have the millions of dollars to spend securing applications like top retailers do, so assume that your database of passwords might eventually be leaked, too.

The tradeoff here is, when logging in, your user will have to wait an extra second. When an attacker is cracking your passwords, they can only guess one password per second, per CPU that they put to the task. (Motivated attackers can take some shortcuts, but even 10 guesses per second is MUCH better than billions of guesses per second if you're just using single round SHA256 instead of a key-stretching algorithm.)
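The arithmetic behind that tradeoff can be sketched in a few lines. The guess rates below are the round figures from the paragraph above, not measurements, and the keyspace (eight lowercase letters) is a hypothetical example. The helper `bcrypt_rounds` reflects that bcrypt's cost parameter is a base-2 logarithm, so each increment doubles the work.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bcrypt_rounds(cost: int) -> int:
    """bcrypt's cost is log2 of its iteration count: cost 8 means 2**8 rounds."""
    return 2 ** cost


def seconds_to_exhaust(keyspace: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate password once."""
    return keyspace / guesses_per_second


keyspace = 26 ** 8  # hypothetical: every 8-letter lowercase password

# Assumed rates: billions per second for a single fast hash,
# about 10 per second against a stretched password.
fast = seconds_to_exhaust(keyspace, 1e9)
slow = seconds_to_exhaust(keyspace, 10)

print(f"single-round hash: about {fast / 60:.1f} minutes")
print(f"key stretching:    about {slow / SECONDS_PER_YEAR:.0f} years")
print(f"cost 12 runs {bcrypt_rounds(12) // bcrypt_rounds(8)}x the rounds of cost 8")
```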

So, the salt and the work factor are both important parameters that have nothing to do with each other. I'm sorry that your first exposure to the concept was through a very poorly named function.

The hashed password is hashed again, this time with the salt. (The salt generation and hash uses python's bcrypt)

Minor nitpick, but this is because I'm being thorough: The results of bcrypt aren't a hash, even though many people call it a hash, and everyone here will know what you're talking about if you continue to call it a hash. The results are a stretched password or a derived key.

The username, work-factor, salt, and hash are stored in the password database.

If the result looks something like this, then you already have the work factor, salt, and derived key all together:

$2a$08$0SN/h83Gt1jZMR6924.Kd.HaK3MyTDt/W8FCjUOtbY3Pmres5rsma

The 2a is the algorithm (bcrypt with unicode support).

The 08 is the work factor.

The next 22 characters, 0SN/h83Gt1jZMR6924.Kd., are the salt.

And the rest is the derived key.

If you're getting something different from your bcrypt library, find a different library.
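As an illustration of that layout, the following sketch splits the example string above into its four fields with plain string operations. The names `BcryptRecord` and `parse_bcrypt_string` are made up for this example and are not part of any bcrypt library; in real code you would let the library parse the string for you.

```python
from typing import NamedTuple


class BcryptRecord(NamedTuple):
    algorithm: str
    cost: int
    salt: str
    derived_key: str


def parse_bcrypt_string(record: str) -> BcryptRecord:
    """Split a '$2a$08$<22-char salt><31-char key>' string into its fields."""
    # The leading '$' produces an empty first element, which is discarded.
    _, algorithm, cost, payload = record.split("$")
    if len(payload) != 53:
        raise ValueError("expected 22 salt characters plus 31 key characters")
    return BcryptRecord(algorithm, int(cost), payload[:22], payload[22:])


parsed = parse_bcrypt_string(
    "$2a$08$0SN/h83Gt1jZMR6924.Kd.HaK3MyTDt/W8FCjUOtbY3Pmres5rsma"
)
print(parsed.algorithm, parsed.cost, parsed.salt, parsed.derived_key)
```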

When a user attempts to log in, the entered password is hashed in the same way and compared to the record in the database. (The salt used for comparison is fetched from the password database using the entered username as the key.)

Great. However, the library that you're using is already doing the heavy lifting for you, so use your library to its fullest. The bcrypt.checkpw function will pull out the work factor, salt, and derived key for you. It will then run its KDF and compare the results.

Using this library function means that, if you decide to change the default work factor in the future, you don't have to have separate code to handle the older derived keys, as bcrypt.checkpw will be able to figure out the desired work factor on its own.
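The benefit of a self-describing record can be shown with a standard-library sketch. This is not bcrypt: it uses PBKDF2 as a stand-in so it runs without extra packages, and the record format, function names, and iteration counts are assumptions for illustration. With the bcrypt package, `bcrypt.hashpw` and `bcrypt.checkpw` fill the same two roles.

```python
import hashlib
import hmac
import secrets

DEFAULT_ITERATIONS = 200_000  # illustrative default; meant to be raised over time


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Return one string holding algorithm, work factor, salt, and derived key."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${key.hex()}"


def check_password(password: str, record: str) -> bool:
    """Re-derive the key using the parameters stored in the record itself."""
    algorithm, iterations, salt_hex, key_hex = record.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(key_hex))


# A record created under an older, lower work factor still verifies,
# because check_password reads the work factor from the record.
old_record = hash_password("correct horse", iterations=50_000)
new_record = hash_password("correct horse")

print(check_password("correct horse", old_record))
print(check_password("correct horse", new_record))
print(check_password("wrong guess", new_record))
```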

If the hashes match, a cookie is given to the user which grants them access to the restricted web pages.

This sounds great on the face of it... but make sure that the cookie itself doesn't contain the authorization. It should contain a randomly generated string (session ID) and when accessing the site, the server should look at that session ID and check its own resources to see if that session ID is properly authorized. Clean up the session IDs after a while, too.

If your cookie contains "is_authorized=true" instead of a session ID where the server double checks its own resources, then people can just make their own authorization cookies and will never need to authenticate.

Concerns:

Saving the hash, work-factor, and salt in the same database seems like bad practice.

The salt and work factors are not sensitive in any way, but are necessary to validate a password. At best, any attempt to hide the salt will result in security-through-obscurity, which isn't any sort of security at all. You might as well store the work factor and salt right alongside the derived key, to save yourself the headache, because splitting them up isn't going to prevent an attacker from getting to them.

Does this method of hashing (reasonably) ensure that no one can determine a user's password, including myself?

As far as your responsibility as a web developer? Yes.

And thank you for asking if it reasonably ensures that no one can determine a password. They can still be guessed in offline attacks, but using a key stretching algorithm such as bcrypt will greatly hinder attackers, even in the worst-case scenario of an offline attack against a leaked database.

The only thing you can do better, is to add 2fa and demand that your users use password managers with 24+ character long, truly random passwords that are unique for every site.

Note that password resets should not depend only on this second factor. As per the NIST guidelines, it should depend on the same identifying factors used in setting up the account in the first place (and make as much noise as possible on as many user communication channels as your application knows about) and utilize at least one additional factor, if available.

For most web applications, this means an email to the account used during account registration (NOT text messages to a cell phone, or security questions by themselves, though these can be additional factors after the email is sent). For an internal business application, this means the user has to make a call to the sysops team/helpdesk (or, more realistically, they need to stop by the developer's desk).

Everything currently uses http. Is https necessary if all access is within the local network?

You have to trust your local network to some extent. However, it never hurts to add HTTPS. If you're concerned at all, add HTTPS. (The fact that you asked this question means that, yes, you are concerned. So yes, you should add HTTPS.)

Adding HTTPS is a good idea for non-security reasons as well, as web servers (and browsers) will use HTTP/2, if available, only over HTTPS connections. This can speed up your site noticeably. It doesn't apply in your case, as you're using a Python-script based web server, which is unlikely to have HTTP/2 capabilities, but it's a good habit to get in for reasons beyond "just" security. (Security is more than enough justification in my opinion, but there are managers who need additional convincing.)

Is using a cookie a good way to validate that a user has logged in successfully?

If that cookie contains a session ID rather than authentication information, that session ID is random, and the server verifies the session ID, then yes. This is standard practice for the vast majority of websites with any sort of user identification.

In regards to HTTPS on the local network, Google switched their internal network to encrypted-everything after they found the NSA was listening in. — Mark, Dec 13 '19 at 00:38
I am not OP, but thank you very much for this thorough answer. It was incredibly useful. — Pedro A, Dec 13 '19 at 02:08
Just note that most implementation of 2FA is a **known security flaw**. Most 2FA is treated as 1FA (weakest link) when doing password recovery. How many sites do you know, where you use "2FA" with your mobile, can you recover your account with just the mobile number? That's 1FA. It's terrible. — Nelson, Dec 13 '19 at 07:40
That session ID stuff is some PHP garbage. A JWT token is the way to do it these days, with a huge bonus of not needing a database query for every authenticated request. — data, Dec 13 '19 at 10:36
In some cases, HTTPS can make your site *faster*. Off the top of my head, IIS will only use HTTP/2 (which can be faster than 1) if you're using HTTPS. — Richard Ward, Dec 13 '19 at 12:01
@RichardWard: I think that's true both for most browsers and servers. — Jörg W Mittag, Dec 13 '19 at 16:09
@data, JWT may be the buzzword of the future, like Rust, nosql, and websockets, but is a junior developer going to understand it after you've won a lottery jackpot and left the company without a senior developer? Are they going to get it right? (More important to this little corner of SE: Will it be superseded by a new buzzword worthy standard in 3 years?) JWT is technically sound, but until it is ubiquitous and developers _can't_ get the details wrong because _every_ library takes that out of our hand, I'll stick to advising junior devs to use easily understood primitives. — Ghedipunk, Dec 13 '19 at 16:46
I've edited in suggestions from comments. Thank you everyone! — Ghedipunk, Dec 13 '19 at 16:50
@data Sessions as a concept are not limited to PHP, and [JWTs are not in general better](http://cryto.net/~joepie91/blog/2016/06/13/stop-using-jwt-for-sessions/). — Bergi, Dec 13 '19 at 18:18

score 10 · Answer 2 · edited Dec 12 '19 at 19:05

The salt isn't supposed to be secret. Its purpose is to be different for every password, so the hashed database can't be attacked with rainbow tables. The other parameters you mention aren't supposed to be secret either. So you can store all this information in the same database, which is how most applications do it. The database table that contains the hashes usually also contains the salts and all the other information about the hashing algorithm.

If you use a good method for hashing (use the appropriate library, don't try to implement a hashing algorithm yourself!), no one will be able to recover the passwords without bruteforcing them. That doesn't mean it will be impossible though! Weak passwords will remain weak passwords, so 12345 and admin will still be easy to guess of course.

HTTPS is important everywhere, every time, for everybody. Attackers aren't only out there, far away, in foreign countries. An attacker might be your colleague, an ex-colleague, a visitor, somebody who breaks in, or even an infected router or any other device. So get rid of HTTP whenever you can. Hopefully one day HTTP will become extinct.

Cookies are ok for session management, but there are several details you need to be careful about. For example, use HTTP-only cookies, make sure the session IDs contained in the cookies are truly random and impossible to guess, decide if you need to set an expiration time (should the user be logged out after some time of inactivity?), etc.
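As a rough illustration of those details, the sketch below builds a `Set-Cookie` header value with Python's standard `http.cookies` module. The cookie name `session_id`, the 30-minute lifetime, and the helper name are assumptions; the `Secure` and `SameSite` attributes go slightly beyond the list above and assume the site is served over HTTPS.

```python
import secrets
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 1800) -> str:
    """Return a Set-Cookie header value for a session ID (illustrative settings)."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True       # not readable from page JavaScript
    morsel["secure"] = True         # only sent over HTTPS
    morsel["samesite"] = "Strict"   # requires Python 3.8 or newer
    morsel["max-age"] = max_age_seconds  # browser discards it after this long
    morsel["path"] = "/"
    return morsel.OutputString()


# The session ID itself is random and carries no authorization data.
header_value = build_session_cookie(secrets.token_urlsafe(32))
print("Set-Cookie:", header_value)
```

The browser-side expiry is only a convenience; the server should still enforce its own session timeout.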

score 5 · WoJ · Answer 3 · 2019-12-13T12:14:06.600

I am not sure where you intend to place this login screen but I assume that you will rewrite part of the servers to add a login page. I do not really understand what is "the server" written in Python you are referring to.

Anyway, do not reinvent how to store the passwords. There are methods for that and since you mention Python, a good starting point may be the Django documentation on that subject.

This will also take care of how to keep the session (through a cookie for instance)

As a general overview about password storage, see this 2013 answer by Thomas Pornin. You may consider using Argon2 as the hash algorithm.

Finally, you may want to read OWASP positioning on that matter, which mentions that (emphasis mine)

As with most areas of cryptography, there are many different factors that need to be considered, but happily, the majority of modern languages and frameworks provide built-in functionality to help store passwords, which handles much of the complexity.

edited Dec 13 '19 at 12:14

answered Dec 12 '19 at 16:31

WoJ

I updated the question to clarify that I am adding a login page to the application that the server is running. I'm using Python Bottle to handle the backend, using Bottle's "built-in default server is based on wsgiref WSGIServer". – Rekamanon Dec 12 '19 at 17:18
Also, not sure how this process reinvents the wheel? Most of what I have read suggests salting and hashing, including the 2013 answer you linked (which recommends bcrypt). I will look into Argon2 though... – Rekamanon Dec 12 '19 at 17:31
@Rekamanon, there is an update to that 2013 answer on a different question: https://security.stackexchange.com/questions/193351/in-2018-what-is-the-recommended-hash-to-store-passwords-bcrypt-scrypt-argon2 – Ghedipunk Dec 12 '19 at 18:45
@Rekamanon: it is great that you did due diligence in understanding how to correctly store passwords. What you do not want is to write the routines to do that yourself, mostly because so many things can go wrong that it is not worth it (except as a learning exercise, if interested). There are libraries written for just that purpose that you should use. They also bring many additional services, such as session management (your last point), all in a well-thought-out and tested manner. I will add some extra information regarding OWASP's position on that subject. – WoJ Dec 13 '19 at 12:12
@WoJ Ah, I see. I'm trying to 're-implement' the wheel. I was aware that trying to implement home-brew hashing algorithms was a bad idea, but didn't realize the implementation of the algorithm itself could also create vulnerabilities. – Rekamanon Dec 13 '19 at 13:44
@Rekamanon yes, it can and has (I am talking from experience over many applications). The all-in-one libraries (the good ones at least) are definitely the way to go when starting with security-oriented development. You will eventually hit a corner case someday, and the fact that you gave deep thought (as you did) to the lower-level implementation will help. But until then I would definitely recommend going with libraries which handle as much of the logistics as possible. – WoJ Dec 13 '19 at 13:47

score 5 · Answer 4 · answered Dec 13 '19 at 18:05

This is a software engineer here. I will answer from that point of view.

My local network has many devices on it running simple web servers

You should really try to see if you can use SSO (Single Sign On).

Before going into detail: SSO basically means not storing passwords in your app. You rely on a secure third party, like Active Directory. After all, it is a company environment, isn't it? Cool.

If you want to go for this path you will have the main advantage that you don't have to manage or reinvent security. Handling passwords is not easy stuff and having to write code for an authentication system rather than integrating with an existing open source framework is a design smell.

Using a third party as authentication source relieves you from that problem and avoids users having too many passwords around. Again, there are libraries for various SSOs all around ~~the world~~ Github, so you won't have to rewrite from scratch.

Consider that if you store passwords on every device's server, then there will be a copy of (the hash of) the password on each server, and the copies won't stay synchronized when the user decides to change the password or when it gets reset. If you have 5 devices... or 1000 devices... tell me about the experience!

The simplest AD-based SSO means that the user types the password on the login screen, but that password is never stored in the portal; it is sent to the central domain controller for validation. I won't discuss permission management.

Active Directory is a mere example. There are other tools.

As for other points

Everything currently uses http. Is https necessary if all access is within the local network?

Everywhere a user types a password, https is mandatory. There are a few excellent reasons for that:

If people use that Active Directory password I mentioned, you are exposing it on the wire (and MS Windows does not when you log in to your computer). So you are introducing a vulnerability into a system that is considered enterprise-grade secure.
Normally people reuse passwords, so if someone catches the password they could try to reuse it somewhere else.

Even in small offices there could be a bad player trying to run Wireshark on the LAN and sniff a password, maybe their manager's password. To do what? Depends... Access payrolls?????

Is using a cookie a good way to validate that a user has logged in successfully?

A cookie is used for the "remember me" function after the user has authenticated, so they won't have to type the password again. For the scope of this answer, I'd say yes, it is a good way. But you MUST use https and protect your application from XSRF attacks. I won't discuss those here. Cookies are used by Google and Facebook in the correct way. If they use cookies, why shouldn't you?

Disclaimer: I don't get money from Microsoft. I'm just writing the simplest example that comes to my mind.

While the accepted answer does answer the exact question asked, this answer addresses the actual needs of the poster. Don't reinvent the wheel... passwords are tough to manage. Trust an existing identity provider to manage those for you. — Gabriel Bourgault, Dec 15 '19 at 21:07
