I am currently building a web service at http://write-math.com, similar to http://detexify.kirelabs.org/, that should help users get LaTeX code from drawn formulae. It is part of my bachelor's thesis, and a main goal of this project is to make it easier to do research in the field of on-line handwriting recognition. That means I want to share all data I get from users.
The easiest way to do so would be to simply dump the database. This way I could create my backup and a dump for researchers in one step.
There are only two pieces of data that I hesitate to share with the public once other people use my system: Email addresses and passwords.
Passwords
The password is stored hashed and salted (that means I store md5($userpass.$salt) and $salt, which is an 8-character random string with characters from A-Za-z0-9; the salt is generated for each user). Is that enough so that it would be ok to make this public?
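To make the scheme concrete, here is a minimal Python sketch of what is stored per user. The function names make_salt and hash_password are hypothetical and only mirror the md5($userpass.$salt) expression above; my actual code may differ.

```python
import hashlib
import secrets
import string

# A-Za-z0-9, i.e. 62 possible characters per salt position
SALT_ALPHABET = string.ascii_letters + string.digits


def make_salt(length: int = 8) -> str:
    """Return a random salt of `length` characters from A-Za-z0-9."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))


def hash_password(userpass: str, salt: str) -> str:
    """Mirror md5($userpass.$salt): hash the password with the salt appended."""
    return hashlib.md5((userpass + salt).encode("utf-8")).hexdigest()


# What ends up in the database for one user: the hash and the salt
salt = make_salt()
stored_hash = hash_password("example password", salt)
```

Both stored_hash and salt would be part of a public dump, so anyone could test password guesses against them offline.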
The main part of the question is about the Email address: At the moment, I store it as plain text. But I am thinking about storing a hash of the Email address only. This hash could not be salted, because my login function works as follows:
The user enters $email and $password. Both get sent as plain text to the server. Then the server does (as pseudocode):
$pwdb, $salt = query("SELECT password, salt FROM users WHERE email = :email")
if (md5($password.$salt) == $pwdb) {
    // Logged in
} else {
    // Wrong password
}
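The same login flow as a runnable Python sketch. The USERS dictionary is a hypothetical stand-in for the users table, and the example password is made up. One small deviation from the pseudocode: the comparison uses hmac.compare_digest instead of ==, which is a common precaution and not something my current code does.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string, like PHP's md5()."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for the users table: email -> (password hash, salt)
USERS = {
    "info@martin-thoma.de": (md5_hex("secret" + "FHCJ81ru"), "FHCJ81ru"),
}


def login(email: str, password: str) -> bool:
    """Return True if the password matches the stored salted hash."""
    row = USERS.get(email)  # SELECT password, salt FROM users WHERE email = :email
    if row is None:
        return False  # unknown email
    pwdb, salt = row
    # Recompute md5($password.$salt) and compare it with the stored value
    return hmac.compare_digest(md5_hex(password + salt), pwdb)
```

The lookup by email happens before the salt is known, which is why the value used for :email has to be computable from the user's input alone.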
Email addresses
It does not matter if :email is $email, md5($email) or md5($email.$applicationwide_random_str). But I can't use a new salt for each user without having to go through each user at login (which would probably not be too bad, since I expect never to have more than 10,000 users).
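A short Python sketch of why an application-wide string still allows a direct lookup. APP_SALT and email_key are hypothetical names, and the in-memory dictionary again stands in for the users table.

```python
import hashlib

# Hypothetical application-wide random string (the same for every user)
APP_SALT = "FHCJ81ru"


def email_key(email: str, app_salt: str = APP_SALT) -> str:
    """Mirror md5($email.$applicationwide_random_str)."""
    return hashlib.md5((email + app_salt).encode("utf-8")).hexdigest()


# The table stores only the hashed key, never the plain address
users = {email_key("info@martin-thoma.de"): "user-1"}

# At login the server recomputes the key from the entered address.
# This works because the key depends only on the email and APP_SALT.
found = users.get(email_key("info@martin-thoma.de"))
missing = users.get(email_key("someone.else@example.com"))
```

With a separate salt per user, the key could not be recomputed from the entered address alone, so the server would have to try the salt of every row until one matches.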
Questions
- How long would it take to "unhash" one Email (e.g. info@martin-thoma.de or mexplex@gmail.com) which has a random salt of 8 characters attached (e.g. FHCJ81ru) with "standard" hardware (< $1000) when you don't know the random string? Is it a matter of seconds, minutes, hours or days?
- Is it bad if people can do that? I mean they could also simply send Emails and see what they get back. In my service, there is not much personal data involved:
  - handwritten symbols and formulae
  - possibly handedness
  - possibly when / where the person learned writing
  - possibly the language of the user
- Why does no service hash the Email address? (OK, I don't know whether there really are no services that do so, but I have never read about one. Hashing passwords is common, but hashing Email addresses? I have never heard of that.)
- Is it a good idea to hash Emails if you only want to use the Email for signing in and for the case that a user has lost their password? (I thought about using OpenID, but most people don't know what it is.)
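To put the first question in numbers, here is a rough Python sketch of the search space. It assumes an attacker who already knows one address and its hash (for example from their own account) and tries every possible 8-character string. The hash rate is purely an assumption for illustration, not a measurement of any real hardware.

```python
ALPHABET_SIZE = 62   # A-Za-z0-9
SALT_LENGTH = 8      # characters in the random string

# Assumption for illustration only: MD5 guesses per second on the attacker's machine
ASSUMED_HASHES_PER_SECOND = 1e9


def salt_space(alphabet_size: int = ALPHABET_SIZE, length: int = SALT_LENGTH) -> int:
    """Number of distinct random strings of the given length."""
    return alphabet_size ** length


def worst_case_seconds(space: int, hashes_per_second: float) -> float:
    """Time needed to try every candidate once at the given rate."""
    return space / hashes_per_second


space = salt_space()  # 62**8, roughly 2.2e14 candidates
hours = worst_case_seconds(space, ASSUMED_HASHES_PER_SECOND) / 3600
print(f"{space} candidates, about {hours:.1f} hours at the assumed rate")
```

Since the string is application-wide, finding it once would apply to every hashed address in the dump. The answer therefore scales directly with whatever hash rate one considers realistic for hardware under $1000.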