24

I am planning to develop a website that requires users to register a username and a password. When I let users choose a password, which characters should I allow? Are there any I shouldn't allow because of security issues with the HTTP protocol or the implementation language?

I haven't decided on an implementation language yet, but I will use Linux.

Anders
Jonas

8 Answers

43

From a security/implementation perspective, there shouldn't be any need to disallow characters apart from '\0' (which is hard to type anyway). The more characters you bar, the smaller the total phase space of possible passwords and therefore the quicker it is to brute-force passwords. Of course, most password-guessing actually uses dictionary words rather than systematic searches of the input domain...
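
To put rough numbers on the point about barring characters, here is a small Python sketch. The two character sets and the length of 8 are assumptions picked only for illustration, and the figures apply to uniformly random passwords, not to the dictionary-style guessing mentioned above.

```python
import math
import string


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy for a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


# Hypothetical policies, chosen only for illustration.
alphabets = {
    "letters and digits": len(string.ascii_letters + string.digits),
    "printable ASCII (33-126)": 126 - 33 + 1,
}

for name, size in alphabets.items():
    bits = keyspace_bits(size, 8)
    print(f"{name}: {size} symbols, {bits:.1f} bits at length 8")
```

Every character class you remove lowers the per-character figure, so a brute-force search finishes sooner for the same password length.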

From a usability perspective, however, some characters are not typed the same way on different machines. As an example, I have two different computers here where shift-3 produces # on one and £ on the other. When I type a password in, both appear as '*' so I don't know whether I got it right or not. Some people think that could confuse people enough to start disallowing those characters. I don't think it's worth doing. Most real people access real services from one or maybe two computers, and don't tend to put many extended characters in their passwords.

  • @GrahamLee - was mid-answer and you covered off everything I was going to say. Have a +1 :-) – Rory Alsop Feb 10 '11 at 17:22
  • http://xkcd.com/327/ – symcbean Feb 11 '11 at 13:41
  • 1
    @symcbean: "there shouldn't be", not "there's no chance you'll get it wrong" ;-) –  Feb 11 '11 at 16:29
  • 9
    Maybe it is useful to have a warning like *you have characters in your password which are not at the same place on every keyboard. Are you sure you want to proceed?* But when viewed strictly, this only leaves about 20 keys which are the same on most keyboards (not counting Dvorak and the like). – Paŭlo Ebermann Feb 12 '11 at 02:10
  • Also, make sure that you are escaping `'` when storing it in a SQL database! – Earlz Feb 12 '11 at 02:45
  • 12
    @Earlz This question is about **password** if you're storing it in plain text, you're doing it wrong. – HoLyVieR Feb 13 '11 at 15:36
  • @Holy oh wow. Yea, disregard that, I was dumb for a second lol – Earlz Feb 13 '11 at 19:06
  • 1
    Why would you even disallow `\0`? – Stephen Touset Feb 03 '13 at 04:24
  • 4
    Because it's more likely to be handled incorrectly by the underlying implementation. You don't want to start truncating passwords just because somewhere in the bowels of your runtime they get converted to C strings. –  Feb 03 '13 at 08:28
  • If the user is using a password manager (as they definitely should), the issue of inputting the password should be moot... – John Dvorak Mar 15 '19 at 12:37
18

There can be issues with non-ASCII characters. A password is a sequence of glyphs, but the password processing (hashing) will need a sequence of bits, so there must be a deterministic way to transform glyphs into bits. This is the whole murky swamp of code pages. Even if you stick to Unicode, there is trouble afoot:

  • A single character can have several decompositions as code points. For instance, the "é" character (which is very frequent in French) can be encoded as either a single code point U+00E9, or as the sequence U+0065 U+0301; both sequences are meant to be equivalent. Whether you get one or the other depends on the conventions used by the input device.

  • A Unicode string is a sequence of code points (which are integers in the 0 to 1114111 range, i.e. up to U+10FFFF). There are several standard encodings for converting such a sequence into bytes; the most common will be UTF-8, UTF-16 (big-endian), UTF-16 (little-endian), UTF-32 (big-endian) and UTF-32 (little-endian). Any of these may or may not start with a BOM.

Therefore a single "é" can be meaningfully encoded into bytes with at least twenty distinct variants, and that's when sticking to "mainstream Unicode". Latin-1 encoding, or its Microsoft counterpart, is also widespread, so make that 21. Which encoding a given piece of software will use may depend upon a lot of factors, including the locale. It is bothersome when the user cannot log on his computer anymore because he switched the configuration from "Canadian - English" to "Canadian - French".
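
The count of twenty can be reproduced with a short Python sketch: two decompositions, five encodings, each with or without a BOM. The `canonical_bytes` helper at the end is hypothetical and is not something proposed in this answer; it only illustrates one common mitigation, which is to normalize to NFC and encode as UTF-8 before hashing.

```python
import unicodedata

composed = "\u00e9"     # single code point U+00E9
decomposed = "e\u0301"  # U+0065 followed by combining acute accent U+0301

encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]

# Collect every byte sequence that could represent the same visible "é".
variants = set()
for text in (composed, decomposed):
    for enc in encodings:
        body = text.encode(enc)
        bom = "\ufeff".encode(enc)  # the BOM as written in this encoding
        variants.add(body)
        variants.add(bom + body)

print(len(variants), "distinct byte sequences")


def canonical_bytes(password: str) -> bytes:
    """Hypothetical helper: fix one normalization form and one encoding."""
    return unicodedata.normalize("NFC", password).encode("utf-8")
```

With such a helper, both spellings of "é" hash to the same value, provided every client and server path goes through it.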

Experimentally, most problems of that kind are avoided by restricting passwords to the range of printable ASCII characters (those with codes ranging from 32 to 126 -- personally I would avoid space, so make that 33 to 126) and enforcing mono-byte encoding (no BOM, one character becomes one byte). Since passwords are meant to be typed on various keyboards with no visual feedback, the list of characters should be even more restricted for optimal usability (I daily battle with Canadian layouts where what is written on the keyboard does not necessarily match what the machine thinks it is, especially when going through one or two nested RDP connections; the '<', '>' and '\' characters are most often moving around). With just letters (uppercase and lowercase) and digits, you will be fine.
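
A minimal sketch of the restrictive policy described above, assuming you do decide to accept only printable ASCII without the space (codes 33 to 126). The function name is made up for illustration, and other answers here argue against restricting characters at all.

```python
def is_allowed_password(password: str) -> bool:
    """Return True if the password is non-empty and every character
    has a code in the 33..126 range (printable ASCII, no space)."""
    return len(password) > 0 and all(33 <= ord(ch) <= 126 for ch in password)


for candidate in ["Tr0ub4dor&3", "caf\u00e9", "two words", "nul\x00byte"]:
    print(repr(candidate), is_allowed_password(candidate))
```

A side effect of this range check is that control characters, including the NUL character, are rejected as well.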

You could say that the user is responsible; he is free to use any characters he wishes as long as he deals with the problem of typing them. But that's not ultimately tenable: when users have trouble, they call your helpdesk, and you have to assume part of their mistakes.

Thomas Pornin
  • 3
    Good post, but I would definitely not avoid the space, using sentences as a password is actually not a bad practise ( also see http://www.codinghorror.com/blog/2005/07/passwords-vs-pass-phrases.html, though obviously don't use famous quotes like in the example :) ) – Sebastiaan van den Broek Feb 02 '13 at 22:16
  • Why avoid spaces? Perhaps str.replace them out if you think it may cause compatibility problems anywhere, but being unable to make decent passphrases really bothers me. Nobody has spaces in passwords, which is exactly why I use them very frequently. – Luc Feb 08 '14 at 00:23
  • On a typical keyboard, the space bar makes a distinctive sound, making it an easy prey for shoulder surfers. – Thomas Pornin Feb 08 '14 at 12:16
  • One should probably say "if using Unicode, _then as always_, there is trouble afoot" instead of "even if you use Unicode". It is Unicode that creates trouble (as always!), not code pages. Entering the same password with the same codepage results in the exact same bits, reliably. It is only Unicode that screws things up. Entering the same PW using a different code page will produce different bits. This is actually a tiny bit of _extra security_ (2-3 bits to guess the codepage) for a dictionary attack. It also makes using a snooped PW harder (possibly running into login fail limit trying CPs). – Damon Feb 09 '14 at 11:18
12

If you are generating random passwords, it's a good idea to avoid characters that can be confused for others. For example (ignoring symbols):

  • Lowercase: l, o
  • Uppercase: I, O
  • Numbers: 1, 0
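
A minimal Python sketch of such a generator, assuming letters and digits only and a default length of 16 (both are illustrative choices, not part of the advice above). It uses the standard `secrets` module so the characters come from a cryptographically secure source.

```python
import secrets
import string

# The look-alike characters listed above.
AMBIGUOUS = set("loIO10")

# Letters and digits with the ambiguous ones removed.
ALPHABET = [c for c in string.ascii_letters + string.digits if c not in AMBIGUOUS]


def generate_password(length: int = 16) -> str:
    """Return a random password drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```

Dropping six characters shrinks the alphabet from 62 to 56 symbols, which costs very little entropy per character.
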
Justin Morgan
6

In addition to allowing all characters, consider having a very generous max length on the password field to support people who take the passphrase approach to passwords.

The phrase "my password is all in lowercase" is actually a reasonably strong passphrase due to its length.

Mike Ounsworth
Antony
2

You should not disallow any characters. You may wish to prevent passwords from being shorter than 6 characters. And then you should use bcrypt to hash the password.
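
A rough sketch of that flow in Python. Note the swap: bcrypt is not in the Python standard library, so this uses `hashlib.pbkdf2_hmac` as a stand-in to keep the example self-contained; with a bcrypt package installed you would call that instead. The iteration count and salt size are assumptions, not recommendations from this answer.

```python
import hashlib
import hmac
import os

MIN_LENGTH = 6        # the minimum suggested above
ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). No character is rejected, only short passwords."""
    if len(password) < MIN_LENGTH:
        raise ValueError("password too short")
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)
```

Because only the salt and digest are stored, characters such as `'`, `%` or `\` in the password never reach the database as text.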

Stephen Touset
1

There are a couple of characters that may cause issues:

*, ? and %: As these are often used as wildcards they may confuse the underlying programming language.

Tab, Return, NewLine, Vertical Tab, Escape: Such special characters can provoke weird behavior from your programming language or from the customer's browser. (If the customer uses several different browsers, it is quite possible that one will allow these to be entered and another will not, effectively locking the customer out of the account on that browser.)

\ is often treated as an escape character that gives the following character a special meaning.
E.g. "\n" is newline in many cases, and "\t" is tab.
If your programming language (or the customer's browser) does this, you are back to the possibility of receiving the characters I mentioned above.
So it is probably best to disallow \ altogether just to be safe.

Tonny
  • 9
    I'm sorry for the downvote, but there is a reason for it: issues in the underlying programming language are *never* a reason to prohibit characters from being used. This is what mysql_real_escape_string, or better, parameterized queries are for. User data should never ever ever be interpreted as being executable code regardless, if this happens you'll have much bigger problems than just password storage. Asterisks, question marks, percentage signs and backslashes are perfectly fine characters that I use and want to continue using in my passwords. Besides, didn't we hash them before storage? – Luc Feb 08 '14 at 00:43
  • @Luc Using `\0` inside a string risks silent truncation in C and its derivatives, so forbidding it seems reasonable. `\r`, `\n` and `\t` suffer from input issues. For good measure I'd extend that to all control characters (ASCII code <32) – CodesInChaos May 05 '16 at 16:15
  • @CodesInChaos Control characters yes, those have no visual representation so do not occur in normal passwords. The literal `\0` (so ASCII 92 and ASCII 48, not ASCII 0) should also be perfectly fine since it shouldn't be interpreted but used literally, but I think you mean ASCII 0, in which case it's a control character and I'd agree. – Luc May 05 '16 at 19:48
0

I think that unless a 'virtual keyboard' or a similar tool is available that produces characters in a uniform way, we are limited to alphanumeric characters. The location of all the others can differ between keyboards. If a user needs to access the service from another location, that could effectively lock them out of the service.

I would suggest using a virtual keyboard as a way to send exactly the same character representations (Unicode was already discussed above) in the same manner, no matter what system or keyboard is used. Then there is no need to exclude any character that could be typed on any keyboard.

-6

If you allow upper and lower case alphanumerics and set the minimum password length to eight characters you should be OK. Allowing other characters raises issues with different keyboards.

OhBrian
  • 4
    If someone implemented this I would not use their site as none of my passwords would work. – Chris Dale Feb 14 '11 at 15:46
  • 3
    This is highly insecure and easily crackable. If you limit yourself to upper and lowercase characters, your minimum length should be 17. – this.josh Jul 29 '11 at 20:02
  • @this.josh Minimum length of 17? Alright here's an md5sum of an 8-character letters-only password: `124c6ffa6d57c5909e7a403293aed173`. Generated using `echo -n secret | md5sum`. Since this is less than the square root of the strength you said is the "minimum" more than 2 years ago, I expect it must be no problem to crack on a commodity gpu (using hashcat or barswf or something). Good luck. (Honestly I think it's doable, but md5 shouldn't be used for password storage anyway. Still, I wonder if anyone can figure it out.) – Luc Feb 08 '14 at 00:37
  • @Luc for password in {{A..Z},{a..z}}{{A..Z},{a..z}}{{A..Z},{a..z}}{{A..Z},{a..z}}{{A..Z},{a..z}}{{A..Z},{a..z}}{{A..Z},{a..z}}{{A..Z},{a..z}}; do echo -n $password | md5sum>>rainbow; echo $password >>rainbow; done; echo "8-character letter-only rainbow tables for md5, simply grep for md5 and get password, no gpu required" – this.josh May 21 '14 at 06:29
  • @this.josh do it, then. – Luc May 21 '14 at 15:53
  • 1
    @Luc `pzNraRqZ` (I know it's almost 3 years later) – Kade Jan 24 '17 at 18:33
  • @Kade Nice. Might I ask how much effort that took? Like, what are the system requirements and how long did it take? – Luc Jan 25 '17 at 10:51