XKCD #936: Short complex password, or long dictionary passphrase?

Question

How accurate is this XKCD comic from August 10, 2011?

I've always been an advocate of long rather than complex passwords, but most security people (at least the ones that I've talked to) are against me on that one. However, XKCD's analysis seems spot on to me.

Am I missing something or is this armchair analysis sound?

A practical note: I have used [Diceware](http://www.diceware.com) to help me select random words before. I found I could remember 5 words really easily. I did roll a couple of times to find a sequence that felt nice to say (in my head) though, without necessarily making sense. — Alex Bowe, Aug 10 '11 at 23:03
New most common passwords: `onetwothreefour` `passwordpasswordpasswordpassword` `teenagemutantninjaturtles` — Chris Burt-Brown, Aug 11 '11 at 08:35
I think commentors here have brought up all of these points already but, for the record, here's some elaboration by the comic's author, Randall Munroe: http://ask.metafilter.com/193052/Oh-Randall-you-do-confound-me-so#2779020 — Michael, Aug 11 '11 at 15:56
Makes you wonder why some banks limit your password to 6 or 7 characters. — Lotus Notes, Aug 11 '11 at 19:06
@LotusNotes Mine requires exactly FIVE! And every stupid forum requires >8 chars, upper+lowercase+numbers+punctuation... — Tobias Kienzler, Aug 11 '11 at 20:28
Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/30763/discussion-on-question-by-billy-oneal-xkcd-936-short-complex-password-or-long). — Rory Alsop, Oct 27 '15 at 07:58
In an empirical test, passphrases don't seem to help as much as XKCD would have you believe: [dl.acm.org/citation.cfm?id=2335356.2335366](http://dl.acm.org/citation.cfm?id=2335356.2335366) — WBT, Nov 02 '15 at 04:23
@wbt Interesting. It's a shame the passwords they were comparing with were only 5-6 characters though. Would be interesting to see a similar comparison with actual security. — Billy ONeal, Nov 02 '15 at 04:26
After 5 more years, this is finally taken into account in the [official NIST authentication guidelines](https://nakedsecurity.sophos.com/2016/08/18/nists-new-password-rules-what-you-need-to-know/). — Dmitry Grigoryev, May 30 '17 at 13:22

score 1575 · Answer 1 · edited Jul 28 '20 at 08:02


I think the most important part of this comic, even if it were to get the math wrong (which it didn't), is visually emphasizing that there are two equally important aspects to selecting a strong password (or actually, a password policy, in general):

Difficulty to guess
Difficulty to remember

Or, in other words:

The computer aspect
The human aspect

All too often, when discussing complex passwords, strong policies, expiration, etc (and, to generalize - all security), we tend to focus overly much on the computer aspects, and skip over the human aspects.

Especially when it comes to passwords, (and double especially for average users), the human aspect should often be the overriding concern.
For example, how often does strict password complexity policy enforced by IT (such as the one shown in the XKCD), result in the user writing down his password, and taping it to his screen? That is a direct result of focusing too much on the computer aspect, at the expense of the human aspect.

And I think that is the core message from the sage of XKCD - yes, Easy to Guess is bad, but Hard to Remember is equally so.
And that principle is a correct one. We should remember this more often, AKA AviD's Rule of Usability:

Security at the expense of usability comes at the expense of security.

edited Jul 28 '20 at 08:02 by Robotnik · answered Aug 11 '11 at 09:14 by AviD

751

Your last quote deserves a thousand upvotes. – Camilo Martin Mar 20 '12 at 04:07
42

For an in-depth analysis of the maths behind the xkcd, see [Thomas's answer](http://security.stackexchange.com/a/6096/33) below. His answer shows why the xkcd got the math right, a perfect complement to why it doesn't actually matter. – AviD Mar 16 '13 at 21:56
writing down a password is not that much of an issue. people robbing you leave with cash, jewelry, light and fenceable things (laptops, tablet...), but they don't care about a password on a Post-it on your desk. Especially since a/ they don't have the login and b/ they don't know what service it is related to. – njzk2 Jan 15 '16 at 20:52
4

@njzk2 It sounds a lot like a phrase that people who keep their passwords on a desk post-it would use to reassure themselves by fooling themselves into thinking they are safe. If my home gets robbed I would be much more concerned about the robbers having my banking password than about having lost my cash, jewelry etc; especially if I find out about the burglary only when I get home at the end of the day. The only remotely potentially tenable justification for keeping passwords on desk post-its is that they are passwords to something not important so it's no big deal if someone steals them. – SantiBailors Feb 09 '16 at 10:07
1

@SantiBailors having a random sequence of words on a piece of paper somewhere is far from giving you access to anything. could be a hint, could be a password, could be obsolete, could be a transformation of the password, and even then you don't know to which account it belongs, nor what is the username. – njzk2 Feb 09 '16 at 18:52
1

@njzk2 Yes, it could be many things, but the things that interest a thief are very few, so the thief will probably give f.ex. your internet banking some shots with that as the password. And other things they stole from your home are likely to give away what your bank is, from which it's easy to find out what kind of usernames that system uses, and if it doesn't use standard usernames (f.ex. mine uses my social security number) then the username is likely to be written on the same post-it. I'm just saying if I write down my passwords I wouldn't leave them on the desk. – SantiBailors Feb 10 '16 at 11:02
6

@SantiBailors agreed, it is not a good practice. But I think it is better for someone who would have difficulties in remembering a good password than choosing something trivial, like the name of their dog. – njzk2 Feb 10 '16 at 14:55
3

I would generalize "difficulty to remember" to "difficulty to use", which contains both "difficulty to remember" and "difficulty to enter". – Deduplicator Jul 15 '17 at 15:45
1

Great quote! Mind me using it in an internal company infosec page (with source url)? – johan vd Pluijm Nov 29 '17 at 11:38
2

@johanvdPluijm please do! I'd appreciate attribution, or call the rule by its proper name :-) – AviD Nov 29 '17 at 11:58
1

@AviD: I also put it in my internship report about information security awareness: "Other key aspects were relevance and usability: “Security at the expense of usability comes at the expense of security” (AviD, 2011)." There you go. I did rectify the use of this source by the number of upvotes. Now you are part of a limited number of people I dare to quote in my internship report (+/- 30 atm) – johan vd Pluijm Nov 29 '17 at 13:18
14

Another thing that makes the battery staple method better (via AviD's observation on usability) is the increasing number of mobile devices. On a mobile keyboard, the 'leetspeak' method requires a lot of pecking and symbol table shifting to and fro, while the battery staple method can be typed in much more easily with less risk of error. Try timing yourself how quickly you can enter either sample password on an iPhone's screen keyboard. – Shadur Dec 02 '17 at 14:20
1

I'm not sure that this is clear to me. Which side are you on? – tuskiomi Dec 24 '18 at 05:47
1

One advantage for passwords that are complete gibberish: If somebody happens to glance at one on your screen, it's unmemorizable without quite a lot of work. – Kyralessa Jan 12 '19 at 08:44
1

For the records, AviD's rule of usability is similar to Roger G. Johnston's "I Hate You Maxim 2": _The more a given technology causes hassles or annoys security personnel, the less effective it will be_. You can find this and other smart security maxims in [this PDF](http://www-personal.umich.edu/~rsc/Security/security_maxims.pdf). – Enos D'Andrea Mar 07 '20 at 08:13

score 548 · Answer 2 · edited May 28 '18 at 13:38

Here is a thorough explanation of the mathematics in this comic:

The little boxes in the comic represent entropy on a logarithmic scale, i.e. "bits". Each box means one extra bit of entropy. Entropy is a measure of the average cost of hitting the right password in a brute force attack. We assume that the attacker knows the exact password generation method, including probability distributions for random choices in the method. An entropy of n bits means that, on average, the attacker will try 2ⁿ⁻¹ passwords before finding the right one. When the random choices are equiprobable, you have n bits of entropy when there are 2ⁿ possible passwords, which means that the attacker will, on average, try half of them. The definition with the average cost is more generic, in that it captures the cases where random choices taken during the password generation process (the one which usually occurs in the head of the human user) are not uniform. We'll see an example below.

The point of using "bits" is that they add up. If you have two password halves that you generate independently of each other, one with 10 bits of entropy and the other with 12 bits, then the total entropy is 22 bits. If we were to use a non-logarithmic scale, we would have to multiply: 2¹⁰ uniform choices for the first half and 2¹² uniform choices for the other half make up for 2¹⁰·2¹² = 2²² uniform choices. Additions are easier to convey graphically with little boxes, hence our using bits.
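As a small illustration of why bits add while counts multiply, here is a minimal Python sketch. The function names are made up for this example and are not part of the comic or the answer.

```python
import math


def entropy_bits(num_choices: int) -> float:
    """Entropy in bits of one uniform choice among num_choices options."""
    return math.log2(num_choices)


def average_guesses(bits: float) -> float:
    """Average number of guesses for a uniform choice: half the search space."""
    return 2 ** (bits - 1)


first_half_choices = 2 ** 10   # 10 bits
second_half_choices = 2 ** 12  # 12 bits

# Counts multiply...
combined_choices = first_half_choices * second_half_choices
# ...while bits add.
combined_bits = entropy_bits(first_half_choices) + entropy_bits(second_half_choices)

print(combined_bits, entropy_bits(combined_choices), average_guesses(combined_bits))
```

Both ways of computing the combined entropy give the same 22 bits, which is the whole reason the comic can simply stack boxes.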

That being said, let's see the two methods described in the comic. We'll begin with the second one, which is easier to analyze.

The "correct horse" method

The password generation process for this method is: take a given (public) list of 2048 words (supposedly common words, easy to remember). Choose four random words in this list, uniformly and independently of each other: select one word at random, then select again a word at random (which could be the same as the first word), and so on for a third and then a fourth word. Concatenate all four words together, and voila! you have your password.

Each random word selection is worth 11 bits, because 2¹¹ = 2048, and, crucially, each word is selected uniformly (all 2048 words have the same probability of 1/2048 of being selected) and independently of the other words (you don't choose a word so that it matches or differs from the previous words, and, in particular, you do not reject a word if it happens to be the same choice as a previous word). Since humans are not good at all at making random choices in their head, we have to assume that the random word selection is done with a physical device (dice, coin flips, computers...).

The total entropy is then 44 bits, matching the 44 boxes in the comic.
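The generation process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list below is a placeholder of 2048 dummy entries (a real list of common words would be substituted), and the function names are hypothetical. Python's `secrets` module is used because it draws from a cryptographically secure source, which stands in for the "physical device" mentioned above.

```python
import math
import secrets


def generate_passphrase(wordlist, num_words=4):
    """Pick words uniformly and independently; repeats are allowed on purpose."""
    return "".join(secrets.choice(wordlist) for _ in range(num_words))


def passphrase_entropy_bits(wordlist_size, num_words=4):
    """Entropy of num_words independent uniform picks from the list."""
    return num_words * math.log2(wordlist_size)


# Placeholder word list (assumption): 2048 distinct dummy entries.
wordlist = [f"word{i:04d}" for i in range(2048)]

passphrase = generate_passphrase(wordlist)
print(passphrase, passphrase_entropy_bits(len(wordlist)))
```

Note that the sketch never filters or re-rolls the result; doing so would break the uniformity that the 44-bit figure depends on.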

The "troubador" method

For this one, the rules are more complex:

Select a random word in a given big list of meaningful words.
Decide randomly whether to capitalize the first letter, or not.
For the letters which are eligible to "traditional substitutions", apply or not apply the substitution (decide randomly for each letter). These traditional substitutions can be, for instance: "o" -> "0", "a" -> "4", "i" -> "!", "e" -> "3", "l" -> "1" (the rules give a publicly known exhaustive list).
Append a punctuation sign and a digit.

The random word is rated at 16 bits by the comic, meaning uniform selection in a list of 65536 words (or non-uniform in a longer list). There are more words than that in English, apparently about 228000, but some of them are very long or very short, others are so uncommon that people would not remember them at all. "16 bits" seems to be a plausible count.

Uppercasing or not uppercasing the first letter is, nominally, 1 bit of entropy (two choices). If the user makes that choice in his head, then this will be a balance between user's feeling of safety ("uppercase is obviously more secure !") and user's laziness ("lowercase is easier to type"). There again, "1 bit" is plausible.

"Traditional substitutions" are more complex because the number of eligible letters depends on the base word; here, three letters, hence 3 bits of entropy. Other words could have other counts, but it seems plausible that, on average, we'll find about 3 eligible letters. This depends on the list of "traditional substitutions", which are assumed to be a given convention.

For the extra punctuation sign and digit, the comic gives 1 bit for the choice of which comes first (the digit or the punctuation sign), then 4 bits for the sign and 3 bits for the digit. The count for digits deserves an explanation: this is because humans, when asked to choose a random digit, are not at all uniform; the digit "1" will have about 5 to 10 times more chances of being selected than "0". Among psychological factors, "0" has a bad connotation (void, dark, death), while "1" is viewed positively (winner, champion, top). In south China, "8" is very popular because the word for "eight" is pronounced the same way as the word for "luck"; and, similarly, "4" is shunned because of homophony with the word for "death". The attacker will first try passwords where the digit is a "1", allowing him to benefit from the non-uniformity of the user choices.

If the choice of digit is not made by a human brain but by an impartial device, then we get 3.32 bits of entropy, not 3 bits. But that's close enough for illustration purposes (I quite understand that Randall Munroe did not want to draw partial boxes).
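To make the effect of non-uniform choices concrete, here is a hedged sketch using Shannon entropy, which is one common way to quantify a skewed distribution (it is not exactly the same thing as the average guessing cost defined earlier). The skewed probabilities are invented for illustration only; they merely encode "1 is several times more likely than 0" and are not measured data.

```python
import math


def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# A fair device: each of the ten digits has probability 1/10.
uniform_digits = [0.1] * 10

# Hypothetical human bias (assumption): index = digit, "1" favored, "0" avoided.
skewed_digits = [0.03, 0.25, 0.10, 0.12, 0.05, 0.10, 0.08, 0.12, 0.10, 0.05]

uniform_bits = shannon_entropy_bits(uniform_digits)
skewed_bits = shannon_entropy_bits(skewed_digits)
print(round(uniform_bits, 2), round(skewed_bits, 2))
```

The uniform case gives log₂(10), about 3.32 bits; any skew can only pull the figure down, which is consistent with the comic rounding to 3.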

Four bits for punctuation are a bit understated; there are 32 punctuation signs in ASCII, all relatively easy to type on a common keyboard. This would mean 5 bits, not 4. There again, if the sign is chosen by a human, then some signs will be more common than others, because humans rarely think of '#' or '|' as "punctuation".

The grand total of 28 bits is then about right, although it depends on the precise details of some random selections, and the list of "traditional substitutions" (which impacts the average number of eligible letters). With a computer-generated password, we may hope for about 30 bits. That's still low compared to the 44 bits of the "correct horse" method.
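The tally can be reproduced with a short Python sketch. The per-component numbers are the ones discussed above; the dictionary keys are just labels chosen for this example.

```python
import math

# Bits per component as drawn in the comic.
comic_bits = {
    "base word": 16,
    "capitalization": 1,
    "substitutions": 3,
    "order of digit and sign": 1,
    "punctuation sign": 4,
    "digit": 3,
}

# Same scheme, but with the sign and digit picked uniformly by a machine:
# 32 ASCII punctuation signs and 10 digits.
machine_bits = {
    **comic_bits,
    "punctuation sign": math.log2(32),
    "digit": math.log2(10),
}

comic_total = sum(comic_bits.values())
machine_total = sum(machine_bits.values())
print(comic_total, round(machine_total, 2))
```

The machine-generated variant lands a little above 29 bits, which is where the "about 30 bits" estimate comes from.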

Applicability

The paragraphs above show that the maths in the comic are correct (at least with the precision that can be expected in these conditions -- that's a webcomic, not a research article). It still requires the following conditions:

The "password generation method" is known by the attacker. This is the part which @Jeff does not believe. But it makes sense. In big organizations, security officers publish such guidelines for password generation. Even when they don't, people have Google and colleagues, and will tend to use one of about a dozen or so sets of rules. The comic includes provisions for that: "You can add a few more bits to account for the fact that this is only one of a few common formats".

Bottom-line: even if you keep your method "secret", it won't be that secret because you will more or less consciously follow a "classic" method, and there are not that many of those.
Random choices are random and uniform. This is hard to achieve with human users. You must convince them to use a device for good randomness (a coin, not a brain), and to accept the result. This is the gist of my original answer (reproduced below). If the users alter the choices, if only by generating another password if the one they got "does not please them", then they depart from random uniformity, and the entropy can only be lowered (maximum entropy is achieved with uniform randomness; you cannot get better, but you can get much worse).

The right answer is of course that of @AviD. The maths in the comic are correct, but the important point is that good passwords must be both hard to guess and easy to remember. The main message of the comic is to show that common "password generation rules" fail at both points: they produce passwords that are hard to remember, yet not that hard to guess.

It also illustrates the failure of human minds at evaluating security. "Tr0ub4dor&3" looks more randomish than "correcthorsebatterystaple"; and the same minds will give good points to the latter only because of the wrong reason, i.e. the widespread (but misguided) belief that password length makes strength. It does not. A password is not strong because it is long; it is strong because it includes a lot of randomness (all the entropy bits we have been discussing all along). Extra length just allows for more strength, by giving more room for randomness; in particular, by allowing "gentle" randomness that is easy to remember, like the electric horse thing. On the other hand, a very short password is necessarily weak, because there is only so much entropy you can fit in 5 characters.

Note that "hard to guess" and "easy to remember" do not cover all that is to say about password generation; there is also "easy to use", which usually means "easy to type". Long passwords are a problem on smartphones, but passwords with digits and punctuation signs and mixed casing are arguably even worse.

Original answer:

The comic assumes that the selection of a random "common" word yields an entropy of about 11 bits -- which means that there are about 2000 common words. This is a plausible count. The trick, of course, is to have a really random selection. For instance, the following activities:

select four words randomly, then remember them in the order which makes most sense;
if the four words look too hard to remember, scrap them and select four others;
replace one of the words with the name of a footballer (the attacker will never guess that !);

... all reduce the entropy. It is not easy to get your users to actually use true randomness and accept the result.
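A rough way to put numbers on the first two activities, under simplifying assumptions that are mine and not the answer's: (a) if the final word order is fully determined by which four distinct words were drawn, the 4! possible orderings collapse into one; (b) if the user keeps re-rolling until the result falls in some "acceptable" fraction of all passphrases, and the attacker knows which ones are acceptable, the search space shrinks to that fraction.

```python
import math


def bits_after_reordering(base_bits, num_words=4):
    """Entropy left when the order is dictated by the words themselves.

    Assumes the drawn words are distinct, so num_words! orderings collapse.
    """
    return base_bits - math.log2(math.factorial(num_words))


def bits_after_rejection(base_bits, accepted_fraction):
    """Entropy left when only a fraction of the outcomes is ever accepted."""
    return base_bits + math.log2(accepted_fraction)


base_bits = 4 * math.log2(2048)  # 44 bits for four uniform words

print(round(bits_after_reordering(base_bits), 2))
print(bits_after_rejection(base_bits, 0.25))
```

Under these assumptions, reordering costs about 4.6 bits and keeping only a quarter of the candidates costs 2 bits; the footballer substitution is harder to model and is left out.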

The same users will probably complain about the hassle of typing a long password (if the typing involves a smartphone, I must say that I quite understand them). An unhappy user is never a good thing, because he will begin to look for countermeasures which will make his life easier, such as keeping the password in a file and "typing" it with a copy&paste. Users can often be surprisingly creative that way. Therefore long passwords have a tendency to backfire, security-wise.

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/30766/discussion-on-answer-by-thomas-pornin-xkcd-936-short-complex-password-or-long). — Rory Alsop, Oct 27 '15 at 08:20
One crucial thing I believe this answer is missing is a mention of dictionary attacks. It may be obvious to anyone versed in this stuff that dictionary attacks are considered in the entropy calculation, but every single time I've heard someone criticise this xkcd strip, it's on the grounds that the author only considers brute force attacks and that a real attacker can use a more sophisticated dictionary attack. The thinking there is wrong, but I think dictionary attacks need a mention just to nay-say the naysayers and clear up the confusion here. — Score_Under, Apr 10 '16 at 01:47
An error here : A string of 4 Randomly chosen words out of your dictionary...is also a string of characters chosen out of the 26 letter alphabet.... and its entropy has to be seen in that regard too. Hence the total number of letters in the string have to be such that that they do not reduce the entropy as seen from a 'Word' POV . — ARi, Oct 13 '16 at 07:00
@Score_Under On the contrary, this answer assumes a dictionary attack, as does the comic. — Luke Sawczak, Jun 08 '17 at 18:03
@ARi Using the brute force method you mention, and assuming you use at least four 5-letter words, the second method would have at least [112.4 bits](https://www.wolframalpha.com/input/?i=log2((5*4)%5E26)) of entropy. (Using a dictionary attack reduces it to 44 bits, as shown.) Of course, this all assumes the attacker knows the length of the password, making it a little optimistic. — jpaugh, Oct 10 '17 at 14:50
@Score_Under See above for the brute-force method. (I discovered it by accident.) — jpaugh, Oct 10 '17 at 14:52
@ARi are you sure it's 'a string of characters chosen out of the alphabet' such that that would increase its entropy? isn't it specifically NOT that? I know what you mean: it's a string of characters, but it's not a string of *random* characters. The strength is not 26^25. — Dave Cousineau, Oct 18 '17 at 16:13
"there is only so much entropy you can fit in 5 characters." - Unicode defines >100k characters, giving a character a similar entropy to an English dictionary word (~17 bits). 5 characters of random Unicode has more entropy (~85 bits) than 6 words of Diceware (~78 bits). Harder to input on some devices, though. — Dave Burt, Aug 30 '18 at 04:42
@LukeSawczak Exactly! That's what Score_Under is saying... Please just use the phrase "dictionary attack" to be even more clear. It's really easy to mistakenly think the calculations are based on character-by-character brute forcing. — Nacht, Aug 31 '18 at 01:30
WRT the smartphone thing, I submit that the 'correct horse' type password is going to be easier to enter than the troubadour one because *you don't have to keep switching between lowercase, uppercase, letters, numbers and symbols every other letter*. And as a bonus, it also vastly reduces the chance of point errors, because they are words you know how to spell, so your muscle memory will correctly spot errors you make even when you can't see what you're typing. — Shadur, Aug 31 '18 at 07:51
Isn't 'troubador' spelt with an 'our'? therefore is 'ou' to 'o' a substitution? if not, then 1 letter has been deleted and not accounted for. — philcolbourn, Nov 17 '18 at 05:11
@philcolbourn Congratulations, you've found *another* way that the first one is harder to remember than to guess. :) — Shadur, Aug 26 '21 at 10:58
"Okay, I started with 'troubador' but did I use the american ('troubador') or the english ('troubadour') spelling?" — Shadur, Aug 27 '21 at 23:43
A bit late here, but I wanted to ask about something you mention, that is refusing a password is bad for entropy because maximal entropy is achieved by taking a purely random password. This makes sense to me. However, what would you do if your purely random password was '12345'? It seems to me that refusing such passwords would lower entropy but increase security. Is this because the entropy calculation assumes an attacker trying out passwords at random, too? Therefore we can increase our 'modified entropy' by rejecting passwords an attacker tries with high probability? — Ant, Nov 15 '21 at 23:16

Jeff Atwood · Answer 3 · 2011-08-11T00:11:44.320

The two passwords, based on rumkin.com's password strength checker:

Tr0ub4dor&3

Length: 11

Strength: Reasonable - This password is fairly secure cryptographically and skilled hackers may need some good computing power to crack it. (Depends greatly on implementation!)

Entropy: 51.8 bits

Charset Size: 72 characters

and

correct horse battery staple

Length: 28

Strength: Strong - This password is typically good enough to safely guard sensitive information like financial records.

Entropy: 104.2 bits

Charset Size: 27 characters

It is certainly true that length, all other things being equal, tends to make for very strong passwords -- and we're seeing that confirmed here.

Even if the individual characters are all limited to [a-z], the exponent implied in "we added another lowercase character, so multiply by 26 again" tends to dominate the results.

In other words, 72¹¹ < 27²⁸.
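The inequality can be checked directly, and the same numbers can be expressed as bits. This sketch uses the naive model in which every character is an independent, uniform pick from the character set; the function name is made up for the example. The naive figures come out at roughly 68 and 133 bits, which do not match the 51.8 and 104.2 bits reported by the checker, so the checker presumably applies a different model.

```python
import math


def naive_bits(charset_size: int, length: int) -> float:
    """Bits if each character were an independent uniform pick from the charset."""
    return length * math.log2(charset_size)


troubador_bits = naive_bits(72, 11)   # "Tr0ub4dor&3"
horse_bits = naive_bits(27, 28)       # "correct horse battery staple"

# Python integers have arbitrary precision, so the comparison is exact.
print(72 ** 11 < 27 ** 28, round(troubador_bits, 1), round(horse_bits, 1))
```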

Now, what is not clearly addressed:

1. Will these passwords have to be entered manually? And if so, how difficult is it, mechanically, to enter each character of the password? On a keyboard it's easy, but on a smartphone or console... not so much.
2. How easy are these passwords to remember?
3. How sophisticated are the password attacks? In other words, will they actually attempt common schemes like "dictionary words separated by spaces", or "a complete sentence with punctuation", or "leet-speak numb3r substitution" as implied by xkcd? Crucially, this is how XKCD justifies cutting the entropy of the first password in half!

Point 3 is almost unanswerable and I think personally highly unlikely in practice. I expect it will be braindead brute force all the way to get the low-hanging fruit, and the rest ignored. If there isn't any low-hanging password fruit (and oh, there always is), they'll just move on to the next potential victim service.

Therefore I say the cartoon is materially accurate in terms of its math, but the godlike predictive password attacks it implies are largely a myth. Which means, IMHO, that these specific two passwords are kind of a wash in practice and would offer similar-ish levels of protection.

@Jeff, this answer is flawed. You rely upon rumkin for password entropy estimation, but rumkin's estimates are apparently bogus. Look at the xkcd comic again: it visually depicts the justification for its entropy estimate (that's what the little boxes are doing). xkcd's entropy estimates look about right to me, and your entropy estimates look wrong (overly optimistic). I totally disagree with your conclusion, and I don't see where you get it from. — D.W., Aug 11 '11 at 02:46
Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/30767/discussion-on-answer-by-jeff-atwood-xkcd-936-short-complex-password-or-long-d). — Rory Alsop, Oct 27 '15 at 08:22

score 75 · Answer 4 · edited Mar 17 '17 at 13:14


I agree that length is often preferable to complexity. But I think the controversy is less around that, and more around how much entropy you want to have. The comic says that a "plausible attack" is 1000 guesses/second:

"Plausible attack on a weak remote web service. Yes, cracking a stolen hash is faster, but it's not what the average user should worry about"

But I see more of a consensus that web site operators can't keep their hash databases secure over time against attackers, so we should engineer the passwords and hash algorithms to withstand stealing the hashes for offline attack. And an offline attack can be massive, as described at "How to securely hash passwords?"
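To get a feel for the difference between the two threat models, here is a back-of-the-envelope sketch. The 1000 guesses/second rate is the comic's; the offline rate of 10 billion guesses/second is purely a hypothetical figure picked for illustration, since real rates depend heavily on the hash function and the attacker's hardware.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def seconds_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Time to try every password in a space of 2**bits candidates."""
    return 2 ** bits / guesses_per_second


ONLINE_RATE = 1_000             # the comic's "weak remote web service"
OFFLINE_RATE = 10_000_000_000   # hypothetical offline attack on a fast hash

days_28_online = seconds_to_exhaust(28, ONLINE_RATE) / SECONDS_PER_DAY
years_44_online = seconds_to_exhaust(44, ONLINE_RATE) / SECONDS_PER_YEAR
minutes_44_offline = seconds_to_exhaust(44, OFFLINE_RATE) / 60

print(round(days_28_online, 1), round(years_44_online), round(minutes_44_offline))
```

At the online rate the results are in line with the comic's "3 days" and "550 years"; at the assumed offline rate, the same 44 bits would last well under an hour, which is why the hashing scheme on the server matters so much.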

This makes the problem even harder, and sites should really be looking at options besides requiring users to memorize their own passwords for each web site, e.g. via OpenID and OAuth. That way the user can get one good authentication method (perhaps even involving a hardware token) and leverage it for web access.

But good password hashing can also be done via good algorithms, a bit more length, and bookmarklet tools. Use the techniques described at the above question on the server (i.e. bcrypt, scrypt or PBKDF2), and the advice at "Is there a method of generating site-specific passwords which can be executed in my own head?" on the use of SuperGenPass (or SuperChromePass) on the user/client end.
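As one possible shape for the server side, here is a minimal sketch using PBKDF2 from Python's standard library (`hashlib.pbkdf2_hmac`). The helper names, the salt length, and the iteration count are assumptions made for this example; the iteration count in particular should be tuned to the deployment's hardware, and bcrypt or scrypt are equally valid choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative assumption, not a recommendation


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) for storage; a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=ITERATIONS):
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
```

The deliberately slow derivation is what turns a stolen hash database from a minutes-long job into a much more expensive one per guess.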

edited Mar 17 '17 at 13:14 by Community · answered Aug 10 '11 at 23:28 by nealmcb

4

interesting, so this means from the user's perspective, the choice of password is almost -- except for "Password1" type brain damage -- irrelevant! – Jeff Atwood Aug 11 '11 at 00:02
5

Right, it depends on what the threat is. I may just start writing risk = threat x vulnerability x exposure on every question. And for Thomas those are multidimensional cross product operations, and I am assuming a right handed coordinate system in Euclidean space. – this.josh Aug 11 '11 at 05:11
1

@jeff Not at all. The master password used for supergenpass should still be a good one, and you should use their "stealth" password also, and more than 10 character passwords via their "custom" option, in your own bit of javascript. Because as usual you assume the attacker may well try supergenpass-based passwords, e.g. perhaps because they know that you use it. But the user only has to memorize one good password. – nealmcb Aug 11 '11 at 05:32
2

You don't need a strong password for a website. Just force a `2^attempts` milliseconds delay between two consecutive login attempts, or block the password after 10 wrong trials. 10,000 different passwords are enough for a decent protection if we only need to protect a website. – Elazar Leibovich Aug 11 '11 at 06:13
1

@Elazar Leibovich: The attack described (offline) is run against your compromised database; it won't be restricted by your delays. Also, you do introduce a DDOS method. – MSalters Aug 11 '11 at 12:49
@MSalter, if you fear that your database will be compromised once in a while, force users to change their password every month. If they use a very short password they won't mind so much. bcrypt can also help to protect even low entropy passwords. BTW, if your DB leaks too frequently, reversing the hashes is the least of your users' worries... – Elazar Leibovich Aug 11 '11 at 21:21
5

@Elazar I've had access to all kinds of databases in my various jobs. In most of them, nobody would notice if I dumped a copy to a file and walked out with it, reversed those short, irrelevant passwords that users have handily re-used on other sites, then went from there. Database compromise by an external actor isn't the only reason for hashing passwords properly, or for users choosing strong passwords. – Aug 11 '11 at 21:58
@Ninefingers, but if nobody would notice you dumping the users table, nobody would notice that you changed the user hash for a few hours, or dumped his personal information. So as long as the mole is working there - your data is compromised. When you stop working there - the user will change his password. And as mentioned, reversing a good hash scheme is non-trivial. Regardless of weak passwords. – Elazar Leibovich Aug 12 '11 at 03:26
1

@elazar You're ignoring the inconvenience to the user from frequent password changes and the damage done during the 1/2 month that the attacker has access to the user's account, to say nothing of the longer time that the attacker has access to other accounts where the user reused the password. Avoiding passwords or reducing the number of unique ones a user has to deal with can help in many ways, including allowing them to concentrate on one or a few really good ones. – nealmcb Aug 12 '11 at 05:45
3

@nealmcb @elazar it isn't just a convenience thing. If you allow weak passwords or strong ones, people use passwords in patterns, like `youtotallywouldnotguessthis01` then `youtotallywouldnotguessthis02`. Also, if additional services don't make the same restriction, they're then affected. Good hashing is therefore critical - plan as if the database is already compromised. Although you're right that if you're actually experiencing frequent compromise you've probably got bigger issues. – Aug 12 '11 at 08:34
@elazar - as per nealmcb's point. Have a look at http://security.stackexchange.com/q/4704/665 which discusses the pros and cons of short/long password expiry times. – Rory Alsop Aug 12 '11 at 08:53
We will still have to put up with braindamage in form of NTLM hashes for foreseeable future. Which are trivially sniffable in LAN environment. All in the name of backwards compatibility! – Hubert Kario Aug 22 '12 at 20:59

score 69 · Answer 5 · answered Aug 11 '11 at 16:32


I think most of the answers here are missing the point. The final frame is talking about ease of memorization. correct horse battery staple (typed from memory!!) eliminates the fundamental danger of password security -- The Post-IT note.

Using the first password, I've got a Post-IT note in my wallet (if I'm smart) or in my desk drawer (if I'm dumb) which is a huge security risk.

Lets assume that the pass phrase option is only as secure as the munged base word option, then I'm already better off because I've eliminated the human failing in password storage.

Even if I wrote down the pass phrase, it wouldn't look like a password. It might be a shopping list - Bread Milk Eggs Syrup. But 5t4ck3xCh4ng3 is very obviously a password. If I came across that, it would be the first thing I would try.

answered Aug 11 '11 at 16:32

Chris Cudmore


5

I have trouble with the odd order of the words. The funny thing is, I think the FINAL frame image, with "Horse: That's a battery staple. Me: Correct!" is even simpler and easier to remember since it's how the actual sentence would work. And much stronger.. – Jeff Atwood Aug 11 '11 at 16:43
5

Maybe we should just introduce graphical passwords, where you have to draw the horse and battery staple in the final frame image. ;-) – TrojanName Aug 12 '11 at 11:23
2

Absolutely, as I said in my answer too - password security is not *just* about entropy, it's about the human aspect, and how the user remembers it (or doesn't). Entropy is absolutely important, but that's not the end of the story. – AviD Aug 13 '11 at 21:11
14

Having a post-it with a very complex password sure beats having a bad password memorized. Usually you're protecting yourself from remote attacks, not someone sneaking around on your desk (that is an issue for office security). Also, with post-its you can easily disguise the password or alter it slightly ("every 1 is a 2", or the password is only half of what is written etc.) to make it useless for anyone else. You're oversimplifying. (Also, just realized how old this is, sorry, but I still think it applies) – pzkpfw Mar 01 '13 at 07:09
2

6 years later from memory- "Correct Horse Battery Staple". (Scroll up and check - YES!) – Chris Cudmore Aug 04 '17 at 12:55
@pzkpfw You're assuming that a remote attacker is more likely. However, what are the chances that a remote attacker is more interested in your password than (say) a resentful co-worker? – jpaugh Oct 10 '17 at 14:59
2

@jpaugh the point is that when it comes to security, it's a well known axiom that "physical access = game over". Protecting yourself against your colleagues entails a completely different set of procedures (physical security, safes, locks etc.) -- for the sake of simplicity discussions about passwords should focus on remote attackers because that's normally what passwords are designed to protect against. That's not saying remote attacks are the only attacks that exist, just that it's a very broad discussion. – pzkpfw Nov 02 '17 at 09:04
1

@pzkpfw Using a post-it is a fundamental shift from `something you know` to `something you have`. It breaks 2FA. – v.oddou Mar 10 '18 at 18:18

score 55 · Answer 6 · edited Mar 17 '17 at 10:46

To add to Avid's excellent answer, the other key messages of the comic are:

the appropriate way to calculate the entropy of a password generation algorithm is to calculate the entropy of its inputs, not to calculate the apparent entropy of its outputs (as rumkin.com, grc.com etc. do)
minor algorithm variations such as "1337-5p34k" substitutions and "pre/append punct & digits" add less entropy than most users (or sysadmins) think
(more subtly) passphrase entropy depends on wordlist size (and number of words), not number of characters, but can still provide easily sufficient entropy to protect against "generation algorithm aware" brute force attacks
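The first point can be made concrete with a small sketch. It is only an illustration: the 95-symbol printable-ASCII alphabet is an assumption about what a typical "strength meter" counts, the function names are made up here, and the step sizes are the comic's 28-bit total split as 16 (word) + 1 (capitalization) + 3 (substitutions) + 8 (trailing symbol and digit).

```python
import math

def apparent_bits(password, alphabet_size=95):
    """Naive output-based estimate: pretends every character is uniformly random."""
    return len(password) * math.log2(alphabet_size)

def generator_bits(choices_per_step):
    """Input-based estimate: sum of log2 over the independent uniform choices made."""
    return sum(math.log2(n) for n in choices_per_step)

# What a character-counting meter would report for the comic's first password.
meter_estimate = apparent_bits("Tr0ub4dor&3")

# What an attacker who knows the recipe actually faces (step sizes assumed as above).
recipe_estimate = generator_bits([2**16, 2, 2**3, 2**8])

# Four words drawn uniformly from a 2048-word list.
passphrase_estimate = generator_bits([2048] * 4)

print(f"meter: {meter_estimate:.0f} bits, recipe: {recipe_estimate:.0f} bits, "
      f"passphrase: {passphrase_estimate:.0f} bits")
```

The gap between the first two numbers is the whole argument: the meter rewards how the output looks, while the attacker only pays for the choices that were actually random.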

To those messages we might wish to add:

as a user you can't generally control whether the web site operator uses salting, bcrypt/scrypt/PBKDF2, keeps their password hashes safe, or even whether they hash passwords in the first place -- so you should probably choose passwords that matter on the basis that they don't (e.g. assume 10^9 guesses per second when sizing passwords/phrases, don't reuse passwords and don't use simple "append the site name" techniques) - which probably makes using LastPass/KeePass/hashpass inevitable
long complicated words don't add much to the entropy unless you use more than a couple of them (there are only ~500K words in English, which is only 19 bits -- just 8 bits more than a word from Randall's 2048-word list)
the "random words" need to be really random for this to work -- picking song lyrics/movie quotes/bible verses gives much lower entropy (e.g. even with perfectly random choice, there are only 700K words in the bible, so there are only ~4M 5-10 word bible phrases, which is only 22 bits of entropy)
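For anyone who wants to check the arithmetic in these points, here is a rough sketch. The word counts are the approximate figures quoted above, the "phrase = start position plus a length of 5 to 10 words" model is my reading of the bible example, and the helper name is hypothetical.

```python
import math

ENGLISH_WORDS = 500_000    # rough figure quoted above
COMIC_WORDLIST = 2048      # Randall's list
BIBLE_WORDS = 700_000      # rough figure quoted above
PHRASE_LENGTHS = range(5, 11)  # phrases of 5 to 10 words

bits_english = math.log2(ENGLISH_WORDS)   # about 18.9
bits_comic = math.log2(COMIC_WORDLIST)    # exactly 11

# A contiguous phrase is determined by where it starts and how long it is.
bible_phrases = BIBLE_WORDS * len(PHRASE_LENGTHS)
bits_bible = math.log2(bible_phrases)

def seconds_to_exhaust(bits, guesses_per_second=1e9):
    """Time to try every candidate at the assumed offline guessing rate."""
    return 2**bits / guesses_per_second

print(f"English word: {bits_english:.1f} bits, comic word: {bits_comic:.0f} bits")
print(f"bible phrase: {bits_bible:.1f} bits from {bible_phrases:,} phrases")
print(f"44 bits at 1e9 guesses/s: {seconds_to_exhaust(44) / 3600:.1f} hours")
```

At the assumed 10^9 guesses per second, the comic's 44-bit passphrase lasts hours rather than centuries, which is why the sizing advice above matters for passwords you care about.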

Have you considered that from the Bible, there are potentially dozens of different translations? King James, NIV, New American Standard, Strong's Concordance, the list goes on. And that's just English. Suppose some American knew a number of verses in Klingon (with appropriate accents, if appropriate). The flip side of this coin is also an interesting concept. Password cracking libraries may become the repositories (out of order) of all knowledge, because college term papers, news broadcasts, and everything else will get included. — killermist, Apr 14 '15 at 01:25
@killermist. [That already exists.](https://libraryofbabel.info/bookmark.cgi?vodqkovajzfg,.p.oxnd182) (Search for Klingon at the link) — jpaugh, Oct 10 '17 at 15:06

dr jimbob · Answer 7 · 2011-08-11T19:43:21.530

I love xkcd and agree with his basic point -- passphrases are great for adding entropy, but think he low balled the entropy on the first password.

Let's go through it:

Random dictionary word. xkcd: 16 bits, Me: 16 bits. A random word from a dictionary with ~65000 words is lg(65000) ~ 16. Very reasonable
Adding in capitalization. xkcd: 1 bit, Me: 0 bits (dealt with under common substitutions). 1 bit means it's either there or not there, which seems very low for the complexity that capitalization adds to a password. Generally, when capitalization is added to a password it's in a random place, and I can think of many other possible capitalization schemes (capitalize everything, capitalize the last letter). I'm going to group this with common substitutions.
Common substitutions. xkcd: 3 bits, Me: 13 bits. Only 8 choices for leet speak substitutions, even when only sometimes used? I can think of on average ~2 ways to leet alter each letter (like a to @,4; b to 8,6; c to (,[,<) and add in the original or capitalizing the letter, then for a 9 letter password, I've increased the entropy by 18 (lg(4**9)=18), if I randomly choose each letter to leet alter or not. This is too high for this password, however. A more reasonable approach would be to say I randomly chose 4 letters to apply "common substitutions" to (one of two leet options, plus capitalization, for three options for each letter being substituted). Then it gets to lg(nCr(9,4) * 3**4) ~ 13 bits.
Adding two random characters '&3' to the end. xkcd: 8 bits, Me: 15 bits. I'm considering it as two characters to one of ~4 places (say before the word, one before and one after, both after, or both smack in the middle of the word). I'll also let non-special characters be in these two added letters. So assuming an 88 character dictionary (56 characters+10 digits, plus 32 symbols) you add lg(4 * 88**2) ~ 15 bits of entropy.

So I have the calculation as not being 16+1+3+8=28 bits of entropy, but being 16+13+15=44 bits, very similar to his passphrase.
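The three terms can be recomputed directly. This is just a sketch of the arithmetic as described in the list above, using the stated 88-character alphabet and the "4 of 9 letters, 3 options each" model; the variable names are mine.

```python
import math

# Random word from a ~65,000-word dictionary.
word_bits = math.log2(65_000)

# Choose which 4 of the 9 letters to alter, then one of 3 alterations for each.
substitution_bits = math.log2(math.comb(9, 4) * 3**4)

# Two extra characters from an 88-symbol alphabet, in one of ~4 placements.
extra_char_bits = math.log2(4 * 88**2)

total_bits = word_bits + substitution_bits + extra_char_bits

print(f"word {word_bits:.1f} + substitutions {substitution_bits:.1f} "
      f"+ extra chars {extra_char_bits:.1f} = {total_bits:.1f} bits")
```

The total only holds if every one of those choices is made uniformly at random, which is the assumption the comments below push back on.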

I also don't think 3 days at 1000 guesses/sec is by any means "easy" to guess or the plausible attack mechanism. A remote web service should detect and slow down a user trying more than 10000 guesses in a day period for any specific user.
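To put numbers on that, here is a quick sketch of how long exhausting each search space takes at the comic's 1000 guesses/sec versus a service that caps a user at 10000 guesses per day (the cap mentioned above). The function name and the 365.25-day year are my own choices.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

def days_to_exhaust(bits, guesses_per_second):
    """Days needed to try all 2**bits candidates at a constant guessing rate."""
    return 2**bits / guesses_per_second / SECONDS_PER_DAY

# The comic's assumption: 1000 guesses per second against a weak web service.
days_28_bits = days_to_exhaust(28, 1000)
years_44_bits = days_to_exhaust(44, 1000) / DAYS_PER_YEAR

# A throttled service: 10,000 guesses per day for a given user.
throttled_rate = 10_000 / SECONDS_PER_DAY
years_28_bits_throttled = days_to_exhaust(28, throttled_rate) / DAYS_PER_YEAR

print(f"28 bits at 1000/s: {days_28_bits:.1f} days")
print(f"44 bits at 1000/s: {years_44_bits:.0f} years")
print(f"28 bits at 10000/day: {years_28_bits_throttled:.1f} years")
```

So the "3 days" figure depends entirely on the service not throttling; with even a generous daily cap, the 28-bit password moves from days to decades for an online attacker.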

Much more likely are other attacks (e.g., key loggers on public computers, a malicious admin at a service logging passwords and reusing them, eavesdropping if ssl not used, get access to the server somehow (e.g., SQL injection; break into server room; etc)).

I use a passphrase when it's necessary -- e.g., strong encryption (and not 44 bits, more like 80 bits -- typically an 8-word diceware passphrase plus two or three modifications -- e.g., misspell a word or substitute a word for a non-diceware word starting with the same two letters; e.g., if you had "yogurt" come up maybe substitute it for "yomama"). For websites (no money involved or security permissions), I don't care; trivial passwords are typically used.

I do notice that for often-used passwords, I'm much, much better at typing passwords than I am at typing my passphrases (which get annoying when you have to re-key a ~50 character sentence a few times). Also for passwords, I often prefer finding a random sentence (like a random song lyric -- to a song no one would associate with you that's not particularly meaningful) and coming up with a password based on the words (like sometimes use first letter; last letter; or substitute a word for a symbol; etc). E.g. L^#g&B9y3r from "Load up on guns and bring your" from Smells Like Teen Spirit.

TL;DR: Randall is right, if you (a) assume you can check passwords at 1000/s for three days without getting slowed down, and (b) can make many assumptions about the password: constructed based on a rare dictionary word, that is capitalized, has some leet substitutions for some commonly substituted letters, and has a symbol and number added at the end. If you only slightly generalize the allowed substitutions (like I did) and characters added at the end, you get a similar entropy to the passphrase.

In summary, both are probably secure for most purposes with a low threat level. You are much more vulnerable by other exploits, something Randall would likely agree with [538] [792]. In general having password requirements like having a upper/lower/symbol/number is good, as long as long high-entropy passphrases are also allowed. Force additional complexity for shorter passwords, but allow over ~20 characters to be all lower case. Some users will choose poor passphrases just as they choose poor passwords (like "this is fun" which is idiotically claimed to be ridiculously secure here or using their child's name or their favorite sports team). Requiring special characters may make it non-trivial to easily guess (say by a factor of 100-1000 -- changing a password from being 10 likely guesses to 10000 is very significant). Sure it won't prevent any bot on a weak web service that allows thousands of bad login attempts per second, but it forces an addition of a modest level of security which would hinder efforts at sites that limit bad logins or from the unsophisticated manually guessing the password. Sort of like how a standard 5-pin house lock is fundamentally insecure as anyone can learn to pick it in minutes (or break the glass window); however in practice locking your door is good as it provides some safety against the unsophisticated who don't have tools handy (and breaking windows comes with its own dangers of alerting others).

"A remote web service should detect and slow down a user trying more than 10000 guesses in a day period for any specific user." <-- Note that the strip says "insecure web service". Of course a secure webservice does slow things down. Not all webservices are secure though. — Billy ONeal, Aug 11 '11 at 05:35
the comic is saying that all these things which could increase entropy are done stupidly by users (only capitalise first letter; use a word and only add random number and symbol to the end; etc). — DanBeale, Aug 11 '11 at 06:31
@DanBeale: If you don't trust the users to make a password in a suitably random way, how do you trust them to make a passphrase in a suitably random way? "this is fun" or "let me in" or "fluffy is puffy". — dr jimbob, Aug 11 '11 at 13:44
@Billy ONeal: I read the insecure web service part, but think it's irrelevant. Most services worth hacking into (banks, major email accts, major seller (amazon), etc.) should be doing basic login throttling. http://stackoverflow.com/questions/549/the-definitive-guide-to-forms-based-website-authentication/477578#477578 Insecure web services won't; but then again you probably shouldn't be giving any information to insecure web services anyhow -- they may be storing your password in plaintext on a computer with well-known exploits that would take someone much less than 3 days to crack. — dr jimbob, Aug 11 '11 at 13:55
@Billy ONeal: Unless you have done something to make you a specific target, no one's going to spend 300 million attempts over the internet to get your passphrase for your login. After a few minutes of attempts they'll move on to the next username trying to get low-hanging fruit. (Unless of course this is offline hacking; in which case 28-bit and 44-bit are both insecure to any real attempts.) — dr jimbob, Aug 11 '11 at 14:04
@dr jimbob: I don't. We see on this very page a user saying they generate a diceware passphrase, but that they roll a few times to get an easier to remember passphrase. So, i: generate the phrases and print them on $50 bills ii: research using real people seeing how many times they roll to get a diceware pass they'll use, and how small the "real" dictionary is would be a good thing. — DanBeale, Aug 11 '11 at 17:12
@DanBeale I think the point is that (some) users will always do the minimum required of them, and that services which force users to use a "strong" password with at least one number, one capital letter, one punctuation mark, etc are actually counterproductive (though they look really good to management!) — RoundTower, Aug 11 '11 at 17:53
@RoundTower - welcome to SecuritySE! I agree; I said the same in some other comment above. Policies to force "strong" passwords come close to security theatre in some cases. — DanBeale, Aug 11 '11 at 18:03

score 39 · Answer 8 · edited Mar 29 '16 at 08:08


Looking at the XKCD comic, and at examples of real world passwords, we see that most users have passwords much much weaker than the XKCD example.

A bunch of users will do exactly as the first panel says - they'll take a dictionary word, capitalize the first letter, do some gentle substituting, then add a number and symbol to the end. That's quite bad, especially if they re-use that password (because they think it's strong) or if their account has privs.

As has been mentioned in comments, Diceware is a nice way to generate a passphrase. I'd like to see an easy-to-read version of a formal analysis of Diceware. (I suspect that even if the attacker has your dictionary and knows that your passphrase is 5 or 6 words long, Diceware is better than a bunch of other password generation systems.)

But, whatever password they have, many users can be persuaded to change it to a known value:

http://passwordresearch.com/stories/story72.html

During a computer security assessment, auditors were able to convince 35 IRS managers and employees to provide them with their username and change their password to a known value. Auditors posed as IRS information technology personnel attempting to correct a network problem.

edited Mar 29 '16 at 08:08

Benoit Esnard


answered Aug 11 '11 at 10:12

DanBeale


8

good old social engineering... the forever hack. http://www.codinghorror.com/blog/2007/05/phishing-the-forever-hack.html – Jeff Atwood Aug 11 '11 at 16:41
Entropy basically measures the number of possible passwords allowed by the scheme you used to generate your password. You do this by taking the base-2 logarithm of the number of passwords allowed by your scheme. If your scheme allows ~2^10=1024 passwords it has 10-bits of entropy; every extra bit doubles the number of passwords allowed and doubles the time to brute-force guess. Ok, so if your diceware passphrase is generated by rolling 5 six-sided dice (the dictionary has 6^5=7776 words), then each word adds 13 bits of entropy (log(6^5)/log(2)=12.9). Hence 4 words would be 52-bits. – dr jimbob Aug 12 '11 at 00:40
1

And for more questions: http://world.std.com/~reinhold/dicewarefaq.html – dr jimbob Aug 12 '11 at 00:42
2

Not sure why you want a "formal" analysis of Diceware. The concept is simple and it works. There's no fancy math needed. Its dictionary has almost 8000 (= 2^13) words, so a random four-word passphrase has about 52 bits of entropy (i.e., there are about 2^52 possible such passphrases, all equally likely; i.e., an attacker would need to try about 2^52 guesses to find your passphrase). – D.W. Aug 12 '11 at 05:23
2

@D.W. Diceware is good, but users are poor. There's a comment on this page from someone who says that they roll more than once to get a memorable passphrase. So, if the phrases are given to the users it's okay, but what happens if users are allowed to use diceware themselves? How many re-roll? Which diceware dictionary words are least / most popular? – DanBeale Aug 12 '11 at 06:46
2

A useful analysis would be to see how many of those 2^52 passphrases always get ditched by users as not easy to remember. That may significantly reduce the space. – Rory Alsop Aug 12 '11 at 08:57
8

@DanBeale, re-rolling a couple of times doesn't significantly harm the entropy. (If you re-roll up to 4 times, the decrease in entropy is at most 2 bits, from 52 bits to 50 bits. Not a big deal.) On the whole, the practice of re-rolling to find a memorable passphrase is probably positive, because it makes passphrases more memorable and hence increases the likelihood that users will use / keep using the Diceware scheme. As you say, the greatest challenge is usability; for that, what's needed is a usability study, not a formal mathematical analysis. – D.W. Aug 12 '11 at 19:13
@D.W. - thanks for the info. It's reassuring to know that a re-roll or two is okay. – DanBeale Aug 12 '11 at 19:46
3

@DanBeale: even re-rolling 16 times decreases the entropy by 4 bits, from 52 to 48. You're still few orders of magnitude better than l33t-speak passwords. – Hubert Kario Sep 09 '12 at 15:59
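The re-rolling figures in the comments above follow from a simple worst-case bound: if the user generates k candidate passphrases and keeps one, an attacker who somehow knew exactly which candidate the user would prefer gains at most log2(k) bits. A minimal sketch of that bound, with hypothetical function names:

```python
import math

DICEWARE_WORDS = 6**5  # 7776 words, one per outcome of five dice

def passphrase_bits(n_words, wordlist_size=DICEWARE_WORDS):
    """Entropy of n_words drawn uniformly and independently from the list."""
    return n_words * math.log2(wordlist_size)

def worst_case_bits_after_rerolls(bits, candidates):
    """Lower bound on remaining entropy when picking one of `candidates` rolls."""
    return bits - math.log2(candidates)

four_words = passphrase_bits(4)
print(f"4 Diceware words: {four_words:.1f} bits")
for k in (1, 4, 16):
    print(f"best of {k:2d} rolls: at least "
          f"{worst_case_bits_after_rerolls(four_words, k):.1f} bits")
```

This is a bound, not a measurement; how much real users shrink the space by rejecting awkward phrases is the usability question raised above, and it needs a study rather than arithmetic.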

wisty · Answer 9 · 2011-08-11T14:56:52.950

The issue is still, sadly, a human one.

Will pushing users to alphanumeric + punctuation passwords be safer, or longer passwords?

If you tell them to use alpha + numbers, they will write their name + birthday. If you tell them to also use punctuation, they will replace an "a" with "@", or something similarly predictable.

If you tell them "use four simple words", they will write "i love my mother", "i love your mother", "thank god its friday" or something else banally predictable.

You just can't win. The advantage of 4 word passwords is, they can memorize it, so if you are going to force users to have strong passwords (which you generate) then at least they won't need to write it down on a post-it note, or email themselves the password, or something else stupid.

But what you've missed is that the xkcd comic was probably advocating that a random four-word password be generated for the user by randomly sampling from a 2048-word dictionary, *not* asking the user to pick four arbitrary words. This may not have been obvious if you aren't familiar with the history of the field, but the basic idea has been mooted before by cryptographers. — D.W., Aug 11 '11 at 20:10

score 32 · Answer 10 · answered Aug 11 '11 at 20:02

Randall is mostly correct here. A few additions:

Of course you have to choose the words randomly. The classic method is Diceware, which involves rolling 5d6, giving almost 13 bits of entropy per word, but the words are more obscure.
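If you don't have dice handy, the same procedure can be sketched in software. This is an illustration only: it uses Python's `secrets` module as the random source, and `PLACEHOLDER_LIST` is a made-up stand-in for the real Diceware list (which maps each five-digit roll to a word), so the output here is not a usable passphrase.

```python
import itertools
import secrets

def roll_key(dice=5):
    """Simulate rolling `dice` six-sided dice with a CSPRNG, e.g. '43146'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))

def make_passphrase(wordlist, n_words=5):
    """Look up one word per roll; every word is chosen independently."""
    return " ".join(wordlist[roll_key()] for _ in range(n_words))

# Hypothetical stand-in list: one fake "word" for each of the 6**5 possible rolls.
ALL_KEYS = ["".join(k) for k in itertools.product("123456", repeat=5)]
PLACEHOLDER_LIST = {key: f"word{key}" for key in ALL_KEYS}

print(make_passphrase(PLACEHOLDER_LIST))
```

The important part is that the words come from the random source and not from the user's head; swapping `secrets` for `random` would undermine the entropy estimate.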

There may be 2048 common words in English, but there aren't 2048 short common words in English. The Diceware list (which has 6^5 = 7776 words under 6 letters long) has some pretty obscure stuff in it, plus names, plurals, two-letter combinations, two-digit combinations, 19xx, etc, and I don't think the top quarter of that would be much better. If you just take the top 2k words in English, you get stuff like "multiplication" which is a bit long as part of a 4-word password. So I'd be interested to see Randall's suggested word list.

There are more obvious variations of "Tr0ub4dor&3"-type passwords than of Diceware ones, so in practice the former will have a couple more bits of entropy. Also, in my experience, "Tr0ub4dor&3" type passwords are not actually that hard to remember if you use them often. In the past, I've generated several passwords with `strings /dev/urandom`, and had no trouble using them as login passwords. Today, though, I couldn't tell you which of the letters were capitalized. On the other hand, I'm not sure I could recite some of my Diceware passwords without confusing homophones, pluralization, etc.

If the password database is stolen, a strengthener like PBKDF2 would add a word or more to the effective length of one of these passwords, but many sites don't use it. 5 words + one for the strengthener would yield some 66 bits, which is probably too big for a rainbow table. This puts you well out of range of casual attackers, so unless you have something really important on your account you should be fine.
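The "one extra word" arithmetic can be sketched like this. It assumes an 11-bit word (Randall's 2048-word list) and picks 2**11 PBKDF2 iterations purely so the work factor equals one word; the salt handling and function names are illustrative, and a real deployment would choose a much higher iteration count.

```python
import hashlib
import math
import os

ITERATIONS = 2**11  # illustrative: multiplies attacker cost by about one 11-bit word

def strengthen(passphrase, salt, iterations=ITERATIONS):
    """Derive a stored hash; each guess now costs `iterations` HMAC rounds."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations)

def effective_bits(n_words, bits_per_word=11, iterations=ITERATIONS):
    """Passphrase entropy plus the log2 work factor added by key stretching."""
    return n_words * bits_per_word + math.log2(iterations)

salt = os.urandom(16)  # per-user random salt, which also defeats rainbow tables
digest = strengthen("correct horse battery staple beer", salt)

print(f"stored hash: {digest.hex()}")
print(f"5 words + stretching: {effective_bits(5):.0f} bits of effective work")
```

The stretching does not add entropy in the strict sense; it raises the cost per guess, which has the same effect on an offline attacker's total work.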

In sum, Diceware-type passwords are ideal for things you type occasionally, but not necessarily every day. If you use a password every day, then Diceware would work, but a `strings /dev/urandom` password will be shorter and you should be able to memorize it anyway. If you log in rarely, then choose a password in any way you like, toss it into your password manager (which you should use for the more commonly-used passwords too), and forget it.

If a site has some odd restriction, like "no spaces, at least one upper-case letter and at least one number", then string the words together with 5s between them and cap the first letter. This loses epsilon entropy for Diceware and none for Randall's scheme.
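That transformation is mechanical, so it adds no secret of its own. A tiny sketch (the function name and the default separator are just for illustration):

```python
def fit_site_rules(words, separator="5"):
    """Join words with a fixed digit and capitalize the first letter.

    Satisfies "no spaces, one upper-case letter, one number" without
    changing which words were randomly chosen.
    """
    joined = separator.join(word.lower() for word in words)
    return joined[:1].upper() + joined[1:]

print(fit_site_rules(["correct", "horse", "battery", "staple"]))
# prints: Correct5horse5battery5staple
```

Because the rule is fixed and public, an attacker who knows it faces exactly the same word choices as before, which is why the entropy is essentially unchanged.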

I think you overestimate the ability for the average person to memorize random data. — Billy ONeal, Aug 12 '11 at 18:13
For your password database, you should use a salt anyway, so rainbow tables are of no use. — Paŭlo Ebermann, Jan 27 '13 at 18:13
Example output of `strings -n 10 /dev/urandom` has stuff like `t!AF|r)WlB`, `<^p!*P,gvv` and `-WAWkG;]%>(`. Are you *really* claiming that you can *remember* any of those? The `correct horse battery staple beer` sounds like a winner to me and still has roughly the same entropy. — Mikko Rantalainen, Mar 08 '13 at 10:34
The [Mnemonic word list](https://github.com/jasimmonsv/Mnemonic-Word-List) has 1626 words (entropy: 10.66 bits/word) with average length of 5.76 chars (4-7 chars/word). All words are real, internationally recognizable and phonetically different. As a non-native English speaker I find them quite easy to understand. About Diceware: I use Diceware method with Finnish word list to generate passphrases that I type many times every day at my work (the length is not a problem for me). The Finnish word list is much better than the original because all words are real and there is much less oddities. — mgronber, Jun 18 '14 at 06:51

score 30 · Answer 11 · edited Aug 01 '15 at 01:08


The Openwall Linux pwqgen tool generates passwords with a specific amount of entropy along these very lines.

However, rather than using spaces to separate words, it uses punctuation characters and digits. Here are ten examples with 49 bits of entropy:

Cruise!locus!frame
tehran!Commit6church
Seller7Fire3sing
Salt&Render4export
Forget7Driver=Tried
Great5Noun+Khaki
hale8Clung&dose
Ego$Clinch$Gulf
blaze5vodka5Both
utmost=wake7spark

The words are harder to read without their spaces, but in the years that I've been using passwords generated with this tool, I haven't found the punctuation to be difficult to remember.

edited Aug 01 '15 at 01:08

Simon


answered Aug 12 '11 at 07:51

sarnold


2

Thanks for the pointer! But I think you have a typo there. I'm guessing you're providing a list with 44 bits of entropy, not 64. It starts generating 4-word phrases at 48 bits. And for the record, via `for i in {26..81}; do echo -n $i " "; pwqgen random=$i; done`: At 26 bits it has 2 word phrases, at 27 it adds punctuation separators, at 31 it has 3 word phrases, with punctuation at 40, at 48 it has 4 word phrases, with punctuation at 53, at 65 it moves to 5 word phrases, with punctuation at 66, all the way up to 81 bits. Seems a bit odd there. – nealmcb Dec 19 '14 at 05:53
4

The [passwdqc source code](http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/passwdqc/passwdqc/) reveals a **tiny 4096 word dictionary**, meaning the true bit entropy of these "64 bit" examples is **actually 49.8 bits**: `log₂(⟨4096×2⟩×42×⟨4096×2⟩×42×⟨4096×2⟩)` (2x for random-cased 1st letter, 42 printable non-letter chars on an en_US keyboard). [Diceware](http://world.std.com/~reinhold/diceware.html) is 7776 words and a standard dictionary is 100k words. passwdqc would need 155k words for 64 bits of entropy. See also my [pw entropy table](https://security.stackexchange.com/a/93628/42391). – Adam Katz Jul 27 '15 at 19:04
1

@AdamKatz, yikes. I've always assumed it used `/usr/share/dict/words` on my system, which is ~100k words, which would have led to ~60 bits of entropy for this format. I'll amend my answer. Thanks. – sarnold Aug 01 '15 at 00:48
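The 49.8-bit figure from the comment above is easy to reproduce. This sketch takes the comment's numbers at face value (4096 words, a random-cased first letter, 42 possible separator characters, three words with two separators); the constant and function names are mine and are not taken from the passwdqc source.

```python
import math

WORDLIST_SIZE = 4096   # dictionary size reported in the comment above
CASE_VARIANTS = 2      # first letter upper or lower
SEPARATORS = 42        # printable non-letter characters, per the comment

def three_word_bits(wordlist_size=WORDLIST_SIZE):
    """Entropy of word-sep-word-sep-word with independent uniform choices."""
    word_slot = wordlist_size * CASE_VARIANTS
    combinations = word_slot * SEPARATORS * word_slot * SEPARATORS * word_slot
    return math.log2(combinations)

print(f"{three_word_bits():.1f} bits")
```

Passing a larger `wordlist_size` shows how much a bigger dictionary would buy for the same format.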

nullnvoid · Answer 12 · 2011-08-11T01:10:21.327

25

I'm not a security expert, but I think there's a mistake here.

The point of the comic seems to indicate that by increasing the length of the password, you will increase the complexity.

The complexity of a four word password is actually significantly reduced. Essentially you've swapped 8-11 semi randomly selected characters for 4 randomly selected words. The length of the pass-phrase is less important than the number of words. All that is required is that the attacker attempts various combinations of words from a dictionary. Words used are likely to be fairly common if they are to be remembered and are likely to be short if they must be typed.

Dictionary attacks are already used to great effect. We don't commonly use 260,000 words. We only know around 12,000 to 20,000 words. Of these we commonly use less than 8,000. I can't imagine regular users are going to create passwords which contain long and complicated words to type which further reduces the subset of possible words. Assuming that everyone were to start using this method of password selection, then dictionary attacks would actually become far more effective.

Now, a combination of the two may be more secure, although this defies the point of the exercise.

edited Aug 11 '11 at 01:10

answered Aug 11 '11 at 00:10

nullnvoid


6

Dictionary attacks are already used to great effect. The point I was attempting to make was that length isn't the measure of complexity in this system. Anyway, we don't commonly use 260,000 words. We only know around 12,000 to 20,000 words. Of these we commonly use less than 8,000. I can't imagine regular users are going to create passwords which contain long and complicated words to type which further reduces the subset of used words. – nullnvoid Aug 11 '11 at 00:28
13

11 bits of entropy are assumed for a dictionary word. That translates to 2000 words, much less than 8K that you mentioned previously. So, the argument would still hold. – PearsonArtPhoto Aug 11 '11 at 00:45
28

This answer ignores the numbers used in XKCD analysis and thus is completely unfounded! – Rotsor Aug 11 '11 at 01:36
11

*"the comic seems to indicate that by increasing the length of the password, you will increase the complexity"* - No, that's not what the comic is saying at all. Read it again, more carefully this time. – D.W. Aug 11 '11 at 02:50
20

I think you're confusing between `complexity` and `entropy`. The point of the comic is that the passphrase *lowers* the complexity, while offering higher *entropy*. Complexity affects humans, not computers - while entropy is what prevents the bruteforce. – AviD Aug 11 '11 at 08:10
1

I agree completely with this answer. If it were common practice to use longer passwords made up of longer words, then dictionary attack methods could be easily optimized for this. – user606723 Aug 11 '11 at 14:49
7

You have a choice between 96^NumOfChars or of 7776^NumOfWords. (7776 is Diceware's dictionary size.) 96^11 == good but hard to remember. 7776^6 == better and easier to remember. – DanBeale Aug 11 '11 at 20:05
1

I'm certainly no expert in this area, and have not encountered Entropy before. I note that a number of people are saying "I think that 11 bits of Entropy per word sounds about right" but I've not noticed anyone expanding on how this was calculated. Does anyone have a good introductory article to calculating entropy? I've looked at a few that were aimed at maths students. – nullnvoid Aug 11 '11 at 23:38
7

nullnvoid, A 2048-word dictionary has, well, 2048 = 2^11 words. If you pick one uniformly at random, then there are 2^11 possibilities. Thus, 11 bits of entropy. An attacker would need to try about 2^11 guesses to find your word (2^10 on average, 2^11 in the worst case). You can find more details on entropy calculations in other answers and comments on this page. – D.W. Aug 12 '11 at 05:26
@DanBeale Why is `7776 ^ 6` "better"? Isn't `96 ^ 11 - 7776 ^ 6 = -214691526415214947860480 < 0`? – kizzx2 Mar 13 '12 at 17:29
5

@kizzx2 - yes. 96^11 is smaller than 7776^6. Thus, a random password of 11 letters, chosen from 96 different characters, will be hard to use but quite good; while a random passphrase of 6 words (chosen from a list of 7776 words) will be easy to use and stronger. – DanBeale Apr 21 '12 at 20:37
1

@nullnvoid Thing is, the subset of the 10,000 words *you* know is different from the set that *I* know. They're a Venn diagram -- and given that we're both frequenting this site, there's probably more overlap than with two randomly picked people -- but they're not the same. In the scenario we're working with here the attacker is assumed not to know anything about the specific target. – Shadur Mar 20 '13 at 09:50
@AviD The other thing the answer poster seemed to miss is that password checkers don't really (as long as they're not FUBAR) provide "too short" or "too long" feedback, and instead just supply, "No." – killermist Apr 14 '15 at 01:50
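For the `96^11` versus `7776^6` comparison discussed in the comments above, Python's arbitrary-precision integers make the check straightforward. A small sketch with variable names of my own choosing:

```python
import math

random_chars = 96**11     # 11 characters, each from 96 printable symbols
diceware_words = 7776**6  # 6 words, each from the 7776-word Diceware list

difference = random_chars - diceware_words

print(f"96^11  = {random_chars}  ({math.log2(random_chars):.1f} bits)")
print(f"7776^6 = {diceware_words}  ({math.log2(diceware_words):.1f} bits)")
print(f"difference is negative: {difference < 0}")
```

The six-word passphrase has roughly five more bits, so the attacker's search space is about 35 times larger than for the 11-character random password, and that holds even when the attacker knows the exact word list.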

score 20 · Answer 13 · answered Nov 02 '15 at 13:30

In an empirical test, passphrases don't seem to help as much as XKCD would have you believe: dl.acm.org/citation.cfm?id=2335356.2335366

Users tend to create passwords that are easy to guess, while system-assigned passwords tend to be hard to remember. Passphrases, space-delimited sets of natural language words, have been suggested as both secure and usable for decades. In a 1,476-participant online study, we explored the usability of 3- and 4-word system-assigned passphrases in comparison to system-assigned passwords composed of 5 to 6 random characters, and 8-character system-assigned pronounceable passwords. Contrary to expectations, system-assigned passphrases performed similarly to system-assigned passwords of similar entropy across the usability metrics we examined. Passphrases and passwords were forgotten at similar rates, led to similar levels of user difficulty and annoyance, and were both written down by a majority of participants. However, passphrases took significantly longer for participants to enter, and appear to require error-correction to counteract entry mistakes. Passphrase usability did not seem to increase when we shrunk the dictionary from which words were chosen, reduced the number of words in a passphrase, or allowed users to change the order of words.

[The full, free version](https://www.ece.cmu.edu/~lbauer/papers/2012/soups2012-passphrases.pdf) is available from their site. — Daan Bakker, Nov 07 '15 at 02:11

PearsonArtPhoto · Answer 14 · 2011-08-11T14:56:17.317

20

Reading the comic, I have a few thoughts.

The entropy count on the password is rated quite low. At the very least, you should add in another 1-3 bits for character substitutions. In this example, it seems like the exact formula is known, which seems a bit unlikely.
The entropy count of the longer password seems correct to me.
The base word seems to be "Troubador", if I'm deciphering the common substitutions correctly. I don't think that's a common word, so limiting the entropy of a dictionary based word to only 11 bits of entropy is a bit low. I'm guessing that it would still be in a dictionary, but should be expanded to at least a 15 bit entropy, or a selection of 32K words. That seems to be about the level it would be at using a dictionary based attack.

It is true that in principle, a long password composed of random words is at least as good a password as a short password with a larger character set, but the words must be random. If you start quoting a well-known phrase, or even anything that could be a sentence or part of one, it severely limits the entropy.

edited Aug 11 '11 at 14:56

answered Aug 10 '11 at 23:06

PearsonArtPhoto


10

The xkcd comic assumes there are about 8 commonly used substitutions that an attacker would need to try (3 bits of entropy). You seem to be arguing that there are about 128-256 commonly used substitutions that an attacker would likely need to try (7-8 bits of entropy). Where's your justification for that? Personally, I think xkcd's estimate is better than yours. – D.W. Aug 11 '11 at 02:52
The justification is you can either use them or not. If you substitute 0 for o sometimes, that's 1 bit of substitution. If you might use * and -, for instance, that's 2 bits of substitution for each time you find such a character, as there's four combinations. I think 1-2 upon reflection is more accurate, giving it 4-5 bits instead of 3. – PearsonArtPhoto Oct 16 '14 at 22:11
2

You are assuming that every password has 8 different letters that each provide an independent opportunity for substitutions. For most passwords, I anticipate that won't be the case. – D.W. Oct 16 '14 at 22:50
1

On point 1: If you assume that an attacker won't be able to guess your password scheme, then you can indeed strengthen your assumed password strength. However, a) attackers are better at guessing schemes than you are at inventing them; b) mandatory password policies make it easier to guess schemes; c) you would also have to question whether an attacker knows the exact length and dictionary of the passphrase, so that gets stronger too. On point 3: Randall actually granted 'Troubadour' 16 bits of entropy for being an uncommon word, compared to just 11 bits for his 'common words'. – ThrawnCA May 30 '17 at 04:12

score 18 · Answer 15 · answered Aug 11 '11 at 20:49

A lot of the responses to this question raise the obvious point that even if you ask users to use several words separated by spaces for readability, too many users will choose "banal" phrases like "i love my mom", which would be quickly cracked once crackers realized that such simple phrases were in common use. I assume that some crackers may already be doing this -- they're not stupid, after all! But all is not lost.

Too many websites that I have login relationships with require me to make my password more complex (using leetspeakish and other combination requirements for upper/lowercase, numerics and symbols). They enforce making my passwords hard to remember.

What if, instead of requiring complexity, a password validation check first consulted a recognized password dictionary and, once a dictionary attack was ruled out as a problem, calculated the strength against a brute-force attack, requiring only that the password be able to hold off brute force for a trillion years? In that case, Tr0ub4dor&3 would not pass -- Steve Gibson's search space calculator at https://www.grc.com/haystack.htm says it could be brute-forced in only 1.83 billion centuries, but "hold wine fine cold" could withstand it for 1.43 billion trillion centuries.

What I am trying to say here is to stop forcing users to come up with bizarre character combinations, and actually vet their passwords against dictionary and brute force attacks. I think security could only improve as a result.
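A rough Python sketch of the kind of check described above: reject anything found in a list of known passwords, then estimate how long an exhaustive character-by-character search would take. Everything here is an assumption made for illustration: the tiny `KNOWN_PASSWORDS` set, the ASCII-only character classes, the guess rate of 1,000 per second, and the trillion-year threshold. It is not a reimplementation of any particular calculator.

```python
import string

# Hypothetical stand-in for a real list of known or leaked passwords.
KNOWN_PASSWORDS = {"password", "123456", "i love my mom", "letmein"}

# Assumed attacker speed and required lifetime; both are illustrative only.
GUESSES_PER_SECOND = 1_000
REQUIRED_YEARS = 1e12
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def alphabet_size(password: str) -> int:
    """Size of the ASCII character set a naive brute-force search must cover."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation or c == " " for c in password):
        size += 33  # 32 ASCII punctuation marks plus the space
    return size


def search_space(password: str) -> int:
    """Number of candidates of length 1..len(password) over that alphabet."""
    a = alphabet_size(password)
    return sum(a ** k for k in range(1, len(password) + 1))


def is_acceptable(password: str) -> bool:
    """Reject known passwords, then require enough brute-force resistance."""
    if password.lower() in KNOWN_PASSWORDS:
        return False
    years = search_space(password) / GUESSES_PER_SECOND / SECONDS_PER_YEAR
    return years >= REQUIRED_YEARS


for candidate in ("password", "Tr0ub4dor&3", "hold wine fine cold"):
    print(candidate, "->", "ok" if is_acceptable(candidate) else "rejected")
```

Note that the brute-force estimate treats a passphrase as a string of random characters. An attacker who guesses whole words faces a much smaller space, which is why the dictionary step has to come first and has to be good.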

answered Aug 11 '11 at 20:49

Cyberherbalist


Or just force people to use tools like Keepass -- does the job excellently. – Billy ONeal Aug 11 '11 at 22:30
5

Welcome to securitySE! I'm tempted to -1 for mention of GRC. But you have a good point. – DanBeale Aug 12 '11 at 05:22
1

Heh. And what's wrong with GRC, exactly? :-) – Cyberherbalist Aug 12 '11 at 15:56
3

@Cyberherbalist, welcome to [security.se]! Sadly Steve Gibson is recognized as clueless and a charlatan, at least according to [attrition.org](http://attrition.org/errata/charlatan/steve_gibson/)... – AviD Aug 13 '11 at 21:29
@AviD - late getting back to this, but even Mother Teresa has haters. Gibson may not be up to the standards of some people, but his work has actually helped me on a few occasions. And SpinRite saved my a$$ once when it counted. Sadly, eventually everyone is outmoded and obsolete. – Cyberherbalist Aug 14 '15 at 18:13
1

@Cyberherbalist It's not about having haters, it's about being a better marketer than anything else, even if he doesn't understand the words he is repeating out of context, or making up altogether. But here is really not the place for a discussion on his expertise... – AviD Aug 16 '15 at 07:05
@AviD : Granted this isn't the place. I will point out that I didn't start the fire. But enough said already. – Cyberherbalist Aug 18 '15 at 22:37

score 15 · Answer 16 · answered Jun 12 '13 at 04:29

Bruce Schneier loves long passphrases, but he has also been pointing out the practical difficulties of a long passphrase for many years. Passwords are not echoed on the screen when you type them. You see an asterisk or a big fat black dot on the screen per letter. Even when typing out 7-8 character passwords, you occasionally wonder in the middle if you have typed it right, so you backspace everything and start again. Occasionally we forget where we are in the middle of the password and count the number of characters already typed, comparing it mentally with the password to figure out where we are. It would be even more difficult to do this with a long passphrase if it's not being echoed. I think long passphrases will happen only after this problem has been effectively solved.

answered Jun 12 '13 at 04:29

user93353


6

I've been using "simple" passphrases (as in, just lower case letters, actual words) exclusively for the past few years. While I enter different ones at least 6 times a day, I don't have to backspace the whole passphrase because of a typo more than once a week. I am reasonably sure that this is the case for anyone who regularly writes using a keyboard... – Hubert Kario Jun 29 '13 at 10:02
1

I had to change our WiFi password, as it was too complex to type into a iPhone! – Ian Ringrose Feb 04 '14 at 12:43
1

I wonder what the security implications would be of having the password-entry logic look to see if any word is duplicated and, if a password doesn't work, trying the password that would result from striking out any duplicated word and anything between the duplicates? So "correct horse batteru horse battery staple" would be turned into "correct horse battery staple"? – supercat Jun 10 '14 at 16:03
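
The correction supercat describes could look roughly like the Python sketch below. It is one possible reading of the idea, not a vetted design: it handles only the first repeated word, and the function name `strike_duplicate` is invented for this example. A login routine would presumably try the corrected form only after the passphrase as typed has failed.

```python
def strike_duplicate(passphrase: str) -> str:
    """Drop the first copy of a repeated word and everything up to its repeat."""
    words = passphrase.split()
    first_seen = {}  # word -> index of its first occurrence
    for j, word in enumerate(words):
        if word in first_seen:
            i = first_seen[word]
            # Keep what came before the first copy, then resume at the repeat.
            return " ".join(words[:i] + words[j:])
        first_seen[word] = j
    return passphrase  # no repeated word: nothing to correct


print(strike_duplicate("correct horse batteru horse battery staple"))
```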

score 13 · Answer 17 · answered Aug 11 '11 at 00:10

I agree with Jeff Atwood. Also, I have taken a (not the) English dictionary I have here in MSSQL with 266,166 words in it (and also 160,086 German and 138,946 Dutch words) and taken a random selection of 10 words for each language.

These are the results for English:

clever-handed
wolframinium
muth
unvolcanic
contradictorily
desperadoism
unpreternatural
placability
recondensation
Remi

Now take any combination of any words that will give you enough entropy and you're good to go. But as you might see from this example it's not very easy to make something easy to remember out of this wordjumbo. So entropy goes down a lot when you're trying to create "understandable (not correct or logical) sentences".
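
For anyone who wants to repeat the experiment, here is a minimal Python sketch of drawing words at random and estimating the resulting entropy. The short `demo_words` list is a made-up placeholder; a real run would load a full word list. The entropy figure assumes every word is an independent, uniform pick.

```python
import math
import secrets


def random_passphrase(words, count=4):
    """Pick `count` words independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(count))


def passphrase_bits(num_words, count=4):
    """Entropy in bits when every word is a uniform pick from num_words."""
    return count * math.log2(num_words)


# Hypothetical tiny list for demonstration; a real list would be far longer.
demo_words = ["correct", "horse", "battery", "staple",
              "wine", "cold", "fine", "hold"]

print(random_passphrase(demo_words))
print(f"{passphrase_bits(266_166):.1f} bits for 4 words from 266,166 words")
print(f"{passphrase_bits(2048):.1f} bits for 4 words from 2,048 words")
```

A bigger list buys more bits per word, but it also produces more words that are hard to remember, which is the trade-off visible in the samples here.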

For completeness' sake (random) results for German and Dutch:

German:

torlos
Realisierung
hinterhaeltigsten
anbruellend
Orthograpie
vielsilbigen
lebensfaehig
drang
festgeklebtes
Bauernfuehrer

Dutch:

spats
bijstander
Abcoude
vergunningsaanvraag
schade-eis
ammoniakuitstoot
onsentimenteel
ebstand
radiaalband
profielschets

answered Aug 11 '11 at 00:10

RobIII


25

This has no technical argument, and it misunderstands the xkcd proposal. The xkcd proposal is *not* to select 4 words at random from a 200,000-word dictionary. Rather, it is to select 4 words at random from a smaller dictionary of the 2000 most common words. If you follow what xkcd *actually* recommended, you'll find that the resulting passphrases are easier to memorize. – D.W. Aug 11 '11 at 02:50
4

Most of the words suggested for German are not part of a real dictionary anyway. With "real" I mean the kind of dictionary ordinary people think of, not the one used by software for spell checking and password cracking. – Hendrik Brummermann Aug 11 '11 at 08:32
"Colourless green ideas sleep furiously." Chomsky. – TRiG Aug 11 '11 at 10:32
3

@TRiG: That's the combination I have on my luggage! – Piskvor left the building Aug 11 '11 at 14:35
"Orthograpie" should be "Orthographie", actually. (I'm not sure if you mistyped this or if your dictionary already contains this error.) – Paŭlo Ebermann Aug 11 '11 at 20:50
5

So we're agreed: Germans need two words, the rest of us need four words? :) – sarnold Aug 12 '11 at 07:34
@D.W. "_Rather, it is to select 4 words at random from a smaller dictionary of the 2000 most common words._" I don't see that anywhere in the comic, nor in the "tooltip" in the original xkcd comic? Also: how does a smaller dictionary improve entropy (which the comic is about)?? – RobIII Sep 07 '11 at 17:13
@PaŭloEbermann I have no idea; all words come from some random "Dictionary" database I had lying around. My comment is not about perfect spelling though but about not "simply" being able to create easy to remember 'sentences' (regardless of any typos I may have made). – RobIII Sep 07 '11 at 17:16
1

Sorry for my comment, it was just a bit funny that you actually spelled the word for "right spelling" wrong. It does not really relate to the security of your system (though right-spelled passphrases will be easier to remember, I think). – Paŭlo Ebermann Sep 07 '11 at 17:29
10

@user3992, it is a bit subtle. You have to read the xkcd comic closely and be familiar with the security literature and past work in this area (e.g., Diceware). Tips for grokking the xkcd comic: 1. One panel says "four random common words". 2. The next panel says "44 bits of entropy", which implies 11 bits of entropy per word -- which in turn suggests that it is proposing you should pick each word randomly from a dictionary of about 2000 common words. (Not a dictionary of 266,000 words; that'd be 72 bits of entropy, rather than 44 as quoted in the comic.) – D.W. Sep 07 '11 at 17:54
@D.W. Ah, that was SO subtle I missed it! Thanks for pointing that out! – RobIII Sep 12 '11 at 23:19
2

Also, @RobIII note that "what the comic is about" is NOT *just* about improving entropy, it is about the trade-off between increased entropy and memorability (usability) - and how the typical solution is a backwards tradeoff. – AviD Dec 15 '13 at 10:20

score 8 · Answer 18 · edited Dec 30 '16 at 05:48

I've wondered about this one as well, and I would like to analyze it not from a philosophical point (if users write down their passwords, it becomes something you have instead of something you know... 2-factor becomes 1-factor), but a mathematical and scientific standpoint.

I recently downloaded a GPU password cracking software to play around with. I'd like to crack both of these passwords using that (since it's my new toy) and determine which is better.

For a hypothesis, I would like to also throw out a possible variation--the attacker may know you only use dictionary words and don't enforce symbols and numbers (decreasing the key space).

Scenario 1

- Attacker knows you only used dictionary words. (Keyspace = 26 letters + 26 capitals + 1 space = 53).
- The password must contain 4 dictionary words with a minimum total length of 20 characters.

Scenario 2

- Same as Scenario 1, except Attacker doesn't know only dictionary words are used (keyspace increases).

These are compared against 2 control groups in which the random passwords contain 2 numbers and 2 special characters and are 16 characters long.

Would this be an appropriate test? If not, let me know, and I'll edit the parameters.
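
Before running anything on a GPU, the sizes of the search spaces can be compared with a few lines of Python. This is a back-of-the-envelope sketch only. The 95-symbol printable ASCII alphabet for the control group and the 7,776-word dictionary are assumptions, and the control-group figure ignores the "2 numbers, 2 special characters" constraint.

```python
import math


def bits(alphabet_size: int, length: int) -> float:
    """log2 of the number of strings of exactly `length` symbols."""
    return length * math.log2(alphabet_size)


# Character-level view: 53 symbols (letters, capitals, space), 20 characters.
char_level_passphrase = bits(53, 20)

# Control group: 16 characters over an assumed 95 printable ASCII symbols.
random_16 = bits(95, 16)

# Word-level view (Scenario 1): 4 words from an assumed 7,776-word list.
word_level_passphrase = bits(7776, 4)

for name, value in [
    ("passphrase, attacked character by character", char_level_passphrase),
    ("16 random characters", random_16),
    ("passphrase, attacked word by word", word_level_passphrase),
]:
    print(f"{name}: about {value:.1f} bits")
```

The gap between the first and last lines is the difference between Scenario 2 and Scenario 1: what the attacker knows about how the password was built matters more than the raw character count.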

Not really an appropriate test; because the first random password the GPU picks may very well be the password. There's a probabilistic nature here that would have to be considered for this to work. — Billy ONeal, Jan 12 '12 at 15:07
I guess a more appropriate "test" would just be to show which is longer via math? — Jeff, Jan 12 '12 at 15:13
I don't understand where you got those length requirements from. They don't look reasonable to me. (The randomness effect can be easily controlled for by repeating the test a few times.) — D.W., Dec 19 '12 at 20:52

score 8 · Answer 19 · edited Mar 17 '17 at 10:46

The XKCD comic does not explicitly show that passphrases may contain separator symbols between the words. A natural choice is to add the same symbol between all words. If the app has a show-password option, the phrase can then be read easily. In theory that adds 5 bits of strength, downgraded to 4 bits; see the section The "troubador" method in the explanation of the mathematics in this comic.

The exact downgrading also depends on how easy it is for an attacker to guess the symbol, or to try specific separator symbols first. That's why I use 3 distinct sets of symbols to calculate this type of strengthening: a set of 13 symbols from the iPhone number-symbol keyboard, a set of 31 covering all symbols, and a special set of 3 for the often-used separator symbols: space - _

I use the following for the calculation (Excel notation, only if a separator symbol is being used):

strength = log( NumWordsInCurrentDict*((sepaClassSize^(SepratorLen)+1)*(NumWordsInCurrentDict)^(nWordInPhrase-1)) , 2)   (see below for a simplification)

4 Diceware words give the following strengths: No separator: 51.7 bits; A space separator: 53.3 bits; and a separator from all 31 symbols like ^ : 56.7 bits

Edit and edit 2: forgot the log 2, fixed.

The simplified formula for the above one is: log( ((sepaClassSize+1)^SepratorLen)*(NumWordsInCurrentDict^numWordsInPhrase) , 2 )
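
The simplified formula translates directly into Python. The sketch below follows the formula as written, including the `+1` term; the function and parameter names are my own, and 7,776 is the size of the standard Diceware list.

```python
import math


def passphrase_strength(num_words_in_dict, num_words_in_phrase,
                        sep_class_size=0, separator_len=0):
    """Bits according to the simplified formula:
    log2((sep_class_size + 1) ** separator_len
         * num_words_in_dict ** num_words_in_phrase)."""
    separator_factor = (sep_class_size + 1) ** separator_len
    return math.log2(separator_factor
                     * num_words_in_dict ** num_words_in_phrase)


DICEWARE_WORDS = 7776  # size of the standard Diceware list

no_separator = passphrase_strength(DICEWARE_WORDS, 4)
any_of_31 = passphrase_strength(DICEWARE_WORDS, 4,
                                sep_class_size=31, separator_len=1)

print(f"no separator: {no_separator:.1f} bits")
print(f"one separator out of 31 symbols: {any_of_31:.1f} bits")
```

With the `+1` term included, the three-symbol separator set comes out slightly above the 53.3 bits quoted, so the exact figures depend on that convention.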

The XKCD comic depicts extensive modification/substitution options for passwords. The Diceware Passphrase Home Page mentions a special modification that is not in the passphrase part of the comic: insert just 1 random letter in just 1 of the words of the phrase chosen. That would add another 10 bits of entropy (see the section "Optional stuff you don't really need to know").

score 6 · Answer 20 · edited Sep 11 '18 at 19:23

Seems that most agree that regarding maths, Horse method is superior--to what extent seems to be mostly about limitations like how uniform the choices are, or what are these "easy to remember" or "easy to type" phrases.

Fair enough, but I'll teach you a magic trick on how to make these limitations a "bit" less relevant:

Use Horse method as platform for building new phrases. Use properly random method to choose the words. This may get you some hard words that you never heard and may find hard to remember, but...

That's for following the Horse method blindly. The magic trick is that you don't stop here. Unless you are a desperately boring and un-creative person, you can get a great advantage from the next steps.

Make use of your brain.

I mean, not just think, have your brain fart out completely new words for you, or completely new methods to distort the existing ones.

You are allowed (or even encouraged) to replace some words in the phrase with these.

Also, this rule also applies to everything I write from now on: just go ahead and change the methods arbitrarily ;)

Make use of your other languages.

Most of us know more than one language. Mix them as you see fit.

(A special case of this is making use of the different keyboard layout used in your country. For example, in the Czech layout, letters with diacritics share keys with numbers: the row above the alphabetic part. This, in fact, creates a mapping of letters and numbers that can supplement or replace the "traditional" L33T. Think about how you can benefit from it.)

Invent your own methods.

...how to further distort the phrase. You can re-use the method for new passwords; it all depends on how complex a method you create: the more complex, the more re-usable, but don't overdo it :)

Make the process fun!

Generating a password does not have to be boring. In fact, the funnier you make it, the more likely you are to actually remember the password.

But don't get me wrong: don't make it funny at the cost of uniqueness. Try to use that kind of funny which is funny only to you (ask your brain).

(Oh and don't make it too funny--you don't want to giggle and blush every time you type your passphrase ;))

If done right, every bit of the above will give the Horse method a great advantage.

Human brains are heavily biased against randomness and toward patterns, and we just are dreadful at generating or verifying randomness. So no, getting the human to choose their own arbitrary words will not improve a truly-random passphrase's entropy. — bignose, Dec 28 '15 at 06:08

score -6 · Answer 21 · answered Aug 11 '11 at 00:56

Totally wrong. There's more than math at work here. Human beings creating passwords out of human language != bots creating randomized strings out of buckets of char. If you start with that assumption, entropy is radically reduced as a factor in the time necessary to crack the password. As Don Corleone, the great philosopher, said: Think like the people who are around you.

answered Aug 11 '11 at 00:56

yelvington


12

If you read the XKCD comic, you'll notice that it's suggesting you select common dictionary words at random (i.e., from a bot). The entropy is still there, since they're chosen at random from a set of thousands of possibilities. They just happen to be possibilities that humans are good at recognizing. – Zach Aug 11 '11 at 05:20
7

Yes, the XKCD system will be just as weak as other approaches, if you let users pick the password. If you *generate* the password for users (corporate IT does, sometimes), the XKCD approach is great, as the user will be more likely to remember it. – wisty Aug 11 '11 at 14:55

score -8 · Answer 22 · answered Aug 12 '11 at 01:11

As some people already stated (so I'm not going to repeat that), it depends on the mechanism of brute-force attacks and dictionary attacks being used.

First of all, the best way to keep an attacker from attacking is taking away the target in the first place. None of my servers have SSH running on port 22, and root login is almost always deactivated in the sshd configuration. But that's just an example. Don't give away the user name and you can save yourself a lot of trouble.

That's the simplest of math: Avoid attacks by others by hiding :)

So, for the rest: Those who actually guess the username right and find your service will try very common brute-force attacks. Short passwords are always a bad idea, because there's no dictionary needed. Cycling through all the alphanumeric combinations in both lower and uppercase, plus common 'salt' like commas, semicolons and so on, would take a few days to crack. This is based on my own experience: I had an old OpenBSD routing machine set up, but the internet provider password changed and I didn't have physical access to the machine. The password turned out to be [Firstname][Lastname][Number] of some celebrity.

I was curious, so I tried different cracking tools. A name-based one took only six hours to crack the same password. Guess it was cycling through common name/number combinations.

The trick with those brute-force attacks is to know what you're dealing with. A password that is based on something personal, that is encrypted with your own method is still safe from most dictionary attacks and can only be guessed by a simple brute-force attack, which would take years to cycle through all the possibilities.

Give you an example: My name is Andreas, so my password is kinda safe.

MyN4me,A->PwKndSf

According to rumkin, this is kinda safe :) 87.1 bits entropy. Wow. Not bad for a first try. I can actually remember that and most mechanisms will not attempt to 'guess' that kind of a password, because it doesn't make any sense to any of the systems.

Either it's short and complex, like L5q3CR,-F - which is kind of hard to remember but easy to guess, or it consists of variations of actually existing words. It's a human weakness, to help yourself remember things or go for something really simple, or common.

I know, this is a little bit off-topic, but: if you don't want to become a victim of a brute-force attack, lock most of the doors first, or even better: remove/hide the doors :)

- Don't offer the login mechanisms that crackers expect, if you can.
- Protect web services with client authentication.
- If you're totally paranoid, filter access to your service by IP address, too.
- Secure and totally paranoid setup (this is unfair to crackers :)): after a failed password attempt, every attempt for the same user during the following 5 seconds (even a correct one) will fail, too.
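
The last item could be sketched in Python roughly as follows. This is a minimal in-memory illustration under my own assumptions: `check_password` is a hypothetical callable supplied by the surrounding system, state is lost on restart, and attempts made during the cooldown do not extend it.

```python
import time


class FailureCooldown:
    """Reject every login for a user for `cooldown` seconds after a failure."""

    def __init__(self, check_password, cooldown=5.0, clock=time.monotonic):
        self._check_password = check_password  # callable(user, password) -> bool
        self._cooldown = cooldown
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_failure = {}  # user -> time of most recent failed attempt

    def login(self, user, password):
        now = self._clock()
        last = self._last_failure.get(user)
        if last is not None and now - last < self._cooldown:
            # Still cooling down: fail even if the password is correct.
            return False
        if self._check_password(user, password):
            return True
        self._last_failure[user] = now
        return False
```

The point of the design is that an automated guesser cannot tell a wrong guess from a right one made too soon, so each account can be tested with at most one meaningful guess per cooldown period.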

If somebody manages to get around all that, you're dealing with pros anyways :) but keep your password secure by doing something human, that nobody expects and no computer can guess or predict: do your own thing, just remember that your own thing has to be long enough to avoid the simple attacks and stay out of the dictionary for the most part. Use something, that only makes sense in YOUR brain and scatter in a few special characters.

For a cracker, a fast way to guess a password is only offered when you do something predictable, like use something short, that's easy to memorize or something that consists of common words, or combinations of letters that you find in dictionaries.

Stay away from those, and you can even stick with rumkin's calculations.

Unfortunately, rumkin is not a reliable judge of password strength. — D.W., Aug 12 '11 at 05:35
I don't hide. My ssh ports are on 22. If anyone fails to login to any account 10 times within 5 minutes, then their ip address is firewalled off with iptables and a message is sent to their ISP warning them of a compromised machine on their network. — Andy Lee Robinson, Aug 12 '11 at 08:39
Correction: the **worst** way to keep an attacker from attacking is taking away the target in the first place. Moving your server to a different port will only clean up your logs a little, it only saves you from attackers who are after the lowest-hanging fruits and wouldn't get into a reasonably secure (all security patches applied, no egregiously stupid passwords) system anyway. — Gilles 'SO- stop being evil', Aug 12 '11 at 08:47
Congratulations, you've just invented Security by Obscurity :). And relegated basic safeguards (timed lockouts) to the realm of totally paranoid... — AviD, Aug 14 '11 at 07:09
Someone should write a tool that can discover which port you are running ssh on! ;) Apologies for the snark. — Bradley Kreider, Sep 28 '11 at 21:04
@HubertKario Someone should tell you what snark means, or, um, sarcastic. — timuzhti, Nov 21 '15 at 10:43
