Passphrase vs starting characters

Question

Passphrases are (if created correctly) a pretty secure option for a password. They are easy to remember and have high entropy. They have one big disadvantage: it's really easy to make a typo because they are so long.

Another option is to take a slightly longer passphrase but only use the first letter of every word. Therefore you get a shorter password (which is slightly harder to remember because you need a slightly longer passphrase).

Example: "This is a random sentence, I came up with in 5 minutes or so."

=> "Tiars,Icuwi5mos."

Can this be considered similarly secure to a (slightly shorter) passphrase?

To give some perspective: Let's say we compare a 15-character password with a 7-word/35-character passphrase.

I question your premise, simply because I'm pretty sure I'd have less typos typing the phrase than the starting letters.... :D — TTT, Nov 30 '16 at 14:27
@TTT that could definitely be the case. I think it comes down to personal preference. Also I'm just curious. — Tim Pohlmann, Nov 30 '16 at 14:28
Yes, it is comparable to a shorter passphrase. I'd say the 15 char password comes out a bit stronger. Either option is fine though, as long as you can remember the password and don't reuse it. — grc, Nov 30 '16 at 14:30
Similar question here, although focuses specifically on using song lyrics https://security.stackexchange.com/questions/111260/is-the-bbc-s-advice-on-choosing-a-password-sensible There was also some research done a decade ago on the security of these systems: http://passwordresearch.com/papers/paper343.html — PwdRsch, Nov 30 '16 at 16:23

score 4 · Accepted Answer · edited Mar 17 '17 at 13:14

Taking your question at face value:

Compare a 15-character password with a 7-word/35-character passphrase.

Before we can make a meaningful comparison, we need to start with some assumptions.

1. Can either password be broken with an intelligent algorithm outside of brute force? (More details on this can be found here.) If the answer is no, then obviously a 35 character password is much stronger than a 15 character password. Let's assume the attacker has considered the scenario that your password could contain a passphrase.
2. Although there are hundreds of thousands of words in the English language, we (probably) only need to try brute-forcing passphrases using the set of the most common words. We'll assume there are 3000 words in that set.
3. The passphrase will use words that do not necessarily form a grammatically correct sentence. (e.g. correct horse battery staple)
4. The non-passphrase password has the same entropy as a random password and therefore cannot be cracked outside of brute force. (For now, we're ignoring the possible algorithm presented for generating it.)
5. The password character space is 80 characters. (Obviously this can vary but we need to start somewhere.)

So, some quick math here and we have:

15 char password: 80^15 = 3.5 * 10^28
7 word passphrase: 3000^7 = 2.2 * 10^24

By this calculation a 15 char random password is stronger than a 7 common word passphrase by a factor of roughly 16,000 (about 14 bits). (A 13 char random password would be about the same strength.)
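The arithmetic above can be reproduced with a short sketch. It assumes, as the answer does, an 80-character set and a 3000-word list with every choice independent and uniform; the function name `search_space_bits` is just an illustrative helper.

```python
import math

def search_space_bits(symbols: int, length: int) -> float:
    """Bits of entropy for `length` independent, uniform picks from `symbols` options."""
    return length * math.log2(symbols)

# Assumptions taken from the answer: 80 possible characters, 3000 common words.
random_password = 80 ** 15   # 15 characters from an 80-character set
passphrase = 3000 ** 7       # 7 words from a 3000-word list

print(f"15-char password:  {random_password:.1e} candidates, "
      f"{search_space_bits(80, 15):.1f} bits")
print(f"7-word passphrase: {passphrase:.1e} candidates, "
      f"{search_space_bits(3000, 7):.1f} bits")
print(f"13-char password:  {80 ** 13:.1e} candidates, "
      f"{search_space_bits(80, 13):.1f} bits")
```

Working in bits (the base-2 logarithm of the candidate count) makes the two schemes directly comparable regardless of their length in characters.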

Notice that assumption #4 is pretty weak, as the first-letter algorithm is likely not even close to having the same entropy as a random password (because you will use fewer special characters, words have a different distribution of starting letters, etc.). Therefore, I think it is easy to conclude that a 7 word passphrase is significantly stronger than a 15 character password derived using the method you described, and even more so if at least one of the words in your passphrase is uncommon.

Assumption #4 is certainly wrong, and thus I agree with your conclusion in general. See [Hashcat Markov chains](https://hashcat.net/forum/thread-1710.html) for more detail; a skilled attacker would have more than one statistics file, since the 7 word phrase would have very different statistics than the 15 char phrase... but both would have definite non-random patterns. — Anti-weakpasswords, Jan 15 '18 at 08:07
@Anti-weakpasswords Agreed. I purposely wrote the answer in that way to hit home the point at the end (and make the math much easier). — TTT, Jan 15 '18 at 09:33
At only 4 words, phrases such as `correct horse battery staple` comes in a pretty poor 8.1 * 10^13 — Ed Randall, Feb 19 '19 at 16:37
@EdRandall - I'm not sure I understand your comment. The question is about 7 words, not 4... — TTT, Feb 19 '19 at 16:45
@TTT - nothing mysterious; in your point#3 you only bothered to think of (that example) 4 words, not 7, so the psychology of a real-world scenario suggests that passphrases used are likely to be several orders of magnitude weaker. My non-computer family members certainly wouldn't be going with 7-word phrases. Indeed, many websites often have a limit around 25/28 characters. Most people would therefore probably be much better-off using a fully random generator such as lastpass/1password than bothering with passphrases. — Ed Randall, Feb 20 '19 at 09:57
@EdRandall - got it. I only used that example because it is the most famous example of a passphrase and also doesn't form a sentence. I agree with you. A 4 word passphrase is very weak if the attacker tries to brute force passphrases, and password managers with a generator are far more secure. I'm at the point now that I know hardly any of my passwords, and that's a good thing. — TTT, Feb 20 '19 at 15:08

score 2 · Answer 2 · answered Nov 30 '16 at 14:12

A bit of math background:

A proper brute force attack on your secret will use the most likely secrets first.

You can thus represent the "quality" of a password class (e.g. a starting-letter secret with 20 characters vs. a truly random one with 8 characters) by calculating the entropy H of that source S of randomness.

Entropy, in the context of Information Theory is the expectation of Information. With Information I of a random event X (in this case: a particular secret) being

I(X) = - log2(p(X)),

with p being the probability that the event occurred.

Now, for a truly random N-character password of 8-bit characters, each character value is equally likely, no matter where in the string it occurs. That means the entropy of such a password is the maximum you can get from an N-character string.

Now, if you use the first-letter method, things change drastically: How much more often would you use ? at the end of a string than anywhere else? How likely is X as the first letter of your secret, and how likely is T? If W is the first letter, how likely is it that the next letter is an i or an a (considering that the first word is then pretty likely to be What/Where/Who, and these interrogative pronouns are typically followed by an "is" or "are")? And if the first letter is not a W, would you be as likely to pick an i as the second letter?

Languages don't use characters with uniform probabilities. So instead of 8 bits of entropy per character, you add maybe 4.7 bits or so. Thus, your truly random password with N characters has 8N bits of entropy (or 7N, if you restrict yourself to ASCII), but your first-letter secret has much less, especially since the letter probabilities are strongly linked (as in the example of W being likely followed by i or a). So while the first character might still have an entropy of 4.7 bits, the next certainly doesn't, and thus your source's entropy will be much lower. So, instead of comparing an 8-character truly random secret to a 12-character first-letter secret, it'd be much fairer to compare it to a first-letter secret of at least twice that length.
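To make the effect of non-uniform letter probabilities concrete, here is a minimal sketch of the entropy calculation described above. The `skewed` distribution is purely hypothetical (it is not measured English data); it only illustrates that any departure from uniformity lowers the entropy per character.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits.

    Outcomes with probability zero contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Uniform source: each of the 26 letters is equally likely.
uniform = [1 / 26] * 26

# Hypothetical skewed source (illustration only, not real letter statistics):
# one letter takes half the probability mass, the other 25 share the rest.
skewed = [0.5] + [0.5 / 25] * 25

print(f"uniform: {shannon_entropy(uniform):.2f} bits/char")
print(f"skewed:  {shannon_entropy(skewed):.2f} bits/char")
```

Note that this treats each character independently. Dependencies between neighbouring letters, such as the W example above, would reduce the entropy of the whole string further.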

Some good points here. How does it compare to a passphrase however? — Tim Pohlmann, Nov 30 '16 at 14:15
That's a question for a summation over a couple hundred binary entries in a significant-length secret, to which I don't have an answer at hand. I'm sure there's plenty of literature on this. — Marcus Müller, Nov 30 '16 at 14:18

Cody P · Answer 3 · 2017-07-18T16:17:25.377

In general, a word is more random than its first letter or a random letter, but letter-based passwords can be an effective method of password generation if the phrase used is sufficiently random or unique.

Let's assume the password cracker knows how your password is generated. Sure, this is a worst-case scenario, but password crackers are surprisingly sophisticated so this is IMHO the best way to judge the true strength of your password.

Picking a phrase off the top of your head is dangerous. Phrases are just as susceptible as other types of passwords to being shockingly easy to guess if you pick the first thing that comes to mind, and as shown below, even randomly-chosen natural English phrases are only half as complex as randomly chosen words. Any good password generation scheme will have you avoid using the same password as other people. There are phrase passwords like "trustno1", "iloveyou", or "letmein" that are extremely common. This paper and this one both show why picking phrases off the top of your head often goes poorly. In their research a large number of these passwords can be cracked using just a few thousand or million phrases. Secure password generation schemes randomly pick one of quadrillions or more possibilities, not just the most convenient of several thousand possibilities.

Password complexity is typically judged using entropy, measured in bits. If all possibilities are equally likely, a password with an entropy of 50 bits will take up to 2^50 tries to guess. We'll use the estimates for entropy explained below to compare some schemes:

For 15 characters,

- If chosen randomly from the 26 letters of the alphabet, numbers, periods, and commas, you get a complexity of 5.2*15 = 78 bits
- If chosen randomly from the 26 letters of the alphabet, you get a complexity of 4.7*15 = 70.5 bits
- If chosen from the first letters of random words, you get a complexity of 4.17*15 = 63 bits

In comparison, for 7 words:

- If chosen based on one of the first phrases that comes to mind: don't do this. At best you'd get maybe 40 bits of entropy, guessing based on the papers referenced above. At worst you'd pick "letmein" and your account could be hacked even without using cracking programs.
- If we choose a 7-word phrase randomly from a large body of text: between 39 and 52 bits
- If we choose words randomly from a small dictionary of 1000 words (which would cover 79% of an average text): 70 bits of entropy
- If we choose words randomly from a large dictionary of 8000 words (covering 96.3% of a text): 91 bits of entropy
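The uniform-choice figures in the two lists above can be recomputed with a small sketch. It uses only the sizes quoted in the answer (38 symbols, 26 letters, 1000 and 8000 words, and the 5.5 to 7.5 bits/word range assumed for natural phrases); the helper name `uniform_bits` is illustrative. The first-letter figure is left out because it comes from a non-uniform distribution. Results differ slightly from the answer's where the answer rounds the per-character entropy before multiplying.

```python
import math

def uniform_bits(choices: int, picks: int) -> float:
    """Entropy in bits of `picks` independent, uniformly random selections
    from `choices` equally likely options."""
    return picks * math.log2(choices)

# Sizes taken from the answer above.
schemes = {
    "15 chars, 38 symbols": uniform_bits(38, 15),
    "15 chars, 26 letters": uniform_bits(26, 15),
    "7 words, 1000-word list": uniform_bits(1000, 7),
    "7 words, 8000-word list": uniform_bits(8000, 7),
}

# Natural phrases: the answer assumes 5.5 to 7.5 bits per word.
natural_phrase_range = (5.5 * 7, 7.5 * 7)

for name, b in schemes.items():
    print(f"{name:<26} {b:5.1f} bits")
print(f"{'7-word natural phrase':<26} "
      f"{natural_phrase_range[0]:.1f} to {natural_phrase_range[1]:.1f} bits")
```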

So in summary, 15 characters is usually more secure than a 7-word phrase, but it depends on how you generate each, and assumes you're not choosing something common. Additionally, some researchers like C. Kuo feel that mnemonic passwords like these are a good idea. She said, "Mnemonic phrase-based passwords are not as strong as people may believe, but that does not mean that we should refrain from using them… the space of possible phrases is extraordinarily large, and building a comprehensive dictionary is not a trivial task. There are also more permutations that can be made on mnemonic phrases [e.g. love can become "<3"], increasing the size of the search space. It may be possible to crack a significant percentage of mnemonic passwords in theory — but this is different from today's reality."

How we get these entropy figures:

Estimating the entropy of natural English is tricky because it depends not only on the body of text you're using as a baseline, but also on how you count words, punctuation, and spacing, as well as on your method of statistical analysis. Estimates for the entropy of a single character can be as low as 1.25 bits and as high as 1.77 bits (based on research summarized here). One estimate I saw for entire words in a phrase was 5.97 bits of entropy, which seems low given that each word is on average between 4.25 and 5.1 characters long. We'll assume based on this information that the entropy is between 5.5 and 7.5 bits per word when using randomly chosen phrases.

A random character from the 26-character alphabet has an entropy of -log2(1/26), or 4.7 bits/character. Throwing in commas, periods, and ten numbers would raise this to -log2(1/38) = 5.2 bits/character. The first letters of a word might appear significantly less random, but looking at their frequency (assuming each word is independently chosen), the first letters have an entropy of 4.1 bits/character.

Note: Applying these entropy statistics from general English to a password search space is not a perfect application and is further discussed here.

Luis Casillas · Answer 4 · 2017-07-18T02:39:00.350

The starting characters idea is not good. The problem is that the initial letters of English words are not equiprobable. Note that the Project Gutenberg data in the article's table is the frequency of initial letters of word tokens in English texts (and not, say, frequencies of first letters of dictionary entries). So this is, prima facie, a reasonable approximation to the ordinary English passphrases that you are contemplating.

The min-entropy of the distribution in that table is about 2.6 bits (-log2(0.16671)), which gives us one estimate of the worst-case entropy per character for passwords generated according to your method. A password generated by taking the initial letters of an English sentence with 15 words, therefore, should not be expected to have much more than 39 bits of min-entropy.

And that's an optimistic estimate anyway, because we haven't tried anything remotely sophisticated, like conditioning the initial letter probabilities on those of the preceding words' initial letters.
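The min-entropy estimate above is a one-line calculation. This sketch uses the single figure quoted in the answer (0.16671, the probability of the most common initial letter) and, like the answer, assumes the 15 initial letters are independent; the names are illustrative.

```python
import math

def min_entropy(p_max: float) -> float:
    """Min-entropy in bits: -log2 of the probability of the most likely outcome."""
    return -math.log2(p_max)

# Probability of the most frequent initial letter, as quoted in the answer.
P_MOST_COMMON_INITIAL = 0.16671
WORDS_IN_SENTENCE = 15

per_char = min_entropy(P_MOST_COMMON_INITIAL)
total = WORDS_IN_SENTENCE * per_char  # assumes independent initial letters

print(f"{per_char:.2f} bits/char, {total:.1f} bits for {WORDS_IN_SENTENCE} letters")
```

Min-entropy looks only at the single most likely outcome, so it is a more conservative measure than Shannon entropy.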

Conclusion: your proposal doesn't sound very great.
