0

Lately I've been following the trend of using passwords that are sentences, at least on sites that have a reasonable maximum character length and allow spaces. For example, the most basic variation is 'This is my password.', or 'This is my <insert website here> password.' It seemed rather silly to memorize a group of random words instead of simply constructing a sentence, which is way easier on muscle memory. Is a sentence easier to predict?

user54196
  • 9
  • 2
  • @BadSkillz No: XKCD #936 is about a passphrase composed of random words, not a passphrase which is an English sentence. It's good advice on what to use instead of what user54196 proposes but it doesn't answer the question. – Gilles 'SO- stop being evil' Aug 21 '14 at 13:04
  • I'm afraid "This is my password." will become the new "P@ssw0rd" :P The general idea is that your sentence password consists of unrelated words that do not follow the flow of natural language. Adding some entropy to your pass phrase by using numbers and symbols also doesn't hurt. "th1sGroundsQuirrelbrok3n+string" – ilikebeets Aug 21 '14 at 13:14
  • There are ample methodologies to calculate the entropy of any authentication scheme; this question would be stronger if it demonstrated preliminary research. – MCW Aug 21 '14 at 13:59

3 Answers

2

Let's do some math and study the language to see how this compares to the standard password requirement of 8 characters drawn from upper-case letters, lower-case letters, numbers and symbols.

To be rigorous, we assume that the attacker knows that you use a sentence as your password and is only trying to break it based on that information. He also knows that you do not use any spaces, capitalization or punctuation in the sentence. I am also intentionally ignoring grammar and syntax, since they are too complicated and don't provide much insight to an attacker, apart from the fact that some words occur primarily at the beginning or end of a sentence.

While there are a million or so words in the English language, people don't use or know that many. The average person uses about 500 unique words in a day, and it is believed that an average person needs between 1500 and 2500 words in their vocabulary to communicate effectively for an extended period of time. The average person also knows about 15,000-20,000 words that they use infrequently.

So let's try those numbers with a 5 word pass-sentence:

The standard to compare against is 8 characters from the 95 printable ASCII characters: 95^8 = 6.6 x 10^15

500^5 = 3.1 x 10^13

1500^5 = 7.6 x 10^15

2500^5 = 9.8 x 10^16

From this we can conclude that, using the expanded vocabulary of the average English speaker (1500 to 2500 words), we can create a pass-sentence that is as resilient to a specialized attack as the industry standard is to brute force. It would, however, be significantly more resilient to that same character-by-character brute force: 26^16 = 4.4 x 10^22. (I used 16 letters as the average total length of 5 words.)
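The arithmetic above can be reproduced with a short script. This is only a sketch of the same back-of-the-envelope model: the function names `search_space` and `bits` are my own, and the model assumes every word or character is picked independently and uniformly, which real sentences are not.

```python
import math


def search_space(symbols: int, length: int) -> int:
    """Number of candidates when each of `length` slots is chosen
    independently from `symbols` equally likely options."""
    return symbols ** length


def bits(symbols: int, length: int) -> float:
    """The same quantity expressed in bits (log2 of the search space)."""
    return length * math.log2(symbols)


# (number of options per slot, number of slots), taken from the answer above
scenarios = {
    "8 printable ASCII characters": (95, 8),
    "5 words, 500-word vocabulary": (500, 5),
    "5 words, 1500-word vocabulary": (1500, 5),
    "5 words, 2500-word vocabulary": (2500, 5),
    "16 lower-case letters (brute force)": (26, 16),
}

for label, (symbols, length) in scenarios.items():
    size = search_space(symbols, length)
    print(f"{label}: {size:.1e} candidates, about {bits(symbols, length):.1f} bits")
```

These figures are upper bounds for a sentence that follows normal grammar, since an attacker does not have to treat each word as independent.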

Take this all with the same mentality as a normal password. The password "P@55word" has all the features of a secure password but is totally insecure. The same would be true for a pass-sentence like "ilovemygirlfriendmelissa" especially with a little social media recon on the target.

I'm sure some security experts will disagree with this, but then again the security expert quoted in another answer on this page advocates writing down passwords, so I just assume they'll never agree like most experts.

Red_Shadow
  • 177
  • 5
  • Sentences are not just sets of commonly used words but constructs that (loosely) respect a grammar. A latent semantic model would allow you to predict commonly used sentences and to predict what the next words of a sentence might be, dramatically optimising a sentence-based brute-force. You could then use the same permutations as a password cracker to account for 1337speak and common numbers and symbol permutations. – Steve Dodier-Lazaro Aug 21 '14 at 14:10
  • Besides the "password alphabet" and general English alphabet differ: some words are more popular in passwords and likewise some words/sentences may be more popular in passphrases. Because passphrases are not commonly used, there is no real world dataset of passphrases available to speculate over that and further refine a LSM. – Steve Dodier-Lazaro Aug 21 '14 at 14:11
  • @SteveDL Wouldn't any heuristic or word selection used to optimize breaking the pass-sentence cost CPU and/or GPU cycles that are then not being used to launch attacks? No password is unbreakable; if it slows down the attacker, it has done its job. – Red_Shadow Aug 21 '14 at 14:25
  • My point is that there is a way for attackers to heavily optimise how they browse the passphrase space based on latent semantic analysis techniques (commonly used in many natural language processing problems), so we can't use entropy to discuss the security of a non-random passphrase (or, for that matter, of **any** non-random authentication factor). Entropy requires true randomness. – Steve Dodier-Lazaro Aug 21 '14 at 14:28
  • Also note a latent semantic model is built offline prior to attacking, so its computational cost scales very well when bruteforcing passwords later on. – Steve Dodier-Lazaro Aug 21 '14 at 14:31
  • @SteveDL The same could be said of the standard password. We cannot really use entropy for those either, but we do. I'm not claiming 100% accuracy, just comparable within a margin of error. If anything, much more research has been done to optimize the breaking of single word passwords than any other types. – Red_Shadow Aug 21 '14 at 14:41
  • Also, in a black-box attack or a break in that steals a large number of hashed passwords your argument falls apart since most attackers would not optimize for passphrases. – Red_Shadow Aug 21 '14 at 14:45
  • You're right about password data. The main reason why we can't discuss passphrase security is because we can't model how they would be created by users, as opposed to passwords where real-world datasets are available. This is discussed in one of the duplicate questions. – Steve Dodier-Lazaro Aug 21 '14 at 14:56
  • "so we assume that the attacker knows that you use a sentence as your password and is only trying to break it based on that information" -> you discuss the security of an auth factor assuming it's commonly/exclusively used, because you don't want to get security just because of low adoption rates. ;) – Steve Dodier-Lazaro Aug 21 '14 at 14:58
  • We have to assume that to get meaningful numbers, otherwise the entropy values are for a very long password which would artificially inflate them compared to the standard. – Red_Shadow Aug 21 '14 at 15:03
  • What I meant is it's not fair to make this (sound) assumption but then tell me that I can't make it in my own approach ;) – Steve Dodier-Lazaro Aug 21 '14 at 15:14
0

Lately I've been doing the trend of using passwords that are sentences

We call them passphrases ;-)

at least on sites that have a reasonable maximum character length and allow spaces

All websites should allow that. If one doesn't, contact an admin and refuse to create an account. Sites should store passwords in a fixed-length format (the technique is called "hashing"), which means that, for them, there is no difference between a 200-word passphrase and a 3-character password: both are converted to a fixed-length piece of data by hashing. The fact that certain characters are disallowed indicates that it does matter to the site what password you enter and, though it might still be fine, it stinks of bad password storage techniques.
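As a rough illustration of the fixed-length point, here is a minimal sketch using Python's standard `hashlib`. The choice of PBKDF2 with SHA-256, the iteration count, and the helper name `hash_password` are my assumptions for the example, not something any particular site is known to use.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a fixed-length hash from a password of any length.

    PBKDF2-HMAC-SHA256 and the iteration count are illustrative choices.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


salt = os.urandom(16)  # a random per-user salt

short_hash = hash_password("abc", salt)
long_hash = hash_password(" ".join(["word"] * 200), salt)  # a 200-word passphrase

# Both stored values have the same size, whatever the input looked like.
print(len(short_hash), len(long_hash))
```

Spaces, punctuation and length make no difference to what ends up in storage, which is why restrictions on them suggest the password is being handled in some other way.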

the most basic variation is 'This is my password.', or 'This is my <insert website here> password.'

Okay, but please do remember that those are just that: "basic variations". They are not secure and should be used for useless, throwaway accounts only (like a temporary one to view an article, or to download something from Oracle, or you name it).

Passphrases work best when they are composed of random words. We, as humans, are bad at randomness and are unable to pick a random item from our entire vocabulary. It would be better to take a paper dictionary and roll dice to find a random page and a random word on that page.

If you don't have a dictionary handy, trying to make up random phrases with illogical constructions (e.g. "potato red red twitter mind") is a reasonable substitute.
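If you would rather let a computer roll the dice locally, something along these lines would do. It is a sketch only: `WORDS` is a tiny made-up list to keep the example short (a real list should contain thousands of words), and `random_passphrase` is a hypothetical helper name.

```python
import secrets

# Illustrative only: far too small for real use.
WORDS = ["potato", "red", "twitter", "mind", "anchor", "velvet", "river", "copper"]


def random_passphrase(words, count=5, separator=" "):
    """Pick `count` words independently and uniformly at random.

    `secrets.choice` draws from the operating system's cryptographically
    secure random source, unlike the `random` module.
    """
    return separator.join(secrets.choice(words) for _ in range(count))


print(random_passphrase(WORDS))
```

Because each word is chosen independently, repeats such as "red red" can and should occasionally appear; filtering them out would only shrink the number of possibilities.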

You should be distrustful of online services offering to generate passwords for you. Even if they don't actually do so, they could technically store the password they generated for you, or pick the "random" password from a list of only a few hundred possibilities (and yes, a few hundred is very few, because computers will run through all those possibilities in a very short time).

It seemed rather silly to memorize a group of random words instead of simply rather constructing a sentence, which is way easier on muscle memory. Is this easier to predict?

It seems I've already answered this, but to reiterate: Yes, sentences are easier to predict than random words. The point of a password or passphrase is to be something only you know. Using phrases that make sense to us limits the number of possibilities, making it easier to guess. It's simply more probable that the word "red" follows after "the" and not after "potato" ("the red something" vs "potato the").

Luc
  • 31,973
  • 8
  • 71
  • 135
-1

According to Bruce Schneier:

This is why the oft-cited XKCD scheme for generating passwords -- string together individual words like "correcthorsebatterystaple" -- is no longer good advice. The password crackers are on to this trick.

The XKCD scheme is using long passwords that are composed from random words. What's true about sentences built from random words is doubly so about actual sentences.

MaltAlex
  • 107
  • 3
  • 2
    This is both wrong ([Bruce was drunk?](https://security.stackexchange.com/questions/62832/is-the-oft-cited-xkcd-scheme-no-longer-good-advice/62881#62881)), and irrelevant here since the XKCD scheme is about random words, not an English sentence. – Gilles 'SO- stop being evil' Aug 21 '14 at 13:05
  • 1
    What's true about sentences built from random words is doubly so about actual sentences. – MaltAlex Aug 21 '14 at 13:08
  • @Malt: Only it’s not true about random words, and actual sentences tend to be much longer. – Ry- Aug 21 '14 at 15:58