Composition of passwords

Question

Passwords (passphrases) that are taken from known texts (books etc.) are commonly considered to be weak, but what if they are combined with random material, i.e. in the forms AX, XA or AXB, where X is random and A and B are from known texts? Intuitively, A and B should contribute something nontrivial to the security beyond that of X. But how much, relative to X? Are there good studies in that direction?

There are already some good questions under the password tag; you should review them as well. – Ali Ahmad, Jun 06 '13 at 11:18

score 5 · Answer 1 · answered Jun 06 '13 at 11:16

Entropy, expressed in bits, is a logarithmic scale, so it adds up. If A offers 10 bits of entropy (meaning: an attacker would have to try about 2⁹ values on average to guess it) and B offers 11 bits of entropy (2¹⁰ guesses), then AB has entropy 21 bits (10 + 11). Similarly for "random stuff" as X. This holds as long as all the A and B and X elements are selected independently of each other (this is important).
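To make the addition concrete, here is a minimal sketch, assuming every element is drawn uniformly and independently from a known number of options. The helper names are made up for illustration.

```python
import math

def entropy_bits(num_choices: int) -> float:
    """Entropy in bits of one element drawn uniformly from num_choices options."""
    return math.log2(num_choices)

def combined_entropy_bits(*choice_counts: int) -> float:
    """Entropy of independently chosen elements: the bits simply add."""
    return sum(entropy_bits(n) for n in choice_counts)

def average_guesses(bits: float) -> float:
    """A uniformly chosen value is found, on average, after searching half the space."""
    return 2 ** (bits - 1)

# A has 10 bits (2**10 options) and B has 11 bits (2**11 options).
total = combined_entropy_bits(2 ** 10, 2 ** 11)
print(total, average_guesses(total))
```

If A and B are not independent (say, B is derived from A), the sum overstates the entropy, which is why the independence condition matters.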

Of course, the user's reluctance to use the password also adds up, in proportion to the password length. The user has to remember it and type it, so he prefers short passwords, all other things being equal.

Thomas Pornin

Ok. But how to adequately determine the entropy of a string chosen from e.g. a book in practice? – Mok-Kong Shen Jun 06 '13 at 12:19
Calculating it. Basically, someone would have to take all books and calculate "what are the words that follow this other word in books?". Then the person would have a dictionary for his brute-force attack. For example, after trying the word "for", he would try all combinations for the random element, and then try only the words he had seen after "for". Much better than trying all existing words... – woliveirajr Jun 06 '13 at 13:21
@woliveirajr: Are there references about such calculating works that were practically done? – Mok-Kong Shen Jun 06 '13 at 21:10
@Mok-KongShen: I don't think so, at least when it comes to brute-force attacks. But it's not that hard to construct such a partial dictionary by harvesting texts from the internet... The brute-force program would also be customized to take advantage of such a custom dictionary. – woliveirajr Jun 07 '13 at 13:10

score 2 · Answer 2 · answered Jun 06 '13 at 12:24

The strength of a password is defined as the average amount of time it would take for an attacker to guess it. There are two elements in choosing a password: cleverness and randomness. Using English words, phrases from books, etc. is cleverness.

In practice, pretty much any password choice involves part cleverness and part randomness. For example, if you open your favorite book and take the first four words of a random page as your password, then your choice of book is mostly cleverness, and your choice of page is mostly randomness (but if you flipped pages “at random”, there's actually quite a bit of cleverness, because you're far more likely to pick, say, page 42 out of 100 than page 1 or page 100).

If you assume that the attacker is stupid, then your password only protects you against stupid attackers. Clever password choices do not protect against clever attackers. The amount of randomness is a good measure of the strength of the password; cleverness doesn't help much.

The attacker has to guess the whole password. If you combine multiple elements, he has to get all of them right. For example, suppose your password is of the form AX and you have 50 choices for A (an estimate for the first word of a random page in a long book; you'll need far more than 50 pages to account for the not-so-random choice of page and for repeated words) and 36 choices for X (two digits, obtained by rolling a fair die twice). Then there are 50×36=1800 possible passwords. Concatenating multiple elements always increases the strength of the password, and it does so multiplicatively, assuming the elements were chosen independently. If instead of rolling a die for X you had picked the page number where you found the word A, there would be no added strength, as this method is purely clever and not random at all.
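The arithmetic of the AX example can be checked in a few lines. This is only a sketch using the estimates from the paragraph above (50 choices for A, two die rolls for X); the variable names are illustrative.

```python
import math

choices_a = 50      # estimate: first word of a random page in a long book
choices_x = 6 * 6   # two rolls of a fair six-sided die

# Independent choices: the attacker's search space is the product.
independent_total = choices_a * choices_x
independent_bits = math.log2(independent_total)

# Dependent choice: if X is fully determined by A (e.g. the page number
# where A was found), X adds nothing and the space is just that of A.
dependent_total = choices_a
dependent_bits = math.log2(dependent_total)

print(independent_total, round(independent_bits, 2))
print(dependent_total, round(dependent_bits, 2))
```

Multiplying the counts is the same as adding the bit values, so the two ways of reasoning about strength agree.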

The only reason to not completely use randomness to generate passwords is that a purely random string of characters is not memorable, and may be hard or impossible to type. If you want to stick with easy to type, you can stick to random letters; this increases the number of characters, and you can easily measure how (each additional letter multiplies the total number of possibilities by 26). Memorability again requires increasing the length for a given target strength; there are multiple ways to achieve that.

Combining the random X with a clever A isn't much better than using only X. A will have a small amount of randomness, because the attacker might not know which book you picked. The downside of this approach is that it's hard to quantify how strong A is. So if you want to be confident that your password has a certain strength, you'll have to make X pretty much reach this strength, so A is superfluous.

For this reason, it may be better to eschew A altogether. An approach to memorable randomness that works well for many people is to pick several words at random; diceware is a popular method for this.
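As a rough sketch of the "several words at random" idea: the code below draws words with Python's `secrets` module rather than physical dice, and uses a tiny placeholder word list (`DEMO_WORDS`, invented here) instead of the real Diceware list. The entropy figure only holds if the list has no duplicates and each word is drawn uniformly and independently.

```python
import math
import secrets

def random_passphrase(wordlist, num_words):
    """Pick num_words words uniformly and independently from wordlist."""
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("word list must not contain duplicates")
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))

def passphrase_entropy_bits(wordlist_size, num_words):
    """Entropy of a passphrase built from independent uniform word choices."""
    return num_words * math.log2(wordlist_size)

# Placeholder list for illustration only; far too small for real use.
DEMO_WORDS = ["apple", "brick", "cloud", "drift", "ember", "flint", "grove", "haste"]

phrase = random_passphrase(DEMO_WORDS, 4)
print(phrase, passphrase_entropy_bits(len(DEMO_WORDS), 4))
```

Because the list is fixed and the selection is uniform, the strength is easy to state, which is exactly what is hard to do for a phrase taken from a book.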

The usability of passwords has been studied in controlled conditions. It is difficult to study them in the wild, as our real-world data comes from unplanned breaches on systems where users were left to their own devices (website X gets hacked, wasn't using proper hashing, someone publishes the password list). If anything, the conclusion is that passwords are bad, but finding a replacement is a research topic.

If one randomly selects a book from a big library and randomly chooses a phrase from it, would that be random and not cleverness and thus could be a viable way? (If yes, how would one determine the entropy? Note that this is in principle the same as diceware, isn't it?) — Mok-Kong Shen, Jun 06 '13 at 22:32
@Mok-KongShen If you choose the book at random, that's part cleverness and part randomness. The cleverness went in assembling the library; if it's in an English-language country, the technique is more likely to yield “he opened the door” than “haloed be her singtime, her eve sung”. To determine the entropy, you'd have to count the frequency of all phrases in your library. Diceware is simpler to evaluate, you have a fixed list of distinct words. — Gilles 'SO- stop being evil', Jun 06 '13 at 23:02
But then why would employing diceware, which has a fixed list of words analogous to the catalogue of a library, apparently mean in your view randomness and not cleverness? I don't yet understand. — Mok-Kong Shen, Jun 07 '13 at 09:17
[Addendum:] Different libraries have fairly different catalogues of books (e.g. special research reports etc.) and are evolving. That could be an essential difficulty for the opponent, who would have to find out which library is involved. Thus the randomness of picking the book and the location of the phrase is IMHO sufficient (real) randomness in the current context. The clear advantage is that the phrases chosen are commonly easier to remember than a set of unrelated words. – Mok-Kong Shen, Jun 07 '13 at 09:44
@Mok-KongShen Establishing the catalogue is cleverness. If the attacker is targetting you, you can assume . If the attacker knows your email address is `mok-kong.shen@miskatonic.edu` he can make a good guess that you used the Miskatonic University library. The choice of Diceware as a catalog is cleverness, but then the choice of a word in that list is randomness. Choosing from Diceware is safer because it doesn't have repeated words. — Gilles 'SO- stop being evil', Jun 07 '13 at 10:07
Oh, I normally use five public libraries located in my city. Furthermore, I can access other libraries in the country over the library network. There is online access to hundreds or thousands of journals. How could an opponent ever succeed in guessing what I have used to determine the natural language components of my passphrase? Thus I continue to think that's at least better than schemes like diceware. Words in diceware don't repeat, but then the phrases chosen also don't exactly repeat. (As to entropy, I think maybe one could just use the figure that Shannon once found from an experiment.) – Mok-Kong Shen, Jun 07 '13 at 11:45

score 1 · Answer 3 · answered Jun 08 '13 at 18:25

Before answering your question, let me summarize the steps that experts of password cracking took in a recent competition.

1. Brute-force all seven- and eight-character lowercase alphabetic strings
2. Brute-force all seven- and eight-character uppercase alphabetic strings
3. Brute-force all numeric strings of 1 to 12 digits
4. Append two characters (digits or special symbols) to the words in the password dictionary file
5. Append three characters (digits or special symbols) to the words in the password dictionary file
6. Append four digits to the words in the password dictionary file
7. Apply rules to the password dictionary file, for example changing "a" into "@". This is called l33t substitution and can easily be done with John the Ripper
8. Combine each word in one dictionary (usually a smaller one) with each word in another dictionary (usually a much larger one)
9. Combine the individual words of both dictionaries and append one or two digits

So, as you can see above, even if you choose two words at random, if those two words are each found in a dictionary file, it won't stop the attackers from cracking the password. Forms such as AXB or ABX, as you mentioned, are both vulnerable to the type of attack carried out by smart attackers (the last two techniques in the list).
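To get a feel for why these techniques are practical, one can count the candidates each of them has to try. The sketch below does only that counting; the dictionary sizes (10,000 and 1,000,000 words) are assumptions picked for illustration and do not come from the competition write-up.

```python
LOWERCASE = 26
DIGITS = 10

def brute_force_space(alphabet_size, lengths):
    """Number of strings over the alphabet for each of the given lengths."""
    return sum(alphabet_size ** n for n in lengths)

def combinator_space(small_dict_size, large_dict_size, suffix_count=1):
    """Number of word pairs, optionally multiplied by the number of suffixes."""
    return small_dict_size * large_dict_size * suffix_count

# Exhaustive search over 7- and 8-letter lowercase strings.
lowercase_7_8 = brute_force_space(LOWERCASE, [7, 8])

# Exhaustive search over numeric strings of 1 to 12 digits.
digits_1_12 = brute_force_space(DIGITS, range(1, 13))

# Word pairs from two dictionaries (sizes are hypothetical).
two_words = combinator_space(10_000, 1_000_000)

# Word pairs followed by one digit (10 options) or two digits (100 options).
two_words_plus_digits = combinator_space(10_000, 1_000_000, suffix_count=10 + 100)

for name, size in [("lowercase 7-8", lowercase_7_8), ("digits 1-12", digits_1_12),
                   ("two words", two_words), ("two words + digits", two_words_plus_digits)]:
    print(f"{name}: {size:,}")
```

All of these spaces are on the order of 10^10 to 10^12 candidates, so a password that fits any of the patterns is only as strong as the corresponding count.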

That is why it is very important not only that you choose passwords that are random, but also that the server where these passwords are stored takes measures to ensure their security. Techniques 1, 2, and 3 can only be stopped if the password hashes are generated with a very slow hashing algorithm such as bcrypt, together with a random salt for each password.
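A minimal sketch of salted, deliberately slow hashing follows. Note that it swaps in PBKDF2 from Python's standard library (`hashlib.pbkdf2_hmac`) in place of bcrypt, only because bcrypt needs a third-party package; the iteration count is an assumed value and would need tuning for real hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor, not a recommendation

def hash_password(password: str, iterations: int = ITERATIONS):
    """Return (salt, digest) using a fresh random salt for every password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
```

The per-password salt means identical passwords get different hashes, and the iteration count multiplies the attacker's cost for every single guess.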

You can find more details about the password cracking techniques employed by the crackers here.

score 0 · Answer 4 · answered Jun 08 '13 at 16:55

Short answer:

Passphrase passwords (e.g. word passwords) can only be considered weak for the same reasons other passwords can be considered weak: too short or nonrandom selection. E.g. "DisneysBeautyAndTheBeast" may be a reasonable length but is not random, especially if I know that you love that movie. "HindranceReproofs" seems random enough but using only two words puts you at risk. Similar things can be said of these character passwords: "qwertyasdf1234567890" and "z3}V".

Long answer:

Check out the "Lengths L of truly randomly generated passwords required to achieve a desired password entropy H for symbol sets containing N symbols" table in the Password strength Wikipedia article. It can give you a feel for how random combinations of various sets of characters compare with random combinations of words picked from the Diceware dictionary, which is reported to have 7776 words.

That article recommends that strong passwords have an entropy of 80 bits. According to the table, it would take 14 "case sensitive alphanumeric" characters to achieve that, or 7 words randomly selected from the Diceware dictionary. Those 7 words will result in a long password, but consider that remembering 7 things is significantly easier than remembering 14, especially when some of those things are uppercase and some are lowercase and you have to remember which is which.

If we follow the XKCD example and suppose that about 40 bits of entropy is acceptable, then a truly random "case sensitive alphanumeric" password need only be 7 characters long, and the Diceware password need only be 4 words long. In this way we can get a feel for how the strength vs. length of random character passwords compares with that of random word passwords.
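The table lookups above boil down to one formula: the length needed is the target entropy divided by the bits per symbol, rounded up. A small sketch, assuming every symbol (character or word) is chosen uniformly and independently; the function name is made up for illustration.

```python
import math

def required_length(target_bits: float, symbol_count: int) -> int:
    """Smallest number of uniformly random symbols reaching target_bits of entropy."""
    return math.ceil(target_bits / math.log2(symbol_count))

ALPHANUMERIC = 62   # a-z, A-Z, 0-9
DICEWARE = 7776     # words in the Diceware list

for bits in (80, 40):
    print(bits, required_length(bits, ALPHANUMERIC), required_length(bits, DICEWARE))
```

For the Diceware row the unit is words, not characters, so the typed password is longer even though there are fewer things to remember.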

Hybrid passwords can be calculated too. For example, the entropy of a 3 word (WWW), 2 case sensitive alphanumeric (CC) password can be calculated in the following way if we are choosing from 8000 words and 62 case sensitive alphanumeric characters:

Entropy(WWW) + Entropy(CC)
    = LogBase2(8000*8000*8000) + LogBase2(62*62)
    = 3*LogBase2(8000) + 2*LogBase2(62)
    = 38.9 + 11.9
    = 50.8 bits
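The same hybrid calculation can be sketched as a small function, assuming each part is chosen uniformly and independently. `hybrid_entropy_bits` is a made-up helper name; the 8000-word and 62-character figures are the ones used in the example.

```python
import math

def hybrid_entropy_bits(parts):
    """Sum count * log2(size) over (size, count) pairs of independent uniform choices."""
    return sum(count * math.log2(size) for size, count in parts)

words_bits = hybrid_entropy_bits([(8000, 3)])              # WWW
chars_bits = hybrid_entropy_bits([(62, 2)])                # CC
total_bits = hybrid_entropy_bits([(8000, 3), (62, 2)])     # WWWCC

print(round(words_bits, 1), round(chars_bits, 1), round(total_bits, 1))
```

Other mixes, such as two words plus four characters, can be compared by changing the pairs in the list.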

But we must not forget that when users make up their own passwords, the results probably cannot be considered truly random. Be it character passwords or word passwords, you really have to generate passwords for your users to have any confidence that they will be random.
