Would turning a Diceware phrase into a sentence decrease its security?

Question

Diceware passphrase lengths are on the rise - up to six or seven words now. The old adage that passphrases are easier to remember may be true for shorter phrases, but six truly random words can be tough to remember. On the other hand, full sentences may be easier for some to remember.

Take for example the Diceware-generated passphrase tracy optic renown acetic sonic kudo. We could turn that into a (nonsensical) sentence such as Tracy's optics were renowned, but her acetic sonic cost her kudos.

The Diceware passphrase has an entropy of 77.4 bits if the attacker knows you're using six Diceware words (12.9 bits per word), and 107.219 bits (according to this calculator) if they don't. The sentence form has an entropy (according to the calculator) of 255.546 bits. However, it's no longer fully random, and full randomness is supposed to be one of the big benefits of the Diceware approach.
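For reference, the per-word figure comes from the size of the word list. A minimal sketch, assuming the standard 7776-word list (five dice rolls per word) and uniformly random, independent rolls; the function name `diceware_bits` is made up for this illustration. Unrounded, each word is worth about 12.92 bits, so six words come to roughly 77.5 bits and three words to roughly 38.8 bits, slightly above the figures obtained from the rounded 12.9.

```python
import math

DICEWARE_LIST_SIZE = 6 ** 5  # 7776 words: five six-sided dice per word


def diceware_bits(num_words: int, list_size: int = DICEWARE_LIST_SIZE) -> float:
    """Entropy in bits of num_words words drawn uniformly and independently."""
    if num_words < 0 or list_size < 1:
        raise ValueError("num_words must be >= 0 and list_size >= 1")
    return num_words * math.log2(list_size)


for n in (3, 6, 7):
    print(n, "words:", round(diceware_bits(n), 2), "bits")
```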

Assuming that the attacker somehow knows that you're using this method of passphrase generation, does the sentence form decrease the security of the passphrase in any way? For example, perhaps they can use some kind of analysis of English sentence structure to narrow down their required guesses?

Assuming the answer to the above is "No, sentence form does not decrease security," then here's another consideration:

One benefit of the sentence format is that it's very long and includes non-alphabetical characters (e.g. the apostrophe and comma). However, that's a definite downside when trying to type it on a mobile device. Say we shorten the Diceware phrase to three words - tracy optic renown - and then turn that into an [a-z] sentence - tracy is optically renowned or perhaps tracy is optically renowned worldwide (to further distinguish it from the Diceware wordlist).

If we were to use three Diceware words and the attacker knows we're using Diceware, then we have an entropy of 38.7 bits. However, tracy is optically renowned worldwide is 100.504 bits of entropy according to the calculator.

Given the differences between the three-word Diceware phrase and the short sentence form, which entropy calculation is more accurate - the Diceware calculation (i.e. the differences are too slight to matter) or the calculator's calculation (dictionary/brute-force/etc.)?

Note: assume that any length or combination of characters is acceptable for the password

Even if the attacker knows you are using the Diceware method, it would take even the most well-funded adversaries 70 years to brute-force, as I have covered [here](http://security.stackexchange.com/questions/111967/does-eliminating-the-possibility-of-repeat-words-make-diceware-passwords-signifi/111972#111972). — cremefraiche, Mar 26 '16 at 20:17
@Bakuriu They are not perfect, but they are better than all the other approaches used to measure the strength of a password. — kasperd, Mar 27 '16 at 14:31
Input entropy is really important, and they can't touch that. — Sobrique, Mar 28 '16 at 07:14

score 28 · Accepted Answer · answered Mar 26 '16 at 20:58

It does not decrease the security. What is actually happening is that your "entropy calculator" is giving you a false measure of entropy. It can only give an approximate estimate, after all. There are actually interesting proofs showing that one can never know the amount of entropy in a particular string of text unless one knows something about how it was constructed. A pass string 1000 words long created by a "physical random number generator" like a resistor noise network will appear to have the same amount of entropy as a pass string 1000 words long generated using a Mersenne Twister, until you realize that the Mersenne Twister leaks its entire internal state in any contiguous block of 624 outputs. Entropy calculators can only make heuristic assumptions about how random the data actually is.
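The Mersenne Twister claim can be checked directly. The sketch below assumes CPython's `random` module, whose generator is MT19937 and whose `getrandbits(32)` returns one raw 32-bit output; the helper names `untemper` and `clone_mt` are made up for this illustration. It inverts the output tempering on 624 consecutive values, loads the result as generator state, and from then on predicts the victim's output without ever seeing the seed.

```python
import random

N = 624  # number of 32-bit words in MT19937's internal state


def _undo_right_shift_xor(y: int, shift: int) -> int:
    """Solve y = x ^ (x >> shift) for x by fixed-point iteration."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def _undo_left_shift_xor(y: int, shift: int, mask: int) -> int:
    """Solve y = x ^ ((x << shift) & mask) for x by fixed-point iteration."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(y: int) -> int:
    """Invert MT19937's four tempering steps, last step first."""
    y = _undo_right_shift_xor(y, 18)
    y = _undo_left_shift_xor(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = _undo_right_shift_xor(y, 11)
    return y


def clone_mt(outputs: list[int]) -> random.Random:
    """Build a generator that continues the sequence after 624 observed outputs."""
    if len(outputs) != N:
        raise ValueError(f"need exactly {N} consecutive 32-bit outputs")
    state = tuple(untemper(o) for o in outputs)
    clone = random.Random()
    # CPython state layout: (version, 624 state words + position index, gauss_next).
    # Index N means every word is consumed, so the clone regenerates before its next output.
    clone.setstate((3, state + (N,), None))
    return clone


victim = random.Random()  # seeded from OS entropy; the attacker never sees the seed
observed = [victim.getrandbits(32) for _ in range(N)]
clone = clone_mt(observed)
predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)
```

A character-counting entropy calculator would rate the victim's output as highly random, even though everything after the first 624 values is fully determined.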

This, of course, is why we have Diceware. It can prove a lower bound on entropy because randomness is built into the process. To prove the security of a pass-sentence like the one you are looking at, consider an oracle test. I select a bunch of words using Diceware, and then I build a sentence out of them. I then provide you with an oracle which constructs sentences out of sets of words. It is guaranteed that, if you provide the oracle with the correct set of selected words from Diceware, it will produce exactly the sentence I used. For all other sets of words, it will produce an arbitrary sentence using them. It is trivial to see that the entropy of my password cannot possibly be lower than the entropy built into the Diceware words I selected. Even with this immensely powerful oracle to reduce the very human process of sentence formation to nothingness, the randomness from Diceware will remain. You cannot guess my password any faster than you could guess the original set of Diceware words I selected.
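A toy version of this oracle argument, as a sketch only: the four-word list and the `toy_oracle` template are invented here, standing in for the 7776-word list and the human sentence-building step. Even when the attacker is handed the exact sentence builder, the search space is still one candidate per word tuple.

```python
from itertools import product

TOY_WORDS = ["tracy", "optic", "renown", "sonic"]  # hypothetical stand-in for the Diceware list


def toy_oracle(words: tuple[str, ...]) -> str:
    """Deterministic sentence builder that keeps the rolled words in order."""
    first, second, third = words
    return f"{first} has {second} and {third}"


def guesses_until_found(target: str, wordlist: list[str], num_words: int) -> int:
    """Attacker with the oracle: try every word tuple and compare the resulting sentence."""
    for attempt, combo in enumerate(product(wordlist, repeat=num_words), start=1):
        if toy_oracle(combo) == target:
            return attempt
    raise ValueError("target was not built from this word list")


all_sentences = {toy_oracle(c) for c in product(TOY_WORDS, repeat=3)}
print(len(all_sentences))  # one distinct sentence per tuple: 4**3 = 64

# Worst case for the attacker: the secret is the last tuple in enumeration order.
secret = toy_oracle(("sonic", "sonic", "sonic"))
print(guesses_until_found(secret, TOY_WORDS, 3))
```

Because this particular oracle maps different tuples to different sentences, knowing it saves the attacker nothing beyond reducing the problem to guessing the tuple.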

Now there are a few caveats. If you use fewer Diceware words, as in your later example, you get fewer bits of entropy from the Diceware layer. This means that the oracle I mentioned above becomes more and more helpful for breaking the sentence-based password. Also, some of the sets of words you get from Diceware can be particularly difficult to turn into sentences. If you ever reject a set of Diceware words as part of your pass-sentence building process, you are calling into question the perfect randomness that Diceware relies on.
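The cost of rejecting word sets can be bounded with simple arithmetic. A rough sketch, assuming the worst case in which the attacker knows exactly which word sets you would accept, and assuming you are equally likely to end up with any accepted set; the function name and the 25% figure are hypothetical.

```python
import math


def bits_after_rejection(num_words: int, accept_fraction: float, list_size: int = 7776) -> float:
    """Entropy left when only accept_fraction of all word tuples would ever be kept."""
    if not 0 < accept_fraction <= 1:
        raise ValueError("accept_fraction must be in (0, 1]")
    full_bits = num_words * math.log2(list_size)
    # Shrinking the pool to a fraction f of its size removes log2(1/f) bits.
    return full_bits + math.log2(accept_fraction)


# Hypothetical: you re-roll until the words "work", and only 1 in 4 rolls does.
print(round(bits_after_rejection(3, 0.25), 2))
print(round(bits_after_rejection(3, 1.0), 2))
```

Under these assumptions, keeping only a quarter of all rolls costs two bits, and every further halving of the accepted pool costs one more.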

Now, why the oracle attack? Oracles are very powerful tools for testing cryptographic theory. In reality, tracy is optically renowned worldwide is probably quite a lot stronger than the 38.7 bits from the Diceware words tracy optic renown. Breaking that sentence will take more work than breaking the words alone, though probably not the full 100.504 bits the entropy calculator heuristically estimates. So how much stronger? We don't know. That's the point of oracle attacks. In an oracle attack we say "let's just assume this hard-to-calculate part of the process offers zero increased security. None at all. Is the process still secure?" If it is secure under this extreme assumption, then it is clearly secure against real-life attacks where the attacker doesn't necessarily have such a magically powerful oracle at their disposal.

Good point on the dangers of *rejecting* Diceware word sets for being difficult to weave into a sentence. — LSerni, Mar 26 '16 at 22:01
Another thing to note is that this process is not injective. `tracy optic renown`, `tracy optically renowned`, and `tracy optic worldwide` may all generate `tracy is optically renowned worldwide`. — PyRulez, Mar 26 '16 at 23:21
@PyRulez True, though as an exercise to the reader, I leave it to demonstrate why such a non-injective process does not decrease the security of the sentence in the presence of such an oracle attack. It can even be shown that this security remains true, even if the attacker gets to provide *you* with an oracle of their own which takes the diceware words and turns them into a sentence containing those words in order, no matter how malicious that sentence generating oracle is. — Cort Ammon, Mar 27 '16 at 01:49
@CortAmmon: That's clearly baloney, a malicious oracle CAN decrease the entropy as PyRulez suggested. Consider the formation of a sentence that concatenates ALL words in the diceware dictionary (ok, it'd have to include them all N times in order to meet the "in order" requirement). That same passphrase could be returned for every possible input. That's of course an extremely long output; length restrictions will limit the entropy loss but not eliminate it. — Ben Voigt, Mar 27 '16 at 02:58
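The injectivity point raised in these comments can be made concrete on a toy scale. The sketch below is illustrative only: the four-word list and both oracles are invented. It computes the Shannon entropy of the resulting passphrase when the words are rolled uniformly and then passed through a sentence builder. A builder that lets the words be read back preserves all the entropy; a builder that returns the same (long) sentence containing every possible input in order preserves none.

```python
import math
from collections import Counter
from itertools import product

TOY_WORDS = ["tracy", "optic", "renown", "sonic"]  # hypothetical toy list


def faithful_oracle(words: tuple[str, ...]) -> str:
    """Different tuples always yield different sentences (injective)."""
    return " ".join(words)


def malicious_oracle(words: tuple[str, ...]) -> str:
    """Ignores its input: repeats the whole list once per slot, so any tuple appears in order."""
    return " ".join(TOY_WORDS * len(words))


def output_entropy_bits(oracle, wordlist: list[str], num_words: int) -> float:
    """Shannon entropy of oracle(tuple) when the tuple is chosen uniformly at random."""
    counts = Counter(oracle(c) for c in product(wordlist, repeat=num_words))
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(output_entropy_bits(faithful_oracle, TOY_WORDS, 3))   # log2(4**3) = 6.0
print(output_entropy_bits(malicious_oracle, TOY_WORDS, 3))  # single possible output: 0.0
```

The lower bound therefore holds for sentence builders from which the rolled words can be recovered; with a practical length limit on the sentence, a collapsing builder loses some but not all of the entropy.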

score 5 · Answer 2 · answered Mar 27 '16 at 10:36

Assuming you go with whatever words you roll (as opposed to rolling until you find something you can make a sentence out of), and you use them in the order they were rolled (not rearranging them to make a better sentence), this scheme cannot decrease the entropy. It will increase it, but to what extent is hard to quantify.

Assume the worst case, that the attacker knows you are using this scheme. The entropy of the diceware words is unchanged. If you were using those words alone, the attacker would have to try every tuple of diceware words. But now the attacker has to take each tuple and insert it into one of several possible sentence forms.

The sentence forms that make sense grammatically will vary according to the parts of speech of the diceware words. Some might take multiple parts of speech; "annual" could be an adjective or a noun, for example. It's also possible you might use incorrect grammar on purpose.

So the number of possible passphrases has increased, and so has the entropy. However, since the amount of entropy increase is difficult to quantify, I would assume it's zero, and use as many diceware words as you would use without this scheme. The quantifiable advantage is that the expanded phrase is easier to memorize.
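To put rough numbers on this, here is a sketch under strong simplifying assumptions: the sentence form is picked uniformly from a fixed set, independently of the words, and the rolled words can still be read back from the sentence. The function name and the count of eight forms are hypothetical.

```python
import math


def bits_with_sentence_forms(num_words: int, num_forms: int, list_size: int = 7776) -> float:
    """Entropy of (word tuple, sentence form) when both are uniform and independent."""
    if num_forms < 1:
        raise ValueError("num_forms must be at least 1")
    return num_words * math.log2(list_size) + math.log2(num_forms)


base = bits_with_sentence_forms(6, 1)       # words only
with_forms = bits_with_sentence_forms(6, 8)  # hypothetical: 8 equally likely forms
print(round(base, 2), round(with_forms, 2), round(with_forms - base, 2))
```

Under these assumptions eight forms add only three bits, far less than one extra Diceware word, which is consistent with treating the gain as zero when deciding how many words to roll.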

iAdjunct · Answer 3 · 2016-03-26T20:12:41.997

Diceware gets its security from the number of bits of entropy per word. We'll start with the assumption that you've selected the words at random, in a particular order.

If you add words in between to turn it into a sentence, it still has the same entropy and is therefore just as secure. (There is a possible issue if you include the added words in the password you actually type, because they may give hints to somebody who has cracked half the password. If your words are "pizza eat", you could memorize them as "pizza is what I eat", but if you enter that whole sentence and somebody figures out "pizza is what I...", then it's not hard to guess "eat". If you only type "pizzaeat", you don't have that issue.)

If you rearrange the words, though, you decrease security, because the number of options per word is smaller (you artificially limit the options for each word to the ones which would work well with the following word. For example, you eliminate options like "pizza eat" because they wouldn't make sense, and therefore the number of options for the first word is smaller because "pizza" is no longer one of them).
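One way to bound the damage from rearranging, as a sketch: assume the worst case, where the attacker can predict the order you would pick for any given set of words, so the order contributes nothing. The secret is then the unordered set, and at most n! different rolls collapse into the same set. The function name is made up for this illustration.

```python
import math


def worst_case_bits_if_reordered(num_words: int, list_size: int = 7776) -> float:
    """Lower bound on entropy when word order is chosen by the user, not the dice."""
    ordered_bits = num_words * math.log2(list_size)
    # At most num_words! ordered rolls map to the same unordered set of words.
    return ordered_bits - math.log2(math.factorial(num_words))


for n in (3, 6):
    loss = math.log2(math.factorial(n))
    print(n, "words:", round(worst_case_bits_if_reordered(n), 2), "bits, loss up to", round(loss, 2))
```

For six words the loss is at most about 9.5 bits, a little under three quarters of one Diceware word.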

score 2 · Answer 4 · edited Jun 16 '20 at 09:49

tl;dr: Q1: a little. Q2: about the entropy of the original Diceware phrase.

longer version

Purely adding additional characters, even if they are not random, should not harm your pass phrase strength, abstractly.

Yet there are some concerns. Most importantly, an attacker who knows your technique could probably deduce, from watching you type your pass phrase:

the number of Diceware words you started with
which words were added to turn them into a sentence (because those are generally short)
which words are more likely in which positions (because of grammar: depending on the dictionary, not every Diceware word can be modified into a word that fits the sentence structure)

This last point reduces the possible combinations you can start with, which the attacker can take into account. From there, it might be possible that part of your pass phrase is already leaked; the words in between follow strict grammatical rules and are thus relatively easy to guess.

This is also the answer to your second question. The entropy you are adding to create a valid sentence is low, as most is syntax with very little room for play. So the calculator is overrating your construction by far.

Still, if I'm not overlooking something here, the overall strength of your modified Diceware result should generally be on par with the original one.

score -3 · Answer 5 · answered Mar 27 '16 at 20:21

Compared with the original scheme, the entropy is reduced once the dictionary of the method you're using is revealed. That is why, if you compute the entropy with an online entropy calculator from a machine that uses the same scheme, or from a set of machines that can be linked to the words used, your entropy is zero.

Munchen

The original diceware dictionary is already public, and it doesn't matter a whit to its security. Am I misunderstanding your meaning? – Ben Mar 28 '16 at 02:31
If you have a dictionary and the pass phrases are chosen from this set of words, then the probability of brute forcing depends on the words. If the person trying to guess the pass phrase doesn't know or have access to a dictionary, then the entropy depends on each byte of each letter position. That's why random words or preferred passphrase schemes are great only if the attacker cannot figure out what they are. Once you insert that into an online entropy calculator and do it from a known machine that uses the same scheme, the entropy is zero, i.e. the pass phrase is known exactly. – Munchen Mar 28 '16 at 17:46
Are you saying "if the attacker knows which 6 words out of 7776 possible words you chose at random, your passphrase is insecure"? To that, I'd say yes, obviously; but also completely irrelevant to the question. Or do you mean, " if the attacker knows your list of 7776 words then entropy is zero"? That would be very incorrect. – Ben Mar 28 '16 at 19:08
Well, the number of bits of entropy is calculated from the number of possible combinations, but if you know the exact combination for a fact, then battery horse staple has zero entropy to you as the owner of the password, to paraphrase the xkcd comic, and activities such as password list gathering are meant to reduce the number of possible combinations between the person guessing vs the person. – Munchen Mar 28 '16 at 23:01
