I have been thinking about how to generate random passphrases from a public dictionary of words (similar to XKCD/Diceware passphrases).
One thing in particular I was thinking about is that the length of such a passphrase will leak a lot of information about the phrase (assuming the dictionary contains words of varying length, as most Diceware lists seem to). Say I have a list of 1024 words; then generating a random five-word phrase should provide 50 bits of entropy (5 × log2(1024) = 5 × 10) if the length is hidden. However, say the length is not hidden and there are only 128 words of length three (and none of length one or two) in the dictionary. Now say we know a passphrase is 15 characters long, not counting any separators. Then a five-word passphrase of length 15 could only have been produced from those 128 words of length three, giving a much lower entropy of 35 bits (5 × log2(128) = 5 × 7).
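To make the arithmetic in this example concrete, here is a minimal Python sketch that counts how many five-word phrases produce each total length and converts that count to bits. The word-length histogram is made up for illustration (only the 128 three-letter words and the total of 1024 words come from my example), the function name is my own, and I assume the words are concatenated without separators.

```python
import math
from collections import Counter


def phrase_counts_by_length(length_hist, num_words):
    """Return {total_length: number of phrases of num_words words with that length}.

    length_hist maps a word length to how many dictionary words have that length.
    """
    counts = {0: 1}  # one way to build the empty phrase
    for _ in range(num_words):
        nxt = Counter()
        for total, ways in counts.items():
            for word_len, n_words in length_hist.items():
                nxt[total + word_len] += ways * n_words
        counts = dict(nxt)
    return counts


# Hypothetical histogram: 128 words of length 3 (as in the example above);
# the remaining 896 words are spread over lengths 4..7 purely for illustration.
length_hist = {3: 128, 4: 256, 5: 320, 6: 192, 7: 128}
assert sum(length_hist.values()) == 1024

counts = phrase_counts_by_length(length_hist, num_words=5)

# Entropy if the attacker does not know the length: log2 of all phrases.
bits_hidden = math.log2(sum(counts.values()))

# Entropy left for each possible length once the attacker knows it.
bits_given_length = {length: math.log2(c) for length, c in sorted(counts.items())}

print(f"length hidden: {bits_hidden:.1f} bits")
for length, bits in bits_given_length.items():
    print(f"length {length:2d} known: {bits:.1f} bits")
```

With this hypothetical histogram, the full space comes out at 50 bits and a phrase known to be 15 characters long at 35 bits, matching the numbers above; the other lengths fall somewhere in between.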
I am wondering if this loss of entropy is something I should worry about.
In particular, I am interested in whether it is fair to assume that someone trying to break a passphrase does not know its length. Put another way, is it reasonable to assume that in most common systems the length of the phrase is hidden from a potential attacker?
If not, do passphrase generators take this into account somehow?
I should add that I ask because I am not very familiar with how passwords/phrases are protected. However, I assume they are often sent to a server in some encrypted form, and as far as I am aware, encryption does not necessarily hide the length of the plaintext.
This is not the same as the existing questions about revealing password length, because each character in a password has the same length (namely 1). In a passphrase, however, the equivalent of a character is a word from the dictionary. Assuming these words have different lengths, the length of the entire passphrase reveals what kinds of words were used. In the example above, a passphrase of length 15 reveals that only words of length 3 were used. For a password, this is equivalent to something like revealing that only the letters a, b, c, d, e, f, g, h, i and j were used in generating the password.
I also read the question about the security of XKCD-style passwords, but as far as I can see, none of the answers deal with this issue.