Most people who use passphrases use them wrong.
The remark that Diceware is better probably comes from the fact that when people pick a passphrase themselves, they usually take a well-known or otherwise logically structured sentence. "Mary had a little lamb" is a terrible passphrase because it is one of a few billion well-known phrases that a computer can run through in a short amount of time. I know this attack works pretty well because I have tried it myself.
Diceware is just random words. It's as good as any other randomly generated set of words, assuming you use a good source of randomness: for Diceware, you should use dice, which are a reasonably good source. Digital password generators are usually also good, though homebrew implementations might use an insecure random number generator by mistake.
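As a sketch of the "good source of randomness" point: in Python, the `secrets` module draws from the OS CSPRNG, while the plain `random` module does not and should not be used for passphrases. The tiny word list below is a placeholder, not a real Diceware list (which has 7776 words):

```python
import secrets

# Placeholder word list; a real Diceware list has 7776 entries.
WORDLIST = ["correct", "horse", "battery", "staple", "mary", "little", "lamb", "fleece"]

def generate_passphrase(n_words, wordlist=WORDLIST):
    # secrets.choice uses the operating system's CSPRNG; random.choice
    # uses a predictable PRNG and is unsuitable for passwords.
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))
```

With only eight words this is of course far too weak to use; the point is only that the selection must be uniform and cryptographically random.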
We know that any random passphrase is good because it's basic math. There are two properties to a passphrase:
- Dictionary size
- Number of words in the phrase
The 'randomness' of a passphrase is simple to calculate: dictionary_size ^ words_in_phrase, where ^ is exponentiation. A passphrase of 3 words drawn from a dictionary of 8000 words gives 8000^3 = 512 billion possible phrases, so an attacker guessing the phrase would have to try 256 billion phrases on average before getting it right. To compare with a password of similar strength: a random password of 7 characters, consisting of a-z and A-Z, has a "dictionary size" of 52 (26 + 26) and a "number of words" of 7, making 52^7 = ~1028 billion possible passwords. And it is well-known that 7 characters is pretty insecure, even when randomly generated.
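The arithmetic above is easy to check in a few lines of Python:

```python
# Total phrase/password counts from the text, and the average number
# of guesses an attacker needs (half the total).
dict_size, words = 8000, 3
total_phrases = dict_size ** words       # 512 billion
average_guesses = total_phrases // 2     # 256 billion

charset, length = 26 + 26, 7             # a-z plus A-Z
total_passwords = charset ** length      # ~1028 billion
```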
For randomness, more is better, up to about 128 bits of entropy. A little more than that helps buffer against cryptographic weakenings of algorithms, but really, you don't want to memorize 128 bits of entropy anyway. Let's say we aim for 80 bits of entropy, which is a good compromise for almost anything.
To convert "number of possible values" to "bits of entropy", we need to use this formula: log(n)/log(2), where n is the number of possible values. So if you have 26 possible values (1 letter), that would be log(26)/log(2)= ~4.7 bits of entropy. That makes sense because you need 5 bits to store a letter: the number 26 is 11010 in binary.
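In code, log(n)/log(2) is just the base-2 logarithm, which Python exposes directly as `math.log2`:

```python
import math

def entropy_bits(possible_values):
    # log(n)/log(2) == log2(n): bits needed to distinguish n values.
    return math.log2(possible_values)
```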
A dictionary of 8000 words needs about 7 words to get above the desired 80 bits:
log(8000^7)/log(2)= ~90.8 bits of entropy. Six words would be:
log(8000^6)/log(2)= ~77.8 bits of entropy.
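Turning this around, a small helper (a sketch; the function name is mine) gives the minimum number of words for a target entropy:

```python
import math

def words_needed(dict_size, target_bits):
    # Smallest n such that n * log2(dict_size) >= target_bits.
    return math.ceil(target_bits / math.log2(dict_size))
```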
A large dictionary helps a lot, compared to the relatively small Diceware dictionary of 7776 words. The Oxford English Dictionary has 600k words. With that many words, a phrase of four randomly chosen words is almost enough:
log(600 000^4)/log(2)= ~76.8 bits of entropy.
But at 600 thousand words, that includes very obscure and long words. A dictionary with words that you can reasonably remember might have a hundred thousand or so. Instead of the seven words that we need with Diceware, we need five words in our phrase when selecting randomly from a dictionary of 100k words:
log(100 000^5)/log(2)= ~83.0 bits of entropy.
Adding one more word to your phrase helps more than adding ten thousand words to your dictionary, so length beats complexity, but a good solution balances the two. Diceware's dictionary seems a little small to me, but perhaps its authors tested different sizes and found 7776 words to be a good balance. I am not a linguist :).
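A quick check of that claim, using the 100k dictionary with five words as the baseline: ten thousand extra dictionary words add less than one bit, while one extra word adds roughly 16.6 bits.

```python
import math

def phrase_bits(dict_size, n_words):
    # Entropy of a phrase: n_words * log2(dict_size).
    return n_words * math.log2(dict_size)

base = phrase_bits(100_000, 5)           # ~83.0 bits
bigger_dict = phrase_bits(110_000, 5)    # ~83.7 bits: +0.7
longer_phrase = phrase_bits(100_000, 6)  # ~99.7 bits: +16.6
```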
Just for comparison, a password (consisting of a-z, A-Z, and 0-9) needs 14 characters to reach the same strength: log(62^14)/log(2)= ~83.4 bits of entropy.
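The same calculation in code, finding how many random characters from a 62-symbol set are needed to clear 80 bits:

```python
import math

charset = 26 + 26 + 10                   # a-z, A-Z, 0-9
chars_for_80_bits = math.ceil(80 / math.log2(charset))  # 14
bits_at_14 = 14 * math.log2(charset)     # ~83.4 bits
```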