Intro
From XKCD #936: Short complex password, or long dictionary passphrase?
After searching for a tool that generates random passphrases, I started writing my own...
I took the dictionary already present on my desktop: /usr/share/dict/american-english
and checked:
wc -l /usr/share/dict/american-english
98569
After a quick look, I see many words ending in 's and proper names beginning with a capital letter.
sed -ne '/^[a-z]\{4,99\}$/p' /usr/share/dict/american-english | wc -l
63469
Oh, that's fewer than 65536. As I can't read a fractional 15.953 bits, I will drop this down to a 15-bit index (using a pseudo-random drop, which should be sufficient for now).
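(A quick check of those figures: log2 of the word count, and five truncated 15-bit indexes:)
perl -le 'printf "%.2f bits per word, 5 x 15 = %d bits\n", log(63469)/log(2), 5*15'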
Then, with 5 words, I can build a 75-bit passphrase:
#!/usr/bin/perl -w
use strict;

# Keep only all-lowercase words of 4 to 11 letters.
open my $fh, "</usr/share/dict/american-english" or die;
my @words = map { chomp $_; $_ } grep { /^[a-z]{4,11}$/ } <$fh>;
close $fh;

# Randomly drop words (preferring those ending in "s") until
# exactly 32768 = 2^15 remain, so an index fits in 15 bits.
while ( scalar @words > 32768 ) {
    my $rndIdx = int( rand(1) * scalar @words );
    splice @words, $rndIdx, 1 if $words[$rndIdx] =~ /s$/ || int( rand() * 3 ) == 2;
}

# Read 10 bytes (80 bits) of randomness, then use the first
# five 15-bit chunks as indexes into the word list.
open $fh, "</dev/random" or die;
$_ = '';
do { sysread $fh, my $buff, 10; $_ .= $buff; } while 10 > length;
$_ = unpack "B80", $_;
s/([01]{15})/print " ".$words[unpack("s",pack("b15",$1))]/eg;
print "\n";
This could produce output like:
value nationally blacktopped prettify celebration
The Perl script
I wrote a little Perl script, passphrase.pl:
$ ./passphrase.pl -h
Usage: passphrase.pl [-h] [-d dict file] [-i mIn length] [-a mAx length]
[-e entropy bits] [-r random file] [-w words] [-l lines] [lines]
Version: passphrase.pl v1.5 - (2013-07-05 08:34:21).
-h This help.
-l num number of phrases to generate (default: 1)
-w num number of words by phrase (default: 5)
-e bits Entropy bits for each words (default: 15)
-d filename Dictionary file (default: /usr/share/dict/american-english)
-s filename Dict file to save after initial drop (default: none)
-i length Minimal word length (default: 4)
-a length Maximal word length (default: 11)
-r device Random file or generator (default: /dev/urandom)
-q Quietly generate lines without computations.
The default output looks like:
With 5 words over 32768 ( 15 entropy bits ) = 1/3.777893e+22 -> 75 bits.
With 5 words from 56947 ( 15.797 entropy bits ) = 1/5.988999e+23 -> 78.987 bits.
3.736 206.819 foggier enforced albatrosses loftiest foursquare
The first line shows the count of unique words kept after the dictionary has been dropped down to 2^(entropy bits). The second line shows the initial count of unique words found in the dictionary, and the theoretical entropy computed from that count.
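Those two figures can be re-derived directly; a minimal sketch (the word counts 32768 and 56947 are taken from the two lines above):
#!/usr/bin/perl
use strict; use warnings;
# count ** 5 possible phrases, 5 * log2(count) bits of entropy
for my $count (32768, 56947) {
    printf "%d words: %e combinations -> %.3f bits\n",
        $count, $count ** 5, 5 * log($count) / log(2);
}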
Each output line begins with two values: the first is the Shannon entropy of the line, the second a flat entropy based on 26 raised to the power of the number of letters in the line, expressed in bits.
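Roughly, the two leading values on the sample line above can be reproduced like this (a sketch: per-letter Shannon entropy with spaces dropped, and number of letters times log2(26)):
#!/usr/bin/perl
use strict; use warnings;
my $line = "foggier enforced albatrosses loftiest foursquare";
(my $letters = $line) =~ s/ //g;                  # drop spaces, keep letters only
my %freq;
$freq{$_}++ for split //, $letters;
my $n = length $letters;                          # 44 letters
my $shannon = 0;
$shannon -= ($_ / $n) * log($_ / $n) / log(2) for values %freq;
my $flat = $n * log(26) / log(2);                 # log2(26**44)
printf "%.3f %.3f\n", $shannon, $flat;            # 3.736 206.819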
Usage and human entropy reduction
The answer from David Cary confirms that this calculation is very approximate and hard to pin down, but it gives some good evaluations and a way of thinking about it:
3 Questions
1. What's the minimal length for one word? Are 4 characters sufficient? How do I compute the entropy of a 4-letter word?
In a plain alphabet a letter is 1/26 -> 4.7 bits, but the following letter is generally a vowel, so 1/6 -> 2.5 bits!?
If I'm right, a 4-letter word could not represent more than 14.57 bits??
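A rough sketch of that estimate (assuming two "free" letters at log2(26) and two vowel-constrained letters at log2(6); the 2+2 split is only a guess, not a real language model):
perl -le 'printf "2 x %.2f + 2 x %.2f = %.2f bits\n", log(26)/log(2), log(6)/log(2), 2*(log(26)+log(6))/log(2)'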
2. Someone could run this several times to get a choice:
The way I use this looks like:
passphrase.pl -d /usr/share/dict/american-english -l 5
With 5 words over 32768 ( 15 entropy bits ) = 1/3.777893e+22 -> 75 bits.
With 5 words from 56195 ( 15.778 entropy bits ) = 1/5.603874e+23 -> 78.891 bits.
3.711 211.520 bittersweet damsons snarkiest distillery keyboard
3.894 188.018 outdo caliphs junction uneventful inflexible
3.920 211.520 contemplate gripped capitols plagiarizes obtusely
3.642 155.115 shark procured espied amperage goalie
3.718 150.414 drunken sulked derisory influx smear
and choose from this bunch 5 words according to his human sensibility:
bittersweet gripped distillery derisory influx
This will reduce entropy, in that sexy (and known) words have a better chance of being chosen.
Anyway, although excellently detailed, David Cary's answer doesn't match my question: I don't build a bunch from a reduced list, I build a random list of 20 to 30 words, then the human makes his choice, based on his knowledge and feeling at that moment. If there are 2 unknown words, the human could maybe learn one and drop the other...
From an attacker's point of view, there is nearly no way to build a reduced dictionary containing only words known by the target user, especially if I imagine the user is able to learn one more word...
In the end, these are 5 words out of a bunch of 63469 words (15.953 bits of entropy per word). I can't imagine a way an attacker could reduce this, knowing only that the 5 words were chosen from a random bunch of 30.
So yes, this will theoretically drop the overall entropy down! In my mind this is negligible, but I can't back that up with a numerical argument...
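The only number I can put on it is a worst-case bound: if an attacker somehow learned the exact bunch of candidate words, he would only have to enumerate the ordered 5-word selections out of it. A sketch of that bound (the bunch sizes 25 and 30 are just the range mentioned above):
#!/usr/bin/perl
use strict; use warnings;
# Ordered choices of 5 words out of a known bunch of N candidate words.
for my $bunch (25, 30) {
    my $choices = 1;
    $choices *= $bunch - $_ for 0 .. 4;           # N * (N-1) * ... * (N-4)
    printf "bunch of %d: %d ordered choices -> %.1f bits\n",
        $bunch, $choices, log($choices) / log(2);
}
But this bound only matters if the bunch itself leaks; since the bunch is freshly drawn from the whole dictionary each time, the attacker cannot rebuild it, so all that remains is the human bias toward "likeable" words, which I still cannot quantify.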