6

I'm trying to find a way to crack XKCD-like passphrases (correcthorsebatterystaple) from word dictionaries. Basically concatenate X number of words from a dictionary file. Right now it honestly doesn't look like there is an easy way.

After the XKCD comic there were plenty of tools like this made, and I could always spin up my own and create a dictionary file, but I feel like there should be a better way. My specific use is John the Ripper on MD5 hashs.

It doesn't look like there is a way to concatenate entire words in JtR, only modify existing ones. Reference HERE at the bottom

A multi-word "passphrase" cracking mode, or an enhancement to the wordlist mode, might be added in a future version of JtR.

This was 2008, but I still don't see a reference to anything like this changing. I could use the technique described in that document, but it gets real ugly if you need to do more than 2 word phrases or if your dictionary is of any significant length.

My second idea was to use Crunch to pipe in the words, for example...crunch 0 0 -p word1 word2 | john --pipe mypasswd* I could modify that a bit and use -q wordlist.txt and crunch would use the words from that text file. The problem here is that there is no way (that I have found) to limit the number of words used for a passphrase. For example, if your dictionary contained 1,000 words, each passphrase would be a concatenation of all 1,000 words. Again, it gets real ugly if your dictionary is of any significant length.

Edit: Note here for people that suggest changing the above crunch command to specify min length and max length. This does not work with the -p or -q option, however, the numbers must still be specified (thus the 0 placeholders). reference under -p flag

and it ignores min and max length however you must still specify two numbers.

That leaves my requirements at something that writes to stdout, not a file due to the size such a file would be, and allows you to specify the number of words to join (2 word phrases, 3 word, etc). Preferably such a tool would also allow you to specify separating characters (correct.horse.battery.staple correct|horse|battery|staple) in case other characters or even spaces are allowed.

Hope this is the correct stack exchange, someone let me know if there is another I should try.


Edit

For anyone else out there looking for this same kind of thing, here's a python code snippit that does more or less what I want.

# iterable=['test','correct','horse','battery','staple']
## Change the file specified here to your dictionary file
iterable = [i.strip().split() for i in open("wordList.txt").readlines()]

def permutations(iterable, r=None, s=''):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        #return
        r = n # Lets assume users want all permutations
    indices = range(n)
    cycles = range(n, n-r, -1)
    temp = tuple(pool[i] for i in indices[:r])
    for item in temp: # iterate through the current tuple and turn it into a string, to get rid of brackets and such
        print s.join([str(i[0]) for i in temp])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                temp = tuple(pool[i] for i in indices[:r])
                for item in temp:
                    print s.join([str(i[0]) for i in temp])
                break
        else:
            return

# The first value relates to our variable (dictionary file) at the top
# The second value is the number of words we want to combine, a 2 would indicate 2 
## word combinations, such as correct.horse, 3 would be correct.horse.staple
# The third value is the seperater, such as . in the example above, to create words
## that run together enter nothing, or ''
permutations(iterable, 2, '.')

To use this with JtR you would use python xkcd.py | john --pipe mypasswd*

The code was taken from python's itertools so it should return...

r-length tuples, all possible orderings, no repeated elements

I wanted all this, plus it doesn't store the array in memory (it would run out quick with a long list) and doesn't write to disk (although you can redirect the output if you want).

Now, I have run into errors (IOError: [Errno 32] Broken pipe) with long runs and JtR. The code is sloppy, etc. So no, this isn't a perfect real world solution. However, as has been pointed out, this may not be practical in the real world even without errors, due to the number of possibilities. Sometimes I just want to know if something is possible though!

If anyone out there watching this knows C and wants to add something like this directly to JtR or Crunch that would be amazing, I have the feeling that would speed things up and make this much more reliable if it was written into a program designed for this kind of task.

BeanBagKing
  • 161
  • 4
  • You may want to clarify in your first paragraph that you're trying to _crack_ these type of passwords and not just generate them for use. – PwdRsch Sep 10 '14 at 17:25
  • I'm confused, you want a tool that already does this or you want to know how to programmatically concatenate strings? – RoraΖ Sep 10 '14 at 17:37
  • @raz I'm looking for a too that does this preferable, I could roll my own, it just seems like a waste of effort since this is something I imagine people already need. – BeanBagKing Sep 10 '14 at 17:53
  • Interestingly, this is exactly the same question I just came here to ask. Thanks for beating me to it! :) – Jules Sep 10 '14 at 23:04

2 Answers2

3

I did this solution using BASH. It's kind of fun actually... you do have to have an output file already in existence. Since you're scripting adding a touch output.txt command shouldn't be too hard.

shuf -n5 /usr/share/dict/words | tr -d '\n' >> output.txt; sed -i -e '$a\' output.txt

So what it does is

  1. Take 5 random words from the Unix dictionary file
  2. Throw that through tr (translate characters) to remove the new lines, and append the output to an output file. This can also be the step to insert separating characters.
  3. Use sed to add a newline to the end of the file

Script that into a loop and you'll have a file of permutations of 5 words concatenated together. It can be modified if you really want everything output to stdout.

Using JtR for a single password attempt:

shuf -n5 /usr/share/dict/words | tr -d '\n' | john --pipe mypasswd*

SHUF (man shuf)

shuf -n5 /usr/share/dict/words;

Generates random permutations of input lines to standard output.

Translate Characters (man tr)

tr -d '\n' >> output.txt

Translate, squeeze, and/or delete characters from standard input, writing to standard output.

We use this to delete the newline characters produced by SHUF, and concatenate the result into an output file.

RoraΖ
  • 12,317
  • 4
  • 51
  • 83
  • 1
    One problem with this approach is that there's no guarantee your random selection of words will be unique from other entries already in the output.txt file. Since he's trying to optimize a password cracking approach he would probably want to avoid this. There's also the risk that certain word combinations wouldn't be produced unless he ran the script enough times. He wants efficient, but comprehensive coverage of the possibilities. – PwdRsch Sep 10 '14 at 18:53
  • 1
    It seems to me he's asking for a way to generate passwords to use with JtR. De-duping and coverage are the problems of generating passwords of any format. It would be up to him/her to determine how these problems should be solved for their application. – RoraΖ Sep 10 '14 at 19:16
  • Upvoted, because this is sort of an answer, but PwdRsch is correct. This is way too random and unoptomized for a password cracking approach unless I'm missing something. I could loop it and try to add some intelligence to it's permutations to prevent duplicates/increase coverage, etc. But by then I would probably be better off creating a script from scratch. shuf is an interesting command though, I've never seen that and I'm sure it will come in handy in the future. TIL! I'm sure it will come in handy in the future. – BeanBagKing Sep 11 '14 at 01:42
1

There is only a hint at this answer that tool(s) exist. Your remarks:
"but it gets real ugly if you need to do more than 2 word phrases [....]" and
"Again, it gets real ugly if your dictionary is of any significant length"
make me wonder whether you estimated the time to crack if the tool would exists and even if it would perform at a rate comparable to password cracking on GPUs.

(Following taken from my passphrase analysis tool). Suppose the passphrase is:
wadi attack overt wire
and supposing a moderate length Diceware-like dictionary with 7776 words:

  • When used as WiFi key, the passphrase could be recovered off-line in 1.2 centuries on average.
    Assumed recovery hardware etc.: WiFi, 8 GPUs,WPA/WPA2
  • When sniffed as a NTLM-password on a Windows network, the phrase can be recovered in 4.0 hours on average.
    Assumed recovery hardware etc.: Fast hash/Prof Hw, 25 GPUs
  • When used as WiFi key, agencies employing a GPU-array with 128 GPUs, could recover it in: 7.5 year on average
    Assumed hardware: 128 GPU array, '8 to 128 extrapolation estimation'.

When doing the recovery on a CPU in stead of a GPU(s), it takes perhaps a factor 50*(num of GPUs) longer. It seems to me that a CPU approach is not feasible.

Dick99999
  • 525
  • 5
  • 8
  • BeanBagKing doesn't specify what he's cracking, but checking something on the order of 10 million keys per second on a single GPU is quite plausible for many types of password. In that case, the expected time to crack with a single moderately-priced PC is about 5 years, using your assumptions. Yes, a CPU-based attack is unlikely to be feasible for a 4-word passphrase (although for a 3-word passphrase it might only take months). – Jules Sep 10 '14 at 23:26
  • What Jules said, and also because I like learning things and I realized today that an "XKCD" version of Crunch didn't seem to exist, and that made me scratch my head as to why. – BeanBagKing Sep 11 '14 at 01:44
  • @jules Indeed, more details like hash method(s), dictionary length en number of words would make estimations more relevant than an estimate based on "dictionary is of any significant length" and "more than 2 word phrases" – Dick99999 Sep 11 '14 at 13:55