
As far as I know, encrypted PDF files don't store the decryption password itself, but a hash associated with that password.

When auditing security, a good approach to breaking PDF passwords is to extract this hash and brute-force it, for example with a program like HashCat.

What is the proper way to extract the hash from a PDF file in order to audit it with, say, HashCat?

Answers for John the Ripper are welcome too, but I prefer the HashCat format because it is easy to get GPU computing working on Windows and brute-force with OCLHashCat (the GPU version of HashCat). John the Ripper has a GPU version as well, but JtR has no Windows build, at least not one with GPU support.


UPDATE 21 Dec 2017

The script pdf2john.py no longer exists. It has been replaced by a Perl version, pdf2john.pl.


Taken from the HashCat forums, this method works for me (it requires Perl):

--Download the John the Ripper (bleeding-jumbo) repository, which contains pdf2john.pl (OCLHashCat uses the same hash format as John the Ripper):

wget https://github.com/magnumripper/JohnTheRipper/archive/bleeding-jumbo.zip
unzip bleeding-jumbo.zip

--Use it to extract the hash from your .pdf file:

perl JohnTheRipper-bleeding-jumbo/run/pdf2john.pl MyPDF.pdf > MyPDF-Hash.txt

--The output file MyPDF-Hash.txt must then be edited. Its original contents look something like this:

MyPDF.pdf:$pdf$4*4*128*1028*1*16*652fc762fdb12c47a5f90ddd2b99b809*32*dd86d858f914809078a4a47348d32c0fc4e9c08042a10e6434b48b698de7731f*32*3c1e693526d5bc8da15b99eea6cbc6ed2c2397e23e2c39d1974fdc004c588cff:::::MyPDF.pdf

so open it in your preferred editor, for example:

nano MyPDF-Hash.txt
notepad MyPDF-Hash.txt

and keep only the hash itself, i.e. the part between the leading "MyPDF.pdf:" and the trailing ":::::MyPDF.pdf":

$pdf$4*4*128*1028*1*16*652fc762fdb12c47a5f90ddd2b99b809*32*dd86d858f914809078a4a47348d32c0fc4e9c08042a10e6434b48b698de7731f*32*3c1e693526d5bc8da15b99eea6cbc6ed2c2397e23e2c39d1974fdc004c588cff
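To sanity-check the cleaned hash before handing it to HashCat, here is a minimal Python sketch that splits it into fields and suggests a hashcat mode. It assumes the field order V*R*Length*P*EncryptMetadata*IDlen*ID*Ulen*U*Olen*O that pdf2john emits for security-handler revisions 2 to 4 (the revision R is the second field), and the hashcat mode numbers 10400 (PDF 1.1 - 1.3) and 10500 (PDF 1.4 - 1.6); verify both against the versions you actually use. The function names are hypothetical, and AES-256 PDFs (R=5/6), whose hashes have more fields, are deliberately rejected rather than parsed.

```python
import sys

# Hypothetical helper: split a pdf2john-style "$pdf$..." hash into named fields.
# ASSUMPTION: field order V*R*Length*P*EncryptMetadata*IDlen*ID*Ulen*U*Olen*O,
# as produced for revisions 2-4; R=5/6 (AES-256) hashes have more fields and
# are rejected here instead of being parsed.
def parse_pdf_hash(h: str) -> dict:
    prefix = "$pdf$"
    h = h.strip()
    if not h.startswith(prefix):
        raise ValueError("line does not start with $pdf$ (filename still attached?)")
    parts = h[len(prefix):].split("*")
    if len(parts) != 11:
        raise ValueError(f"expected 11 '*'-separated fields, got {len(parts)}")
    v, r, length, p, enc_md, id_len, id_hex, u_len, u_hex, o_len, o_hex = parts
    # Each *_len field is a byte count, so its hex string must be twice as long.
    for name, n, hx in (("ID", id_len, id_hex), ("U", u_len, u_hex), ("O", o_len, o_hex)):
        if int(n) * 2 != len(hx):
            raise ValueError(f"{name}: declared {n} bytes but found {len(hx)} hex digits")
    return {
        "V": int(v), "R": int(r), "Length": int(length), "P": int(p),
        "EncryptMetadata": int(enc_md), "ID": id_hex, "U": u_hex, "O": o_hex,
    }

# ASSUMPTION: hashcat mode numbers for RC4-based PDFs; check `hashcat --help`.
def suggest_hashcat_mode(fields: dict):
    if fields["R"] == 2:
        return 10400  # PDF 1.1 - 1.3 (Acrobat 2 - 4)
    if fields["R"] in (3, 4):
        return 10500  # PDF 1.4 - 1.6 (Acrobat 5 - 8)
    return None  # revision not covered by this sketch

# Only runs when given a file path, e.g. the cleaned MyPDF-Hash.txt.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        info = parse_pdf_hash(fh.readline())
    print(f"V={info['V']} R={info['R']} key length={info['Length']} bits")
    print(f"suggested hashcat mode: {suggest_hashcat_mode(info)}")
```

Run against the example hash above, this reports V=4, R=4, a 128-bit key and suggests mode 10500. Passing the raw, unedited line (with "MyPDF.pdf:" still in front) raises an error, which is a cheap way to catch a file that was not cleaned.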

--Hint: you can do the extraction and the editing in one step with sed (the UnxUtils build of sed also works if you are on Windows):

perl JohnTheRipper-bleeding-jumbo/run/pdf2john.pl MyPDF.pdf | sed "s/::.*$//" | sed "s/^.*://" > MyPDF-Hash.txt
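If sed is not available, the same clean-up can be sketched in Python. This is an illustrative equivalent of the sed pipeline, not part of John the Ripper; the script name used below is hypothetical. It assumes the "<file>:<hash>:::::<file>" layout shown above and that the hash itself contains no ':' characters. Instead of counting colons, it anchors on the "$pdf$" signature and cuts at the next colon.

```python
import sys

def extract_pdf_hash(line: str) -> str:
    """Return the "$pdf$..." part of a pdf2john.pl output line.

    ASSUMPTION: layout "<file>:<hash>:::::<file>", hash contains no ':'.
    """
    line = line.strip()
    start = line.find("$pdf$")
    if start == -1:
        raise ValueError("no $pdf$ hash found in line")
    end = line.find(":", start)  # first colon after the hash, if any
    return line[start:] if end == -1 else line[start:end]

# Usage: pass a file path, or "-" to read pdf2john.pl output from stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    src = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1])
    with src:
        for raw in src:
            if "$pdf$" in raw:
                print(extract_pdf_hash(raw))
```

For example: perl JohnTheRipper-bleeding-jumbo/run/pdf2john.pl MyPDF.pdf | python extract_pdf_hash.py - > MyPDF-Hash.txt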

--Your MyPDF-Hash.txt file is now ready to use with OCLHashCat (or John the Ripper).

NOTES:

  • Tested working on Cygwin (Windows).
  • Tested working on Kali and Ubuntu Linux.
vakus
  • pdf2john.py doesn't exist anymore. It has been substituted by a [perl version](https://github.com/magnumripper/JohnTheRipper/blob/bleeding-jumbo/run/pdf2john.pl) – tpvasconcelos Dec 21 '17 at 02:59
  • Copying the Perl file out of the directory does not work. It needs to stay where it is, in the "run" directory, otherwise you'll get an error. – Eric Brandel Jun 10 '18 at 05:50
  • Hey, this answer doesn't work, first because you have the file extension py and the file is a perl script. – Philippe Delteil Jul 31 '18 at 16:56
  • To get rid of the irrelevant text around the hash, use this: perl JohnTheRipper-bleeding-jumbo/run/pdf2john.pl MyPDF.pdf | awk -F":" '{ print $2}' > MyPDF-Hash.txt – Philippe Delteil Jul 31 '18 at 17:36
  • I suggest an edit to this answer: don't download pdf2john.pl from the repository, just download the whole repository and run pdf2john.pl from within it. – Baodad Jan 31 '19 at 22:02
  • I must say, this is the first time I've ever seen an application move *from* python *to* perl. I wonder... *why?* – Jonathon Reinhart Jan 04 '20 at 01:09
  • FYI there is a site that runs that script for you if you don't mind uploading the PDF. https://www.onlinehashcrack.com/tools-pdf-hash-extractor.php – Szabolcs Jul 08 '20 at 09:30