My assignment required me to find the password for a PowerPoint file (97 - 2003, v. 8.0 - v. 11.0).

I used office2john.py to retrieve the hash, and I removed the file name.

The hash is:

$oldoffice$3*1b085471a28011c5348c5f0b8f29d24e*99294d3ebc790cfc325cca851f56d433*9e3556d0775d0aa198060a815be7be4c58e1fe2a

Then I put the hash in hashcat with the following command:

hashcat64.exe -m 9800 -a 3 s.hash ?l?l?l?l?l?l?l?l -D 1,2 -w 4

hashcat cracked it and gave me the password (iemuzqau), but when I enter that password in PowerPoint, it says that the password is wrong.

Did I do something wrong?

Peter Mortensen
Fabius
  • 11
    Could you comment on the accepted answer to say what the solution was for you? Did any of the other passwords work? – eckes Jun 17 '19 at 09:13
  • 3
    I'm still trying to crack a working password; I'm using the '?l?d_-.:;!' charset. – Fabius Jun 17 '19 at 09:31

1 Answer

The password hashes for MS Office 97-2003 are vulnerable to collision attacks. That is, multiple passwords exist that should be able to open the document.

That also means that the password "iemuzqau" is not necessarily the original password that was set by the author. It is just one of the passwords that should be accepted, because it matches the internal scheme to check for the correct password.
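Why several passwords can pass the same check can be illustrated with a toy example. The sketch below is not the Office key derivation: it simply truncates SHA-1 to 16 bits (an arbitrary width chosen so the search finishes instantly) and looks for two different lowercase passwords with the same truncated value. The function names are made up for illustration.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def truncated_digest(password: str, bits: int = 16) -> int:
    """Toy check value: the top `bits` bits of SHA-1(password).

    Assumption: this stands in for any short, truncated check value;
    it is NOT how Office derives its key.
    """
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def find_collision(bits: int = 16, length: int = 4):
    """Return two different passwords sharing one truncated digest."""
    seen = {}
    for letters in product(ascii_lowercase, repeat=length):
        candidate = "".join(letters)
        value = truncated_digest(candidate, bits)
        if value in seen:
            return seen[value], candidate
        seen[value] = candidate
    return None  # unreachable here: 26**4 candidates exceed 2**16 values


first, second = find_collision()
print(first, second, hex(truncated_digest(first)))
```

The shorter the check value, the sooner such pairs turn up, which is why a short key makes collisions practical to find.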

For the $3 type hash that you got, the hashcat collider modes 9810 and 9820 can be used to generate password candidates faster than raw brute-force (mode 9800). According to the linked thread, that should work by first executing the following command:

hashcat64.exe -m 9810 -w 3 s.hash -o hash.rc4 -a 3 ?b?b?b?b?b

The output will be something like:

$oldoffice$3*1b085471a28011c5348c5f0b8f29d24e*99294d3ebc790cfc325cca851f56d433*9e3556d0775d0aa198060a815be7be4c58e1fe2a:5ffd0b24bd

Then you take the output of that command and execute:

hashcat64.exe -m 9820 -w 3 hash.rc4 -a 3 ?l?l?l?l?l?l?l?l?l?l --increment

This will then produce the following output:

$oldoffice$3*1b085471a28011c5348c5f0b8f29d24e*99294d3ebc790cfc325cca851f56d433*9e3556d0775d0aa198060a815be7be4c58e1fe2a:5ffd0b24bd:iemuzqau
$oldoffice$3*1b085471a28011c5348c5f0b8f29d24e*99294d3ebc790cfc325cca851f56d433*9e3556d0775d0aa198060a815be7be4c58e1fe2a:5ffd0b24bd:cvsfjkwoa
$oldoffice$3*1b085471a28011c5348c5f0b8f29d24e*99294d3ebc790cfc325cca851f56d433*9e3556d0775d0aa198060a815be7be4c58e1fe2a:5ffd0b24bd:yrmbatnya
$oldoffice$3*1b085471a28011c5348c5f0b8f29d24e*99294d3ebc790cfc325cca851f56d433*9e3556d0775d0aa198060a815be7be4c58e1fe2a:5ffd0b24bd:mzvmxmyke
...
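If you want to try the candidates one after another, a small script can pull them out of that output. This is a minimal sketch that assumes the line layout shown above (hash, then key, then password, separated by colons) and that the passwords themselves contain no colon; `parse_collider_output` is a hypothetical helper, not part of hashcat.

```python
def parse_collider_output(lines):
    """Group candidate passwords by (hash, key) from hash:key:password lines.

    Assumption: the format matches the sample output above and the
    passwords contain no ':' character.
    """
    candidates = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("$oldoffice$"):
            continue  # skip blank lines, "..." and other noise
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            continue  # e.g. a hash:key line without a password
        hash_part, key, password = parts
        candidates.setdefault((hash_part, key), []).append(password)
    return candidates


HASH = ("$oldoffice$3*1b085471a28011c5348c5f0b8f29d24e"
        "*99294d3ebc790cfc325cca851f56d433"
        "*9e3556d0775d0aa198060a815be7be4c58e1fe2a")
sample = [
    HASH + ":5ffd0b24bd:iemuzqau",
    HASH + ":5ffd0b24bd:cvsfjkwoa",
    HASH + ":5ffd0b24bd:yrmbatnya",
    HASH + ":5ffd0b24bd:mzvmxmyke",
    "...",
]
parsed = parse_collider_output(sample)
for (hash_part, key), passwords in parsed.items():
    print(key, passwords)
```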

Mode 9820 is a mode that "keeps cracking". That is, it does not stop after the first match but continues to output valid passwords. This behaviour was changed recently, so you might have to specify --keep-guessing on your command line, depending on the version that you use.
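As a rough guide to how many candidates to expect, here is a back-of-the-envelope estimate. It assumes the key is 40 bits wide (five bytes, matching the ?b?b?b?b?b mask) and that keys derived from passwords behave like uniformly random values; both are simplifying assumptions, and `expected_matches` is a name made up for this sketch.

```python
KEY_BITS = 40   # assumption: five-byte key, as in the ?b?b?b?b?b mask
ALPHABET = 26   # lowercase letters only, as in the ?l masks


def expected_matches(length: int) -> float:
    """Expected number of lowercase passwords of `length` hitting one key.

    Assumption: derived keys are uniformly distributed over 2**KEY_BITS.
    """
    return ALPHABET ** length / 2 ** KEY_BITS


for length in range(6, 11):
    print(f"length {length}: about {expected_matches(length):.3g} matches")
```

Under these assumptions, lowercase-only matches are unlikely below eight characters and become common from nine characters on, so a mask of up to ten characters with --increment should yield many candidates.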

That does not explain why PowerPoint rejects your password, since hashcat should only generate valid candidates. But maybe you can use the workflow described above to generate additional valid passwords and try them.

Michael
Denis
  • 33
    I must be missing something, but shouldn’t a collision work the same as the original password? – nadavvadan Jun 17 '19 at 20:26
  • 44
    @nadavvadan: Not sure if PPT works that way, but in general, no. If the software encrypts the whole file with the original password, and stores the hash just for verification purposes, then it'll accept anything that matches the hash, but decrypting the file with anything but the real password will yield unreadable garbage. – Guntram Blohm Jun 17 '19 at 21:58
  • 33
    I was missing the obvious - there’s a hash collision, but there’s also encryption involved which obviously uses a different algorithm; the collision only applies for the hash function. – nadavvadan Jun 18 '19 at 04:09
  • 9
    Ok, so if this is the case, what's the point of the hash at all? Couldn't it just go straight to decrypting and decide (i.e. depending on valid header data) if the password was right or not? I mean, this appears to be the case if a valid password for a given hash does not work. So why's there a hash in the first place? – Num Lock Jun 18 '19 at 04:59
  • 4
    @NumLock It's just a quicker and easier way to tell if the password is valid or not. It's not essential. – David Schwartz Jun 18 '19 at 05:09
  • 1
    @NumLock It's 'look before you leap', to know earlier that the password isn't going to work. I don't know whether the increase in speed is noticeable, but in magnitude it's quite significant. – Mast Jun 18 '19 at 06:27
  • 23
    @NumLock The most important reason is probably UX. Relying on looking at whether your decrypted data looks valid or corrupt to tell if you were given a correct password makes it impossible to distinguish between the user providing an incorrect password and providing a correct password to a corrupt or truncated file. Consequently, it means that you can't tell the user which of those things has happened; you just have to report to them that *either* they provided the wrong password *or* the file is corrupt, which is a crappy experience for the user. Office's approach solves that problem. – Mark Amery Jun 18 '19 at 11:35
  • 2
    @Mast given how big office files can get the speedup could be quite significant. PPTs produced by naive users can easily reach hundreds of MB, without even including video. – Chris H Jun 18 '19 at 13:30
  • 1
    @ChrisH It would be trivial to make file size irrelevant: decrypt a fixed-sized chunk at the beginning of the file, and check it for a known value (the decrypted data probably begins with standard header information anyway). Decrypting the rest of the file would only be attempted if that check passed, and the decrypted result would be checked for validity the same way as for an unencrypted file. This is similar to checking a separate hash, but reduces the scope for vulnerabilities caused by the hash being weaker than the encryption, as appears to be the case here. – IMSoP Jun 18 '19 at 13:55
  • 1
    @IMSoP that's another means to the same (usability) end, or would have been when the encryption was defined some years ago. You'd presumably decrypt one block. Office .???x files are actually zip files; the first 6-10 bytes of [the header](https://en.wikipedia.org/wiki/Zip_(file_format)#File_headers) could be fixed and used directly for testing, or the "extra data" field could hold a constant used for testing decryption. – Chris H Jun 18 '19 at 14:54
  • 2
    @ChrisH Indeed. I imagine the "BIFF" used in previous Office versions also has a fixed or predictable header, or could easily have an additional block for the purpose, since it's also a container format of sorts. – IMSoP Jun 18 '19 at 15:39
  • 1
    @ChrisH Hundreds of MB? I clearly haven't used Powerpoint enough lately. – Mast Jun 18 '19 at 16:14
  • 1
    @Mast lots of full res photos, graphs imported as png because there's no support for svg... That's without even video. The first I normally hear about it is when they can't email me the file for review (I use LaTeX/beamer myself) – Chris H Jun 18 '19 at 16:28
  • @ChrisH: PowerPoint (and Office) 2003 and earlier did not use .???x files. .???x files were introduced in Office 2007. The (40-bit RC4) encryption implemented in OP's question preceded .???x files by at least a decade. – Eric Towers Jun 19 '19 at 15:34
  • @EricTowers good point. By the time I came back to reply to IMSoP's comment I'd forgotten that the Q was about such an old version (despite having done a lot of work in Office back then) – Chris H Jun 19 '19 at 15:39