Decode MD5, Brute Force

6

The Challenge

Create a program that brute-force* decodes the MD5-hashed string, given here: 92a7c9116fa52eb83cf4c2919599c24a which translates to code-golf

The Rules

  1. You may not use any built-in functions that decode the hash for you if the language you choose has such.
  2. You may not use an online service to decode the hash for you.
  3. This is code-golf, the program that accomplishes the task in the shortest amount of characters wins.

Good luck. May the brute-force be with you.

*Brute-Force definition: looping from a-zzz.. including abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-=[]\;',./~!@#$%^&*()_+{}|:"`<>? in a string, hashing it, and testing it against the hash given until the correct string that has been hashed has been found. Print the string to the program's output stream.

parzivail

Posted 2014-11-30T16:36:33.637

Reputation: 71

5Interesting challenge. Just have one doubt: Will you hash any character or the input is restricted to a maximum length and/or specific chars? – Ismael Miguel – 2014-11-30T16:41:54.507

@IsmaelMiguel well, typically brute force is a-zzzz.. and along the way it finds the answer, so I'll say stick with general ascii. Sorry if that didn't answer your question, I didn't particularly understand it – parzivail – 2014-11-30T17:38:40.420

1The term "brute force" is not sufficiently precise unless you state the search space, and (for example) it's not equally difficult to search all nine-character strings as to search all strings of increasing length. – Peter Taylor – 2014-11-30T17:41:36.103

If you want from a-zzzz...., I give you a solution in less than 5 seconds. If you want ALL ASCII chars, it will take some time. You forgot to mention the winning criterion (e.g.: shortest number of chars or bytes). – Ismael Miguel – 2014-11-30T17:48:20.937

@IsmaelMiguel a-z, numbers, symbols, etc. Criteria: The Rules #3 – parzivail – 2014-11-30T17:52:25.253

I sense an APL solution with around 12 characters. – Ismael Miguel – 2014-11-30T17:58:56.460

Are we allowed to use an MD5 library function? It isn't clear what "such a task" refers to in rule 1. – MtnViewMark – 2014-11-30T18:23:45.847

@MtnViewMark edited the rules :D – parzivail – 2014-11-30T18:26:08.140

@parzivail - Ah... hmm... well... I don't think any language or library has such a function! It's sort of the point of MD5 hashes that reversing them is prohibitive to the point of not being useful. But okay, so Rule 1 prohibits something that doesn't exist. – MtnViewMark – 2014-11-30T18:30:53.477

1@MtnViewMark you never know. someone'd probably go and look for a language that has one just to make a short script... – parzivail – 2014-11-30T18:34:27.323

@MtnViewMark Thanks for the comments. I took the first rule to imply we had to implement the algorithm in code ourselves! – Iszi – 2014-12-07T08:08:58.653

Answers

5

Haskell, 101 characters

import Data.Hash.MD5
main=interact$ \m->head.filter((==m).md5s.Str)$iterate([' '..'~']:)[]>>=sequence

Use it like so:

& ghc -O2 -o brute 42043-BruteForce.hs
& echo -n c-g | md5
bc098aea57599de26c8f17a9edbd492e
& echo -n bc098aea57599de26c8f17a9edbd492e | ./brute
c-g

This requires the common package MissingH.

MtnViewMark

Posted 2014-11-30T16:36:33.637

Reputation: 4 779

How long does it take to decrypt the given hash value? – Jakube – 2014-11-30T19:35:17.800

1@Jakube - Approximately 282,512 years! I seriously doubt that code here could ever complete the task before the computers they were running on failed! Testing up through all 9 character strings (with a 95 character alphabet) requires ~6x10^17 test cases! The code above compiled -O2, does ~71k tests / second. Clearly, even if you could generate and hash 100x faster, it would still take millenia! – MtnViewMark – 2014-12-01T08:00:47.187

5Also, please note: You can't decrypt a hash value. All you can do is find strings that hash to the same value. – MtnViewMark – 2014-12-01T08:02:02.843

4

PHP (155)

<?$a=[];function r($s){for($c=32;$c<127;$c++){$k=$s+chr($c);md5($k)=="92a7c9116fa52eb83cf4c2919599c24a"?die($k):array_push($a,$k)}r(array_shift($a))}r('');

Ungolfed:

<?
$a=[];
function r($s){
    for($c=32;$c<127;$c++){
        $k=$s+chr($c);
        md5($k)=="92a7c9116fa52eb83cf4c2919599c24a" ? die($k) : array_push($a,$k);
    }
    r(array_shift($a));
}
r('');

What it does is define an empty array $a, which will be our operations-to-do-later array. r is a recursive function and $s is the current string we're working with. We begin with an empty string, and loop through all the characters noted above, which are ASCII 32 to 127, and checking them all. While that happens, all of those are pushed onto the end of the array. Then, when all of the 1-length strings are done and all of them are put into the array, we now call r again with the first element in the array. Then, $s will be set to ASCII 32, and it will loop through 2 character strings that start with that character, checking them all and putting them at the end of the array. Since they are at the end, the next call to r will check ASCII 33 rather than them, since it is now the first one in the array. This prevents an infinite loop from occuring where it checks one ASCII 32 and then 2 ASCII 32's and then 3 ASCII 32's... etc.

Kuilin Li

Posted 2014-11-30T16:36:33.637

Reputation: 421

This code would run out of memory before getting through all five character strings! – MtnViewMark – 2014-12-01T08:07:19.817

1@MtnViewMark You can increase the memory limit. – Ismael Miguel – 2014-12-01T21:07:35.160

2@IsmaelMiguel - Do you have 40GB of RAM? That is how many characters would be needed for get through all 5 character strings. At 6 characters, you'll need 6.5TB of memory... I doubt any system has that much swap space. Storing all 8 character strings, needed before you can get to "code-golf", you'll need 53 PetaBytes of storage. That's almost half the size of the largest storage arrays ever built. – MtnViewMark – 2014-12-02T05:42:55.603

3@MtnViewMark Considering that such a small code needs such a big amount of memory is quite remarkable! I've never seen a code-golf solution being downvoted for being impractical. Give the man an upvote. He had the effort to put such a code. – Ismael Miguel – 2014-12-02T09:26:58.273

@MtnViewMark - Yea, I thought of another way but decided that this way would be the shortest to write. I expected it to take up a lot of memory, but didn't expect it to take that much! – Kuilin Li – 2014-12-02T22:47:45.267

I didn't downvote - honest! – MtnViewMark – 2014-12-09T07:30:55.740

3

Python (182 characters)

import sys,string as c,hashlib as h,itertools as i
m=sys.stdin.read()
j=''.join
print(next(j(v)for n in i.count()for v in i.product(*n*(c.printable,))if h.md5(j(v)).hexdigest()==m))

Example:

$ echo -n Hi! | md5sum
5360706c803a759e3a9f2ca54a651950  -
$ echo -n 5360706c803a759e3a9f2ca54a651950 | python2.7 decode.py
Hi!

The itertools module does most of the work, with .count() responsible for gradual escalation. The nested generator expression is filtered until we get a match on the MD5 hash.

Greg

Posted 2014-11-30T16:36:33.637

Reputation: 491

You can import md5 and use md5.new instead of hashlib.md5. – mbomb007 – 2016-06-16T21:49:31.433

177 bytes – movatica – 2019-09-29T09:10:15.567

0

Perl, 75 + 18 = 93 bytes

This program contains nonprintable characters, so here's a hexdump:

00000000: 245f 3d31 3b73 7c30 2a28 2e29 7c24 263d  $_=1;s|0*(.)|$&=
00000010: 7e79 3d20 2d7e 3d21 2d7e 203d 722e 3178  ~y= -~=!-~ =r.1x
00000020: 2124 317c 6520 7768 696c 6520 6d64 3528  !$1|e while md5(
00000030: 245f 296e 6527 92a7 c911 6fa5 2eb8 3cf4  $_)ne'....o...<.
00000040: c291 9599 c24a 273b 7361 79              .....J';say

This program requires the command line argument -MDigest::MD5=md5 (an 18 byte penalty, 17 for the argument and 1 for the space separating it from the other arguments).

Explanation

Here's the program with whitespace and comments added and the binary string replaced with '…':

$_ = 1;                     # $_ is the string to check; start at '1'
s|0*(.)|                    # replace a string of leading 0s plus 1 character
  $& =~ y= -~=!-~ =r        # by rotating 1 printable ASCII character forward
  . 1 x !$1                 # and appending a 1 if the replaced section ends in 0
|e                          # (the previous two lines are a Perl expression)
    while md5($_) ne '…';   # while it has the wrong md5
say                         # print out the value of $_ we found

The basic trick here is that although we want to check from shortest to longest, there's no particular reason to check the characters in ASCIIbetical order. As a result, we check in the order from 1 to ~ then to 0. To do this, we repeatedly increment the first character after the leading 0s, and reset the leading 0s back to leading 1s (which we can do by incrementing every character up to and including the character after the leading 0s). If the string consists entirely of 0s, we need to append a new 1 to the end; we do this via checking to see if the string consists entirely of 0s via seeing if the last matched character in the regex is 0 (as the 0* match is greedy, this will only happen if there are no non-0 characters in the string).

The reason 1 and 0 are picked as the first and last character in the range is that a) they don't need quoting (the string "1" can be written as 1 because Perl doesn't really distinguish between strings and integers), and b) that Perl has a very short way (!) of checking a single-character string for equality with 0 (Perl has three falsey values: undef, "", and "0", with all other values being truthy).

Note that we have no reason to translate the MD5 hash in question into hexadecimal; it's specified in the question, and so we can pack it into binary ourselves and just compare the binary representations directly. Perl's builtin for producing an md5 hash in binary has a shorter name than the hexadecimal equivalent, and the binary representation takes up half the space in the source, so this is a win in two different ways.

Verification

I tested the program with the hash of the string abc inserted into the source code rather than the hash of the string code-golf, and it found the correct answer in 1.3 seconds. (It's unlikely to be able to reverse the hash of code-golf in a reasonable length of time, due to the need to check a substantial subset of the printable ASCII range.)

user62131

Posted 2014-11-30T16:36:33.637

Reputation:

0

bash - about 105 characters, not the greatest at golfing bash scripts!

Requires makepasswd and openssl.

until [ "$Y" = "$3" ]
do
    X=($(makepasswd --string=$1 --chars=$2))
    Y=($(echo -n $X | md5sum))
done
echo $X

You use it by supplying three arguments: ./reverse-md5 character-set length-of-password md5-hash-to-match, e.g.:

$ ./reverse-md5 'ab' 4 54a8723466e5d487247f3d93d51c66bc
abba

Hoorah, have found the name of an obscure Swedish pop band!

Note that this is a directed attack in that you have some idea of the characters used and the length of the password.

So you could use the script in a directed attack as follows to complete the challenge:

$ ./reverse-md5 'cdefglo-' 9 92a7c9116fa52eb83cf4c2919599c24a
[wait a long time]
code-golf

Or as a brute force attack, run in parallel like this:

$ for L in `seq 1 20` ; do ./reverse-md5 '!"#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~' $L 92a7c9116fa52eb83cf4c2919599c24a & done
[wait a really long time!]
code-golf

Which, depending on the hardware, may give you an answer faster than going sequentially from '!' to '~~~...'

user15259

Posted 2014-11-30T16:36:33.637

Reputation:

1Remove the spaces around the pipe near md5sum, and get rid of the indentation. – RK. – 2015-09-22T18:24:21.867

0

Hassium, 138 Bytes

use Math;func main(){d=["laptop","code-golf"];p="312f91285e048e09bb4aefef23627994";foreach(l in d){if(Math.hash("md5",l)==p)println(l);}}

Expanded: http://HassiumLang.com/Hassium/index.php?code=77bf67650e57549addd25daf7503db06

Jacob Misirian

Posted 2014-11-30T16:36:33.637

Reputation: 737