7
1
Hash Un-hashing
The basic goal of this challenge is to exploit the high probability of collision in the Java Hash function to create a list of plausible inputs that created an outputted hash. The program that can print the most results in one hour wins.
For example, hashing golf
produces the number 3178594
. Other 'words' you could hash to get the same number would be: hQNG
, hQMf
, hPlf
Details
The basic Java hash function is defined as:
function hash(text){
h = 0;
for(i : text) {
h = 31*h + toCharCode(i);
}
return h;
}
You can use an altered or more efficient variant of the hash function (such as using bitwise to replace multiplication) if it produces the same results.
If your language of choice does not use signed ints or have integer overflow, you must implement a wrap function in the hash to emulate integer overflow so that 2147483648 would overflow to -2147483648.
You can choose not to worry about inputs that wrap around multiple times, such as those with 6-7 characters or more. If you choose to include those, you have a better chance of winning.
Additional Requirements
Your code should have at least one input: the hash itself. An optional second input is the maximum number of letters to guess. For example,
guessHash(53134)
might produce a list of all input combinations up to the absolute maximum amount (see the next requirement), andguessHash(53134, 4)
will display all input combinations that contain 1-4 letters.As there are an infinite amount of results, the most letters that can be in a single result will be set at 256. If your program can manage to output all 256 letter results in a reasonable amount of time (see next requirement), you automatically win.
Your program should not run for more than an hour on any
non-specialized
computer you use.Non-specialized
means it can't run on a server system using 256 different cores or on a super-computer, it must be something that any regular consumer would/could buy. The shorter time period is for practicality, so that you don't have to use all your processing power for too long.All the results in your final list of inputs must produce the same hash as the inputted hash.
You are to implement your own search, brute-force, or any algorithm to produce the final list.
Using a native search method or search method from another library is not allowed (example: binary search library). Using
String.search(char)
orArray.find(value)
is okay if it is not a primary feature of your search algorithm (such as to spot duplicate results or find if a letter exists in given string).The final list of input values should use all readable ASCII characters (32 to 126).
Example
I went ahead and created a very inefficient version of this challenge to produce an example list! Each result I gave is separated by spaces, although you can print each result on a new line or any other readable method of organization.
i30( i3/G i2NG hQNG hQMf hPlf golf
You do not have to print results to a file, just record how many results you get.
Judging
The number to un-hash is: 69066349
I will pick the best answer based on which-ever code produces the most results. If you remember above where I said wrapping was optional, it'd be a good idea to include multiple wrappings if you want the most results in your final list.
If you have any questions or additional comments, feel free to ask.
Good Luck!
9Great trolling: state that the question is about "Java" and then provide a JavaScript code snippet. – feersum – 2014-09-25T03:46:45.367
You say "this code golf" but the judging is not by character count? – xnor – 2014-09-25T03:47:27.160
Also the question states that characters should be ASCII, but provides a list including many non-ASCII characters. It appears Latin-1 instead of ASCII was desired. – feersum – 2014-09-25T03:49:05.523
@xnor Sorry about that! I'm a tad bit new and haven't picked up on all the terminology in the community! – jocopa3 – 2014-09-25T03:53:02.280
@feersum The code snippet was meant to be as non-language specific as possible, but yeah, it's almost JavaScript. And again, sorry about that, I first had it as the 33-127 ASCII characters, but later switch to include Latin-1 character set without going back and changing the definition. Fixing that. – jocopa3 – 2014-09-25T03:54:01.687
1I don't think you have to worry about a program finding all preimages. With 256 characters out of a pool of 155, each with a probability of
2**-64
of matching the given hash (to give a rough estimate), there are155**256/2**64
preimages. – Dennis – 2014-09-25T04:04:50.3572That's 28773653479426574679752045985983708651314241076486969565840047647896492733218091801018247637865462591153254380909630939358483142387095514874963147201793135934206287462851566531896641573510502833532739091940091000563990339507011267238877057288024527221865244498474278775782301655262770952804980910234712459807545306017935605110278792039014148511645685233861391147234783503715268543704093987748985203419178705457137415361429762778270019610641592268726043290782629279945031794639464412329913307721831669515436033160513633497782473587351945837412 preimages. – Dennis – 2014-09-25T04:05:45.067
@Dennis Shouldn't that be 2^-32 ? And why wouldn't we worry, given that there is no runtime limit, or anything else that would prevent a brute force generation of every possible string... – feersum – 2014-09-25T04:21:49.010
@feersum in the second additional requirement, I did specify that the program can not run for more than a day. I should have probably made it a new requirement or made it stand out instead of hiding it in parenthesis though. – jocopa3 – 2014-09-25T04:35:12.880
@feersum: Yes, don't know where I got 64 bits from... So the number is a few digits bigger. The question specifies a time limit of a day. To find all preimages, you'd have to find (not try) 10**444 of them every second to finish before the universe's estimated death. – Dennis – 2014-09-25T04:36:47.680
What's the rationale behind the limitations on characters and libraries? This seems like a potentially fun challenge, but I don't see the restrictions making it any more fun. – James_pic – 2014-09-25T09:41:50.277
@James_pic For the character sets, I did that because using the readable ASCII characters 32-126 doesn't create many results. I guess it doesn't make much sense either to then switch to Latin-1, but remove a lot of the characters. I'll switch it back to the readable ASCII 32-126.
For libraries, I want people to write a search algorithm on their own. The project is basically implementing a search or brute-force algorithm to find each results, it wouldn't be much fun if people used a preexisting libraries to get results. I will lift some of those restrictions though. – jocopa3 – 2014-09-25T10:14:07.513
What about libraries that have nothing to do with hashing, brute-forcing, or cryptography? For example, if I want to use an obscure data structure that isn't included in my language's standard library. – James_pic – 2014-09-26T08:48:03.157
@James_pic I removed the restriction on most libraries. Use anything as long as you don't use it for searching/brute-force. – jocopa3 – 2014-09-26T10:21:52.717
If your program can manage to output all 256 letter results in a reasonable amount of time (see next requirement), you automatically win.
Looks like its already a tie! – matsjoyce – 2014-09-27T08:46:02.950