The title describes pretty well what I want to ask, because I just can't wrap my head around it. I understand the basics of breaking encrypted data by a kind of brute force, where you encrypt sample data and compare the result to the ciphertext you want to crack. However, I don't understand how cracking a more sophisticated scheme works.

I'll reuse some code I posted in another question (at some risk, since it wasn't received very well) just to illustrate the simplest example:

function my_hash($data){
    // Generate random salt
    $salt = substr(md5(str_shuffle('0123456789abcdef')), 0, 5);

    // Mask salt within the hash
    $hash = substr(md5($data. $salt), 0, -5) . $salt;

    return $hash;
}

function my_check($data, $hash){

    $salt = substr($hash, -5);

    return substr(md5($data. $salt), 0, -5) . $salt === $hash;
}

$hash = my_hash('qwerty');

Can you explain to me, if you had a super-computer and billions of strings encrypted with that function, how could you find out the algorithm they were encrypted with?

Edit: The suggested question does not answer the one I asked. It explains how to guess which hashing function was used, not how to crack it and find the algorithm behind it. I can easily say it is md5 when the output is 32 characters long and contains only the characters 0-9a-f, but there's a difference between plain md5 and what I posted above.

php_nub_qq

2 Answers

Not encryption
This is not encryption. This is hashing. There is no way back.

So I think what you're really asking about is black-box reverse engineering, along these lines: "If I have a black-box algorithm, throw many strings into it and observe the strings that come out, will I be able to copy the black box in a simple way?"

And the answer is "No". There is no general way of doing this.
You could of course brute-force it: throw a huge number of strings into it, save every output somewhere, and then do a table lookup.

But a 32-character hexadecimal hash is 16 bytes, so it has 2^128 possible values, and a table with that many entries could not be stored anywhere on Earth.
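
To make the table-lookup idea concrete, here is a minimal sketch. It assumes a deterministic black box (plain md5() stands in for it here) and a tiny, made-up candidate list; the helper name build_lookup_table() is hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: precompute "output => input" for every candidate.
function build_lookup_table(callable $blackBox, array $candidates): array {
    $table = [];
    foreach ($candidates as $candidate) {
        $table[$blackBox($candidate)] = $candidate;
    }
    return $table;
}

// Assumption: the black box is deterministic; plain md5() stands in for it.
$table = build_lookup_table('md5', ['123456', 'password', 'qwerty']);

// "Reversing" an observed output is then a single array lookup.
$observed  = md5('qwerty');
$recovered = $table[$observed] ?? null;

echo $recovered === null ? "not in table\n" : "input was: $recovered\n";
```

This only recovers inputs that were put into the table beforehand. Because my_hash() mixes in a random salt, the same input would also need one table entry per possible salt value.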

Enemy already knows the algorithm
Now regarding encryption: You generally don't try to guess at an algorithm, instead you assume that the enemy knows the algorithm and you try to make him guess at the key.

This is a design principle known as Kerckhoffs's Principle. And it's generally thought to be a good thing.
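
Applied to the my_hash() function from the question, this means an attacker is assumed to know that the salt sits in the last 5 characters. A minimal sketch of what follows from that, assuming a small made-up word list (the function name dictionary_attack() and the fixed salt are hypothetical, chosen only to keep the example repeatable):

```php
<?php
// Hypothetical attacker helper: the algorithm is known, so the salt is simply
// read from the last 5 characters of the stored hash, exactly as my_check() does.
function dictionary_attack(string $hash, array $candidates): ?string {
    $salt = substr($hash, -5);
    foreach ($candidates as $candidate) {
        if (substr(md5($candidate . $salt), 0, -5) . $salt === $hash) {
            return $candidate;
        }
    }
    return null; // no candidate matched
}

// Build a sample hash the way my_hash() would, but with a fixed salt
// (an assumption, so that the example gives the same result on every run).
$salt   = 'a1b2c';
$stored = substr(md5('qwerty' . $salt), 0, -5) . $salt;

$wordlist = ['123456', 'password', 'qwerty', 'letmein'];
var_dump(dictionary_attack($stored, $wordlist));
```

Nothing about the scheme has to be guessed here. The only unknown is the input, and the cost of finding it depends on how guessable that input is.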

StackzOfZtuff

This area is commonly known as cryptanalysis and involves many techniques. Some of the best known are:

  1. Frequency analysis
  2. Known-plaintext analysis
  3. Chosen-plaintext analysis
  4. Ciphertext-only analysis
  5. Man-in-the-middle attack
  6. Timing/differential power analysis

With that said, and given that this is NOT a reversible algorithm, consider the scenario you provided:

Can you explain to me, if you had a super-computer and billions of strings encrypted with that function, how could you find out the algorithm they were encrypted with?

The salt generation is the weak link of the algorithm. More specifically, the way the salt is produced weakens the my_hash() function as a whole.

In particular, the salt is only 5 hexadecimal characters long, so there are just 16^5 = 1,048,576 possible values, and the same salt will inevitably recur many times within your billions of hashes. This is where frequency analysis plays a role in weakening the my_hash() function.
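
One way repeated salts can be exploited is to bucket captured hashes by salt, so that a single md5() call per candidate and salt tests every hash in that bucket. A minimal sketch, assuming made-up sample data (group_by_salt() and the $make closure are hypothetical helpers, not part of the original code):

```php
<?php
// Hypothetical helper: bucket captured hashes by the salt in their last 5 characters.
// Note: PHP turns purely decimal salts such as "12345" into integer array keys,
// which is harmless for counting.
function group_by_salt(array $hashes): array {
    $groups = [];
    foreach ($hashes as $hash) {
        $groups[substr($hash, -5)][] = $hash;
    }
    return $groups;
}

// Sample data (made up): builds hashes the way my_hash() does, with chosen salts.
$make = function (string $data, string $salt): string {
    return substr(md5($data . $salt), 0, -5) . $salt;
};
$hashes = [
    $make('qwerty', '00000'),
    $make('secret', '00000'), // shares its salt with the hash above
    $make('qwerty', 'fffff'),
];

foreach (group_by_salt($hashes) as $salt => $group) {
    echo $salt . ': ' . count($group) . " hash(es)\n";
}
```

The larger the buckets, the less work per recovered input, which is why a small salt space matters once the number of hashes is large.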

Take the following example:

// Broke this out of your my_hash() function to illustrate
function my_salt() {
    return substr(md5(str_shuffle('0123456789abcdef')), 0, 5);
}

$salts = array_map('my_salt', range(1, 10000));
$analysis = array();

function printer($title, $str) {
  echo "<b>" . $title . "</b><pre>";
  print_r($str);
  echo "</pre>";
}

$total = 0;

foreach($salts as $key => $value) {
  if (in_array($value, $analysis)) {
    printer("Salt found:", "At record " . $key . ", " . $value);
    $total++;
  }

  array_push($analysis, $value);
}
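// Guard against division by zero when a run produces no collisions
$average = $total > 0 ? count($salts) / $total : 'n/a';
printer("Totals: ", "Records: " . count($salts) . "; Collisions: " . $total . "; Average: " . $average);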

With only 10,000 records the salt function produced 72 collisions, an average of one collision roughly every 139 records (10,000 / 72 = 138.888888889). This will substantially weaken the overall effectiveness of the my_hash() function.
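
As a rough cross-check of the order of magnitude, the birthday approximation can be evaluated for this salt size. The sketch below assumes salts are drawn uniformly at random from all 16^5 values, which str_shuffle() combined with md5() may not deliver exactly, so a measured run such as the 72 collisions above can differ from the estimate. The function names are hypothetical.

```php
<?php
// Number of distinct salts: 5 hexadecimal characters.
$saltSpace = 16 ** 5; // 1,048,576

// Birthday approximation: expected number of colliding pairs among $n salts
// drawn uniformly at random from $space values (uniformity is an assumption).
function expected_colliding_pairs(int $n, int $space): float {
    return $n * ($n - 1) / (2 * $space);
}

// Approximate number of salts needed for a 50% chance of at least one repeat.
function birthday_threshold(int $space): float {
    return sqrt(2 * $space * log(2));
}

printf("salt space: %d\n", $saltSpace);
printf("expected colliding pairs in 10,000 salts: %.1f\n", expected_colliding_pairs(10000, $saltSpace));
printf("50%% chance of a repeat after about %d salts\n", (int) round(birthday_threshold($saltSpace)));
```

Under that assumption the estimate is a few dozen colliding pairs per 10,000 salts, and the first repeat is likely after little more than a thousand hashes.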

jas-