16
3
Consider you have a hash function \$\mathcal{H}\$ which takes strings of length \$2n\$ and returns strings of length \$n\$ and has the nice property that it is collision resistant, i.e. it is hard to find two different strings \$s \neq s'\$ with the same hash \$\mathcal{H}(s) = \mathcal{H}(s')\$.
You would now like to build a new hash function \$\mathcal{H'}\$ which takes strings of arbitrary length and maps them to strings of length \$n\$, while still being collision resistant.
Lucky for you, already in 1979 a method now known as the Merkle–Damgård construction was published which achieves exactly this.
The task of this challenge will be to implement this algorithm, so we'll first have a look at a formal description of the Merkle–Damgård construction, before going through a step-by-step example which should show that the approach is simpler than it might appear at first.
Given some integer \$n > 0\$, a hash function \$\mathcal{H}\$ as described above and an input string \$s\$ of arbitrary length, the new hash function \$\mathcal{H'}\$ does the following:
- Set \$ l = |s|\$, the length of \$s\$, and split \$s\$ in chunks of length \$n\$, filling up the last chunk with trailing zeros if necessary. This yields \$m = \lceil \frac{l}{n} \rceil \$ many chunks which are labeled \$c_1, c_2, \dots, c_m \$.
- Add a leading and a trailing chunk \$c_0\$ and \$c_{m+1}\$, where \$c_0\$ is a string consisting of \$n\$ zeros and \$c_{m+1}\$ is \$n\$ in binary, padded with leading zeros to length \$n\$.
- Now iteratively apply \$\mathcal{H}\$ to the current chunk \$c_i\$ appended to the previous result \$r_{i-1}\$: \$ r_i = \mathcal{H}(r_{i-1}c_i)\$, where \$r_0 = c_0\$. (This step might be more clear after looking at the example below.)
- The output of \$\mathcal{H'}\$ is the final result \$r_{m+1}\$.
The Task
Write a program or function which takes as input a positive integer \$n\$, a hash function \$\mathcal{H}\$ as black box and a non-empty string \$s\$ and returns the same result as \$\mathcal{H'}\$ on the same inputs.
This is code-golf, so the shortest answer in each language wins.
Example
Let's say \$n = 5\$, so our given hash function \$\mathcal{H}\$ takes strings of length 10 and returns strings of length 5.
- Given an input of \$s = \texttt{"Programming Puzzles"} \$, we get the following chunks: \$s_1 = \texttt{"Progr"} \$, \$s_2 = \texttt{"ammin"} \$, \$s_3 = \texttt{"g Puz"} \$ and \$s_4 = \texttt{"zles0"} \$. Note that \$s_4\$ needed to be padded to length 5 with one trailing zero.
- \$ c_0 = \texttt{"00000"}\$ is just a string of five zeros and \$ c_5 = \texttt{"00101"}\$ is five in binary (\$\texttt{101}\$), padded with two leading zeros.
- Now the chunks are combined with \$\mathcal{H}\$:
\$r_0 = c_0 = \texttt{"00000"} \$
\$ r_1 = \mathcal{H}(r_0c_1) = \mathcal{H}(\texttt{"00000Progr"})\$
\$ r_2 = \mathcal{H}(r_1c_2) = \mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\$ \$ r_3 = \mathcal{H}(r_2c_3) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\$
\$ r_4 = \mathcal{H}(r_3c_4) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\texttt{"zles0"})\$
\$ r_5 = \mathcal{H}(r_4c_5) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\texttt{"zles0"})\texttt{"00101"})\$ - \$r_5\$ is our output.
Let's have a look how this output would look depending on some choices1 for \$\mathcal{H}\$:
- If \$\mathcal{H}(\texttt{"0123456789"}) = \texttt{"13579"}\$, i.e. \$\mathcal{H}\$ just returns every second character, we get:
\$r_1 = \mathcal{H}(\texttt{"00000Progr"}) = \texttt{"00Por"}\$
\$r_2 = \mathcal{H}(\texttt{"00Porammin"}) = \texttt{"0oamn"}\$
\$r_3 = \mathcal{H}(\texttt{"0oamng Puz"}) = \texttt{"omgPz"}\$
\$r_4 = \mathcal{H}(\texttt{"omgPzzles0"}) = \texttt{"mPze0"}\$
\$r_5 = \mathcal{H}(\texttt{"mPze000101"}) = \texttt{"Pe011"}\$
So \$\texttt{"Pe011"}\$ needs to be the output if such a \$\mathcal{H}\$ is given as black box function. - If \$\mathcal{H}\$ simply returns the first 5 chars of its input, the output of \$\mathcal{H'}\$ is \$\texttt{"00000"}\$. Similarly if \$\mathcal{H}\$ returns the last 5 chars, the output is \$\texttt{"00101"}\$.
- If \$\mathcal{H}\$ multiplies the character codes of its input and returns the first five digits of this number, e.g. \$\mathcal{H}(\texttt{"PPCG123456"}) = \texttt{"56613"}\$, then \$\mathcal{H}'(\texttt{"Programming Puzzles"}) = \texttt{"91579"}\$.
1 For simplicity, those \$\mathcal{H}\$ are actually not collision resistant, though this does not matter for testing your submission.
Sandbox (deleted) – Laikoni – 2018-10-10T16:53:52.307
I must say it's fun that the example given has the last 'full' hash be of "OMG Puzzles!" effectively
omgPzzles0
. Well chosen example input! – LambdaBeta – 2018-10-10T21:53:23.950Can we assume some flexibility on the input format for H (e.g. it takes two strings of length n, or a longer string of which it only considers the first 2n characters)? – Delfad0r – 2018-10-10T23:21:45.490
Are space characters, e.g., between "g P" valid output? – guest271314 – 2018-10-11T00:13:43.280
@guest271314 If the space is part of the resulting hash, it needs to be outputted. If the hash is actually "gP", you may not output a space inbetween. – Laikoni – 2018-10-11T08:53:16.940
@Delfad0r I'd rather say no, because H is a black-box function from which you only know it takes strings of a certain length and returns strings of half the length. So other assumptions like H ignores all but the first n bytes of the string should not be made. – Laikoni – 2018-10-11T08:56:20.100