Consider you have a hash function \$\mathcal{H}\$ which takes strings of length \$2n\$ and returns strings of length \$n\$ and has the nice property that it is collision resistant, i.e. it is hard to find two different strings \$s \neq s'\$ with the same hash \$\mathcal{H}(s) = \mathcal{H}(s')\$.

You would now like to build a new hash function \$\mathcal{H'}\$ which takes strings of arbitrary length and maps them to strings of length \$n\$, while still being collision resistant.

Lucky for you, already in 1979 a method now known as the Merkle–Damgård construction was published which achieves exactly this.

The task of this challenge will be to implement this algorithm, so we'll first have a look at a formal description of the Merkle–Damgård construction, before going through a step-by-step example which should show that the approach is simpler than it might appear at first.

Given some integer \$n > 0\$, a hash function \$\mathcal{H}\$ as described above and an input string \$s\$ of arbitrary length, the new hash function \$\mathcal{H'}\$ does the following:

Set \$ l = |s|\$, the length of \$s\$, and split \$s\$ in chunks of length \$n\$, filling up the last chunk with trailing zeros if necessary. This yields \$m = \lceil \frac{l}{n} \rceil \$ many chunks which are labeled \$c_1, c_2, \dots, c_m \$.

Add a leading and a trailing chunk \$c_0\$ and \$c_{m+1}\$, where \$c_0\$ is a string consisting of \$n\$ zeros and \$c_{m+1}\$ is \$n\$ in binary, padded with leading zeros to length \$n\$.

Now iteratively apply \$\mathcal{H}\$ to the current chunk \$c_i\$ appended to the previous result \$r_{i-1}\$: \$ r_i = \mathcal{H}(r_{i-1}c_i)\$, where \$r_0 = c_0\$. (This step might be more clear after looking at the example below.)

The output of \$\mathcal{H'}\$ is the final result \$r_{m+1}\$.

The Task

Write a program or function which takes as input a positive integer \$n\$, a hash function \$\mathcal{H}\$ as black box and a non-empty string \$s\$ and returns the same result as \$\mathcal{H'}\$ on the same inputs.

This is code-golf, so the shortest answer in each language wins.

Example

Let's say \$n = 5\$, so our given hash function \$\mathcal{H}\$ takes strings of length 10 and returns strings of length 5.

Given an input of \$s = \texttt{"Programming Puzzles"} \$, we get the following chunks: \$s_1 = \texttt{"Progr"} \$, \$s_2 = \texttt{"ammin"} \$, \$s_3 = \texttt{"g Puz"} \$ and \$s_4 = \texttt{"zles0"} \$. Note that \$s_4\$ needed to be padded to length 5 with one trailing zero.
\$ c_0 = \texttt{"00000"}\$ is just a string of five zeros and \$ c_5 = \texttt{"00101"}\$ is five in binary (\$\texttt{101}\$), padded with two leading zeros.
Now the chunks are combined with \$\mathcal{H}\$:
\$r_0 = c_0 = \texttt{"00000"} \$
\$ r_1 = \mathcal{H}(r_0c_1) = \mathcal{H}(\texttt{"00000Progr"})\$
\$ r_2 = \mathcal{H}(r_1c_2) = \mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\$ \$ r_3 = \mathcal{H}(r_2c_3) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\$
\$ r_4 = \mathcal{H}(r_3c_4) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\texttt{"zles0"})\$
\$ r_5 = \mathcal{H}(r_4c_5) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\texttt{"zles0"})\texttt{"00101"})\$
\$r_5\$ is our output.

Let's have a look how this output would look depending on some choices¹ for \$\mathcal{H}\$:

If \$\mathcal{H}(\texttt{"0123456789"}) = \texttt{"13579"}\$, i.e. \$\mathcal{H}\$ just returns every second character, we get:
\$r_1 = \mathcal{H}(\texttt{"00000Progr"}) = \texttt{"00Por"}\$
\$r_2 = \mathcal{H}(\texttt{"00Porammin"}) = \texttt{"0oamn"}\$
\$r_3 = \mathcal{H}(\texttt{"0oamng Puz"}) = \texttt{"omgPz"}\$
\$r_4 = \mathcal{H}(\texttt{"omgPzzles0"}) = \texttt{"mPze0"}\$
\$r_5 = \mathcal{H}(\texttt{"mPze000101"}) = \texttt{"Pe011"}\$
So \$\texttt{"Pe011"}\$ needs to be the output if such a \$\mathcal{H}\$ is given as black box function.
If \$\mathcal{H}\$ simply returns the first 5 chars of its input, the output of \$\mathcal{H'}\$ is \$\texttt{"00000"}\$. Similarly if \$\mathcal{H}\$ returns the last 5 chars, the output is \$\texttt{"00101"}\$.
If \$\mathcal{H}\$ multiplies the character codes of its input and returns the first five digits of this number, e.g. \$\mathcal{H}(\texttt{"PPCG123456"}) = \texttt{"56613"}\$, then \$\mathcal{H}'(\texttt{"Programming Puzzles"}) = \texttt{"91579"}\$.

^{1 For simplicity, those \$\mathcal{H}\$ are actually not collision resistant, though this does not matter for testing your submission.}

Laikoni

Posted 2018-10-10T16:49:15.173

Reputation: 23 676

Sandbox (deleted) – Laikoni – 2018-10-10T16:53:52.307

I must say it's fun that the example given has the last 'full' hash be of "OMG Puzzles!" effectively omgPzzles0. Well chosen example input! – LambdaBeta – 2018-10-10T21:53:23.950

Can we assume some flexibility on the input format for H (e.g. it takes two strings of length n, or a longer string of which it only considers the first 2n characters)? – Delfad0r – 2018-10-10T23:21:45.490

Are space characters, e.g., between "g P" valid output? – guest271314 – 2018-10-11T00:13:43.280

@guest271314 If the space is part of the resulting hash, it needs to be outputted. If the hash is actually "gP", you may not output a space inbetween. – Laikoni – 2018-10-11T08:53:16.940

@Delfad0r I'd rather say no, because H is a black-box function from which you only know it takes strings of a certain length and returns strings of half the length. So other assumptions like H ignores all but the first n bytes of the string should not be made. – Laikoni – 2018-10-11T08:56:20.100

Answers

Haskell, 91 90 86 bytes

-1 byte thanks to Laikoni
-4 bytes thanks to xnor

n!h|let a='0'<$[1..n];c?""=c;c?z=h(c++take n(z++a))?drop n z=h.(++mapM(:"1")a!!n).(a?)

Try it online!

Explanation

a='0'<$[1..n]

Just assigns the string "00...0" ('0' \$n\$ times) to a

c?""=c
c?z=h(c++take n(z++a))?drop n z

The function ? implements the recursive application of h: c is the hash we have obtained so far (length \$n\$), z is the rest of the string. If z is empty then we simply return c, otherwise we take the first \$n\$ characters of z (possibly padding with zeros from a), prepend c and apply h. This gives the new hash, and then we call ? recursively on this hash and the remaining characters of z.

n!h=h.(++mapM(:"1")a!!n).(a?)

The function ! is the one actually solving the challenge. It takes n, h and s (implicit) as inputs. We compute a?s, and all we have to do is append n in binary and apply h once more. mapM(:"1")a!!n returns the binary representation of \$n\$.

Delfad0r

Posted 2018-10-10T16:49:15.173

Reputation: 1 688

1let in a guard is shorter than using where: Try it online! – Laikoni – 2018-10-10T18:14:48.710

2It looks like mapM(\_->"01")a can be mapM(:"1")a. – xnor – 2018-10-10T23:35:40.087

R, 159 154 bytes

function(n,H,s,`?`=paste0,`*`=strrep,`/`=Reduce,`+`=nchar,S=0*n?s?0*-(+s%%-n)?"?"/n%/%2^(n:1-1)%%2)(function(x,y)H(x?y))/substring(S,s<-seq(,+S,n),s--n-1)

Try it online!

Yuck! Answering string challenges in R is never pretty, but this is horrible. This is an instructive answer on how not to write "normal" R code...

Thanks to nwellnhof for fixing a bug, at a cost of 0 bytes!

Thanks to J.Doe for swapping the operator aliasing to change the precedence, good for -4 bytes.

The explanation below is for the previous version of the code, but the principles remain the same.

function(n,H,s,               # harmless-looking function arguments with horrible default arguments 
                              # to prevent the use of {} and save two bytes
                              # then come the default arguments,
                              # replacing operators as aliases for commonly used functions:
 `+`=paste0,                  # paste0 with binary +
 `*`=strrep,                  # strrep for binary *
 `/`=Reduce,                  # Reduce with binary /
 `?`=nchar,                   # nchar with unary ?
 S=                           # final default argument S, the padded string:
  0*n+                        # rep 0 n times
  s+                          # the original string
  0*-((?s)%%-n)+              # 0 padding as a multiple of n
  "+"/n%/%2^(n:1-1)%%2)       # n as an n-bit number
                              # finally, the function body:
 (function(x,y)H(x+y)) /      # Reduce/Fold (/) by H operating on x + y
  substring(S,seq(1,?S,n),seq(n,?S,n))  # operating on the n-length substrings of S

Giuseppe

Posted 2018-10-10T16:49:15.173

Reputation: 21 077

I think 0*(n-(?s)%%n) doesn't work if n divides s evenly. But 0*-((?s)%%-n) should work. – nwellnhof – 2018-10-14T02:07:16.290

@nwellnhof ah, of course, thank you, fixed. – Giuseppe – 2018-10-15T13:52:04.783

Minor changes, 155 bytes

– J.Doe – 2018-10-16T10:19:04.760

1@J.Doe nice! I saved another byte since seq has 1 as its from argument by default. – Giuseppe – 2018-10-16T13:29:56.850

C (gcc), 251 bytes

#define P sprintf(R,
b(_){_=_>1?10*b(_/2)+_%2:_;}f(H,n,x)void(*H)(char*);char*x;{char R[2*n+1],c[n+1],*X=x;P"%0*d",n,0);while(strlen(x)>n){strncpy(c,x,n);x+=n;strcat(R,c);H(R);}P"%s%s%0*d",R,x,n-strlen(x),0);H(R);P"%s%0*d",R,n,b(n));H(R);strcpy(X,R);}

Try it online!

Not as clean as the bash solution, and highly improvable.

The function is f taking H as a function that replaces its string input with that string's hash, n as in the description, and x the input string and output buffer.

Description:

#define P sprintf(R,     // Replace P with sprintf(R, leading to unbalanced parenthesis
                         // This is replaced and expanded for the rest of the description
b(_){                    // Define b(x). It will return the integer binary expansion of _
                         // e.g. 5 -> 101 (still as integer)
  _=_>1?                 // If _ is greater than 1
    10*b(_/2)+_%2        // return 10*binary expansion of _/2 + last binary digit
    :_;}                 // otherwise just _
f(H,n,x)                 // Define f(H,n,x)
  void(*H)(char*);       // H is a function taking a string
  char*x; {              // x is a string
  char R[2*n+1],c[n+1],  // Declare R as a 2n-length string and c as a n-length string
  *X=x;                  // save x so we can overwrite it later
  sprintf(R,"%0*d",n,0); // print 'n' 0's into R
  while(strlen(x)>n){    // while x has at least n characters
    strncpy(c,x,n);x+=n; // 'move' the first n characters of x into c
    strcat(R,c);         // concatenate c and R
    H(R);}               // Hash R
  sprintf(R,"%s%s%0*d"   // set R to two strings concatenated followed by some zeroes
    R,x,                 // the two strings being R and (what's left of) x
    n-strlen(x),0);      // and n-len(x) zeroes
  H(R);                  // Hash R
  sprintf(R,"%s%*d",R,n, // append to R the decimal number, 0 padded to width n
    b(n));               // The binary expansion of n as a decimal number
  H(R);strcpy(X,R);}     // Hash R and copy it into where x used to be

LambdaBeta

Posted 2018-10-10T16:49:15.173

Reputation: 2 499

229 bytes – ceilingcat – 2018-10-30T20:07:41.097

I think: 227 bytes (going off of ceilingcat's comment)

– Zacharý – 2018-11-16T17:14:46.903

Ruby, 78 bytes

->n,s,g{(([?0*n]*2*s).chop.scan(/.{#{n}}/)+["%0#{n}b"%n]).reduce{|s,x|g[s+x]}}

Try it online!

How it works:

([?0*n]*2*s).chop    # Padding: add leading and trailing 
                     # zeros, then remove the last one
.scan(/.{#{n}}/)     # Split the string into chunks
                     # of length n
+["%0#{n}b"%n]       # Add the trailing block
.reduce{|s,x|g[s+x]} # Apply the hashing function
                     # repeatedly

G B

Posted 2018-10-10T16:49:15.173

Reputation: 11 099

Jelly, 23 bytes

0Ṿ;s;BṾ€ṚWƲ}z”0ZU0¦;Ç¥/

Try it online!

Accepts \$\mathcal H\$ at the line above it, \$s\$ as its left argument, and \$n\$ as its right argument.

Erik the Outgolfer

Posted 2018-10-10T16:49:15.173

Reputation: 38 134

Bash, 127-ε bytes

Z=`printf %0*d $1` R=$Z
while IFS= read -rn$1 c;do R=$R$c$Z;R=`H<<<${R::2*$1}`;done
H< <(printf $R%0*d $1 `bc <<<"obase=2;$1"`)

Try it online!

This works as a program/function/script/snippet. H must be resolveable to a program or function that will perform the hashing. N is the argument. Example call:

$ H() {
>   sed 's/.\(.\)/\1/g'
> }
$ ./wherever_you_put_the_script.sh 5 <<< "Programming Puzzles"  # if you add a shebang
Pe011

Description:

Z=`printf %0*d $1`

This creates a string of $1 zeroes. This works by calling printf and telling it to print an integer padded to extra argument width. That extra argument we pass is $1, the argument to the program/function/script which stores n.

R=$Z

This merely copies Z, our zero string, to R, our result string, in preparation for the hashing loop.

while IFS= read -rn$1 c; do

This loops over the input every $1 (n) characters loading the read characters into c. If the input ends then c merely ends up too short. The r option ensures that any special characters in the input don't get bash-interpreted. This is the -ε in the title - that r isn't strictly necessary, but makes the function more accurately match the input.

R=$R$c$Z

This concatenates the n characters read from input to R along with zeroes for padding (too many zeroes for now).

R=`H<<<${R::2*$1}`;done

This uses a here string as input to the hash function. The contents ${R::2*$1} are a somewhat esoteric bash parameter substitution which reads: R, starting from 0, only 2n characters.

Here the loop ends and we finish with:

H< <(printf $R%0*d $1 `bc <<<"obase=2;$1"`)

Here the same format string trick is used to 0 pad the number. bc is used to convert it to binary by setting the output base (obase) to 2. The result is passed to the hash function/program whose output is not captured and thus is shown to the user.

LambdaBeta

Posted 2018-10-10T16:49:15.173

Reputation: 2 499

Why "127-ε"? Why not just "127"? – Solomon Ucko – 2018-10-11T01:02:21.397

I don't know. I was on the fence about the necessity of the r flag. I figured 1 byte doesn't really matter, but if pushed I could shave it. – LambdaBeta – 2018-10-11T01:03:39.050

For the read command? – Solomon Ucko – 2018-10-11T01:12:16.733

Because without it a \ in the input will be interpreted instead of ignored, so they'd have to be escaped. – LambdaBeta – 2018-10-11T01:15:24.020

Maybe add a note about that? – Solomon Ucko – 2018-10-11T01:17:08.423

I did, in the description for the read command. – LambdaBeta – 2018-10-11T01:17:58.763

It's just not obvious from the title or first few lines. – Solomon Ucko – 2018-10-11T01:19:01.203

Perl 6, 79 68 bytes

{reduce &^h o&[~],comb 0 x$^n~$^s~$n.fmt("%.{$n-$s.comb%-$n}b"): $n}

Try it online!

Explanation

{
  reduce         # Reduce with
    &^h o&[~],   # composition of string concat and hash function
    comb         # Split string
      0 x$^n     # Zero repeated n times
      ~$^s       # Append input string s
      ~$n.fmt("  # Append n formatted
        %.       # with leading zeroes,
        {$n             # field width n for final chunk
         -$s.comb%-$n}  # -(len(s)%-n) for padding,
        b")      # as binary number
      :          # Method call with colon syntax
      $n         # Split into substrings of length n
}

nwellnhof

Posted 2018-10-10T16:49:15.173

Reputation: 10 037

Pyth, 24 bytes

Since Pyth doesn't allow H to be used for a function name, I use y instead.

uy+GH+c.[E=`ZQQ.[ZQ.BQ*Z

Try it online! Example is with the "every second character" version of H.

Steven H.

Posted 2018-10-10T16:49:15.173

Reputation: 2 841

Clean, 143 bytes

import StdEnv
r=['0':r]
$n h s=foldl(\a b=h(a++b))(r%(1,n))([(s++r)%(i,i+n-1)\\i<-[0,n..length s]]++[['0'+toChar((n>>(n-p))rem 2)\\p<-[1..n]]])

Try it online!

Οurous

Posted 2018-10-10T16:49:15.173

Reputation: 7 916

Python 2, 126 113 bytes

lambda n,H,s:reduce(lambda x,y:H(x+y),re.findall('.'*n,'0'*n+s+'0'*(n-len(s)%n))+[bin(n)[2:].zfill(n)])
import re

Try it online!

-13 thanks to Triggernometry.

Yeah, this is an abomination, why can't I just use a built-in to split a string into chunks...? :-(

Erik the Outgolfer

Posted 2018-10-10T16:49:15.173

Reputation: 38 134

https://codegolf.stackexchange.com/a/173952/55696 A while loop is the best builtin I could hope for. 104 bytes – Steven H. – 2018-10-12T23:51:30.303

@StevenH. Yeah, especially if you're actually focusing on the golfing itself. >_> – Erik the Outgolfer – 2018-10-13T11:31:06.310

'0'*~-n instead of '0'*(len(s)%n) is shorter (and actually correct for shorter inputs). – nwellnhof – 2018-10-14T02:01:37.460

@nwellnhof Yeah, but it's definitely not the same thing. – Erik the Outgolfer – 2018-10-14T08:40:19.313

Maybe I wasn't clear enough. Your solution gives the wrong answer for strings like Programming Puzz (16 chars). Replacing '0'*(len(s)%n) with '0'*~-n fixes that and saves 7 bytes. – nwellnhof – 2018-10-14T09:37:24.010

@nwellnhof Ah, yep, applied a little patch over there, I'll have to test your suggestion a bit though. – Erik the Outgolfer – 2018-10-14T09:45:58.957

Python 2, 106 102 bytes

For once, the function outgolfs the lambda. -4 bytes for simple syntax manipulation, thanks to Jo King.

def f(n,H,s):
 x='0'*n;s+='0'*(n-len(s)%n)+bin(n)[2:].zfill(n)
 while s:x=H(x+s[:n]);s=s[n:]
 return x

Try it online!

Steven H.

Posted 2018-10-10T16:49:15.173

Reputation: 2 841

Shouldn't the result be 'Pe011', not 'e011'? – Triggernometry – 2018-10-16T16:06:46.430

That it should. Fixed! – Steven H. – 2018-10-29T22:20:52.003

Use semi-colons instead of newlines. -4 bytes

– Jo King – 2018-10-29T22:23:39.460

I didn't realize that worked for while loops as well, thanks! – Steven H. – 2018-10-29T22:25:50.377

Japt, 27 bytes

òV ú'0 pV¤ùTV)rÈ+Y gOvW}VçT

Try it!

I haven't found any capability for Japt to take functions directly as an input, so this takes a string which is interpreted as Japt code and expects it to define a function. Specifically, OvW takes the third input and interprets it as Japt, then g calls it. Replacing that with OxW allows input as a Javascript function instead, or if the function were (somehow) already stored in W it could just be W and save 2 bytes. The link above has the worked example of \$\mathcal{H}\$ that takes characters at odd indexes, while this one is the "multiply char-codes and take the 5 highest digits" example.

Due to the way Japt takes inputs, \$s\$ will be U, \$n\$ will be V, and \$\mathcal{H}\$ will be W

Explanation:

òV                             Split U into segments of length V
   ú'0                         Right-pad the short segment with "0" to the same length as the others
       p     )                 Add an extra element:
        V¤                       V as a base-2 string
          ùTV                    Left-pad with "0" until it is V digits long
              r                Reduce...
                        VçT          ...Starting with "0" repeated V times...
               È       }                                                  ...By applying:
                +Y               Combine with the previous result
                   gOvW          And run W as Japt code

Kamil Drakari

Posted 2018-10-10T16:49:15.173

Reputation: 3 461

GolfScript, 47 bytes

~:π'0'*:s\+s+π/);[sπ2base{48+}%+0π->]+{+1$~}*\;

Try it online!

wastl

Posted 2018-10-10T16:49:15.173

Reputation: 3 089

oK, 41 bytes

{(x#48)(y@,)/(0N,x)#z,,/$((x+x!-#z)#2)\x}

Try it online!

{                                       } /x is n, y is H, z is s.
                          (x+x!-#z)       /number of padding 0's needed + x
                         (         #2)\x  /binary(x) with this length
                      ,/$                 /to string
                    z,                    /append to z
             (0N,x)#                      /split into groups of length x
       (y@,)/                             /foldl of y(concat(left, right))...
 (x#48)                                   /...with "0"*x as the first left string

zgrep

Posted 2018-10-10T16:49:15.173

Reputation: 1 291

Arbitrary Length Hashing

The Task

Example

Answers

Haskell, 91 90 86 bytes

Explanation

R, 159 154 bytes

C (gcc), 251 bytes

Ruby, 78 bytes

How it works:

Jelly, 23 bytes

Bash, 127-ε bytes

Perl 6, 79 68 bytes

Explanation

Pyth, 24 bytes

Clean, 143 bytes

Python 2, 126 113 bytes

Python 2, 106 102 bytes

Japt, 27 bytes

GolfScript, 47 bytes

oK, 41 bytes