Arbitrary Length Hashing

16

3

Consider you have a hash function \$\mathcal{H}\$ which takes strings of length \$2n\$ and returns strings of length \$n\$ and has the nice property that it is collision resistant, i.e. it is hard to find two different strings \$s \neq s'\$ with the same hash \$\mathcal{H}(s) = \mathcal{H}(s')\$.

You would now like to build a new hash function \$\mathcal{H'}\$ which takes strings of arbitrary length and maps them to strings of length \$n\$, while still being collision resistant.

Lucky for you, already in 1979 a method now known as the Merkle–Damgård construction was published which achieves exactly this.

The task of this challenge will be to implement this algorithm, so we'll first have a look at a formal description of the Merkle–Damgård construction, before going through a step-by-step example which should show that the approach is simpler than it might appear at first.

Given some integer \$n > 0\$, a hash function \$\mathcal{H}\$ as described above and an input string \$s\$ of arbitrary length, the new hash function \$\mathcal{H'}\$ does the following:

  • Set \$ l = |s|\$, the length of \$s\$, and split \$s\$ in chunks of length \$n\$, filling up the last chunk with trailing zeros if necessary. This yields \$m = \lceil \frac{l}{n} \rceil \$ many chunks which are labeled \$c_1, c_2, \dots, c_m \$.
  • Add a leading and a trailing chunk \$c_0\$ and \$c_{m+1}\$, where \$c_0\$ is a string consisting of \$n\$ zeros and \$c_{m+1}\$ is \$n\$ in binary, padded with leading zeros to length \$n\$.
  • Now iteratively apply \$\mathcal{H}\$ to the current chunk \$c_i\$ appended to the previous result \$r_{i-1}\$: \$ r_i = \mathcal{H}(r_{i-1}c_i)\$, where \$r_0 = c_0\$. (This step might be more clear after looking at the example below.)
  • The output of \$\mathcal{H'}\$ is the final result \$r_{m+1}\$.

The Task

Write a program or function which takes as input a positive integer \$n\$, a hash function \$\mathcal{H}\$ as black box and a non-empty string \$s\$ and returns the same result as \$\mathcal{H'}\$ on the same inputs.

This is , so the shortest answer in each language wins.

Example

Let's say \$n = 5\$, so our given hash function \$\mathcal{H}\$ takes strings of length 10 and returns strings of length 5.

  • Given an input of \$s = \texttt{"Programming Puzzles"} \$, we get the following chunks: \$s_1 = \texttt{"Progr"} \$, \$s_2 = \texttt{"ammin"} \$, \$s_3 = \texttt{"g Puz"} \$ and \$s_4 = \texttt{"zles0"} \$. Note that \$s_4\$ needed to be padded to length 5 with one trailing zero.
  • \$ c_0 = \texttt{"00000"}\$ is just a string of five zeros and \$ c_5 = \texttt{"00101"}\$ is five in binary (\$\texttt{101}\$), padded with two leading zeros.
  • Now the chunks are combined with \$\mathcal{H}\$:
    \$r_0 = c_0 = \texttt{"00000"} \$
    \$ r_1 = \mathcal{H}(r_0c_1) = \mathcal{H}(\texttt{"00000Progr"})\$
    \$ r_2 = \mathcal{H}(r_1c_2) = \mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\$ \$ r_3 = \mathcal{H}(r_2c_3) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\$
    \$ r_4 = \mathcal{H}(r_3c_4) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\texttt{"zles0"})\$
    \$ r_5 = \mathcal{H}(r_4c_5) = \mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\mathcal{H}(\texttt{"00000Progr"})\texttt{"ammin"})\texttt{"g Puz"})\texttt{"zles0"})\texttt{"00101"})\$
  • \$r_5\$ is our output.

Let's have a look how this output would look depending on some choices1 for \$\mathcal{H}\$:

  • If \$\mathcal{H}(\texttt{"0123456789"}) = \texttt{"13579"}\$, i.e. \$\mathcal{H}\$ just returns every second character, we get:
    \$r_1 = \mathcal{H}(\texttt{"00000Progr"}) = \texttt{"00Por"}\$
    \$r_2 = \mathcal{H}(\texttt{"00Porammin"}) = \texttt{"0oamn"}\$
    \$r_3 = \mathcal{H}(\texttt{"0oamng Puz"}) = \texttt{"omgPz"}\$
    \$r_4 = \mathcal{H}(\texttt{"omgPzzles0"}) = \texttt{"mPze0"}\$
    \$r_5 = \mathcal{H}(\texttt{"mPze000101"}) = \texttt{"Pe011"}\$
    So \$\texttt{"Pe011"}\$ needs to be the output if such a \$\mathcal{H}\$ is given as black box function.
  • If \$\mathcal{H}\$ simply returns the first 5 chars of its input, the output of \$\mathcal{H'}\$ is \$\texttt{"00000"}\$. Similarly if \$\mathcal{H}\$ returns the last 5 chars, the output is \$\texttt{"00101"}\$.
  • If \$\mathcal{H}\$ multiplies the character codes of its input and returns the first five digits of this number, e.g. \$\mathcal{H}(\texttt{"PPCG123456"}) = \texttt{"56613"}\$, then \$\mathcal{H}'(\texttt{"Programming Puzzles"}) = \texttt{"91579"}\$.

1 For simplicity, those \$\mathcal{H}\$ are actually not collision resistant, though this does not matter for testing your submission.

Laikoni

Posted 2018-10-10T16:49:15.173

Reputation: 23 676

Sandbox (deleted) – Laikoni – 2018-10-10T16:53:52.307

I must say it's fun that the example given has the last 'full' hash be of "OMG Puzzles!" effectively omgPzzles0. Well chosen example input! – LambdaBeta – 2018-10-10T21:53:23.950

Can we assume some flexibility on the input format for H (e.g. it takes two strings of length n, or a longer string of which it only considers the first 2n characters)? – Delfad0r – 2018-10-10T23:21:45.490

Are space characters, e.g., between "g P" valid output? – guest271314 – 2018-10-11T00:13:43.280

@guest271314 If the space is part of the resulting hash, it needs to be outputted. If the hash is actually "gP", you may not output a space inbetween. – Laikoni – 2018-10-11T08:53:16.940

@Delfad0r I'd rather say no, because H is a black-box function from which you only know it takes strings of a certain length and returns strings of half the length. So other assumptions like H ignores all but the first n bytes of the string should not be made. – Laikoni – 2018-10-11T08:56:20.100

Answers

7

Haskell, 91 90 86 bytes

n!h|let a='0'<$[1..n];c?""=c;c?z=h(c++take n(z++a))?drop n z=h.(++mapM(:"1")a!!n).(a?)

Try it online!

Explanation

a='0'<$[1..n]

Just assigns the string "00...0" ('0' \$n\$ times) to a


c?""=c
c?z=h(c++take n(z++a))?drop n z

The function ? implements the recursive application of h: c is the hash we have obtained so far (length \$n\$), z is the rest of the string. If z is empty then we simply return c, otherwise we take the first \$n\$ characters of z (possibly padding with zeros from a), prepend c and apply h. This gives the new hash, and then we call ? recursively on this hash and the remaining characters of z.


n!h=h.(++mapM(:"1")a!!n).(a?)

The function ! is the one actually solving the challenge. It takes n, h and s (implicit) as inputs. We compute a?s, and all we have to do is append n in binary and apply h once more. mapM(:"1")a!!n returns the binary representation of \$n\$.

Delfad0r

Posted 2018-10-10T16:49:15.173

Reputation: 1 688

1let in a guard is shorter than using where: Try it online! – Laikoni – 2018-10-10T18:14:48.710

2It looks like mapM(\_->"01")a can be mapM(:"1")a. – xnor – 2018-10-10T23:35:40.087

7

R, 159 154 bytes

function(n,H,s,`?`=paste0,`*`=strrep,`/`=Reduce,`+`=nchar,S=0*n?s?0*-(+s%%-n)?"?"/n%/%2^(n:1-1)%%2)(function(x,y)H(x?y))/substring(S,s<-seq(,+S,n),s--n-1)

Try it online!

Yuck! Answering challenges in R is never pretty, but this is horrible. This is an instructive answer on how not to write "normal" R code...

Thanks to nwellnhof for fixing a bug, at a cost of 0 bytes!

Thanks to J.Doe for swapping the operator aliasing to change the precedence, good for -4 bytes.

The explanation below is for the previous version of the code, but the principles remain the same.

function(n,H,s,               # harmless-looking function arguments with horrible default arguments 
                              # to prevent the use of {} and save two bytes
                              # then come the default arguments,
                              # replacing operators as aliases for commonly used functions:
 `+`=paste0,                  # paste0 with binary +
 `*`=strrep,                  # strrep for binary *
 `/`=Reduce,                  # Reduce with binary /
 `?`=nchar,                   # nchar with unary ?
 S=                           # final default argument S, the padded string:
  0*n+                        # rep 0 n times
  s+                          # the original string
  0*-((?s)%%-n)+              # 0 padding as a multiple of n
  "+"/n%/%2^(n:1-1)%%2)       # n as an n-bit number
                              # finally, the function body:
 (function(x,y)H(x+y)) /      # Reduce/Fold (/) by H operating on x + y
  substring(S,seq(1,?S,n),seq(n,?S,n))  # operating on the n-length substrings of S

Giuseppe

Posted 2018-10-10T16:49:15.173

Reputation: 21 077

I think 0*(n-(?s)%%n) doesn't work if n divides s evenly. But 0*-((?s)%%-n) should work. – nwellnhof – 2018-10-14T02:07:16.290

@nwellnhof ah, of course, thank you, fixed. – Giuseppe – 2018-10-15T13:52:04.783

Minor changes, 155 bytes

– J.Doe – 2018-10-16T10:19:04.760

1@J.Doe nice! I saved another byte since seq has 1 as its from argument by default. – Giuseppe – 2018-10-16T13:29:56.850

3

C (gcc), 251 bytes

#define P sprintf(R,
b(_){_=_>1?10*b(_/2)+_%2:_;}f(H,n,x)void(*H)(char*);char*x;{char R[2*n+1],c[n+1],*X=x;P"%0*d",n,0);while(strlen(x)>n){strncpy(c,x,n);x+=n;strcat(R,c);H(R);}P"%s%s%0*d",R,x,n-strlen(x),0);H(R);P"%s%0*d",R,n,b(n));H(R);strcpy(X,R);}

Try it online!

Not as clean as the bash solution, and highly improvable.

The function is f taking H as a function that replaces its string input with that string's hash, n as in the description, and x the input string and output buffer.

Description:

#define P sprintf(R,     // Replace P with sprintf(R, leading to unbalanced parenthesis
                         // This is replaced and expanded for the rest of the description
b(_){                    // Define b(x). It will return the integer binary expansion of _
                         // e.g. 5 -> 101 (still as integer)
  _=_>1?                 // If _ is greater than 1
    10*b(_/2)+_%2        // return 10*binary expansion of _/2 + last binary digit
    :_;}                 // otherwise just _
f(H,n,x)                 // Define f(H,n,x)
  void(*H)(char*);       // H is a function taking a string
  char*x; {              // x is a string
  char R[2*n+1],c[n+1],  // Declare R as a 2n-length string and c as a n-length string
  *X=x;                  // save x so we can overwrite it later
  sprintf(R,"%0*d",n,0); // print 'n' 0's into R
  while(strlen(x)>n){    // while x has at least n characters
    strncpy(c,x,n);x+=n; // 'move' the first n characters of x into c
    strcat(R,c);         // concatenate c and R
    H(R);}               // Hash R
  sprintf(R,"%s%s%0*d"   // set R to two strings concatenated followed by some zeroes
    R,x,                 // the two strings being R and (what's left of) x
    n-strlen(x),0);      // and n-len(x) zeroes
  H(R);                  // Hash R
  sprintf(R,"%s%*d",R,n, // append to R the decimal number, 0 padded to width n
    b(n));               // The binary expansion of n as a decimal number
  H(R);strcpy(X,R);}     // Hash R and copy it into where x used to be

LambdaBeta

Posted 2018-10-10T16:49:15.173

Reputation: 2 499

229 bytes – ceilingcat – 2018-10-30T20:07:41.097

I think: 227 bytes (going off of ceilingcat's comment)

– Zacharý – 2018-11-16T17:14:46.903

3

Ruby, 78 bytes

->n,s,g{(([?0*n]*2*s).chop.scan(/.{#{n}}/)+["%0#{n}b"%n]).reduce{|s,x|g[s+x]}}

Try it online!

How it works:

([?0*n]*2*s).chop    # Padding: add leading and trailing 
                     # zeros, then remove the last one
.scan(/.{#{n}}/)     # Split the string into chunks
                     # of length n
+["%0#{n}b"%n]       # Add the trailing block
.reduce{|s,x|g[s+x]} # Apply the hashing function
                     # repeatedly

G B

Posted 2018-10-10T16:49:15.173

Reputation: 11 099

2

Jelly, 23 bytes

0Ṿ;s;BṾ€ṚWƲ}z”0ZU0¦;Ç¥/

Try it online!

Accepts \$\mathcal H\$ at the line above it, \$s\$ as its left argument, and \$n\$ as its right argument.

Erik the Outgolfer

Posted 2018-10-10T16:49:15.173

Reputation: 38 134

2

Bash, 127-ε bytes

Z=`printf %0*d $1` R=$Z
while IFS= read -rn$1 c;do R=$R$c$Z;R=`H<<<${R::2*$1}`;done
H< <(printf $R%0*d $1 `bc <<<"obase=2;$1"`)

Try it online!

This works as a program/function/script/snippet. H must be resolveable to a program or function that will perform the hashing. N is the argument. Example call:

$ H() {
>   sed 's/.\(.\)/\1/g'
> }
$ ./wherever_you_put_the_script.sh 5 <<< "Programming Puzzles"  # if you add a shebang
Pe011

Description:

Z=`printf %0*d $1`

This creates a string of $1 zeroes. This works by calling printf and telling it to print an integer padded to extra argument width. That extra argument we pass is $1, the argument to the program/function/script which stores n.

R=$Z

This merely copies Z, our zero string, to R, our result string, in preparation for the hashing loop.

while IFS= read -rn$1 c; do

This loops over the input every $1 (n) characters loading the read characters into c. If the input ends then c merely ends up too short. The r option ensures that any special characters in the input don't get bash-interpreted. This is the in the title - that r isn't strictly necessary, but makes the function more accurately match the input.

R=$R$c$Z

This concatenates the n characters read from input to R along with zeroes for padding (too many zeroes for now).

R=`H<<<${R::2*$1}`;done

This uses a here string as input to the hash function. The contents ${R::2*$1} are a somewhat esoteric bash parameter substitution which reads: R, starting from 0, only 2n characters.

Here the loop ends and we finish with:

H< <(printf $R%0*d $1 `bc <<<"obase=2;$1"`)

Here the same format string trick is used to 0 pad the number. bc is used to convert it to binary by setting the output base (obase) to 2. The result is passed to the hash function/program whose output is not captured and thus is shown to the user.

LambdaBeta

Posted 2018-10-10T16:49:15.173

Reputation: 2 499

Why "127-ε"? Why not just "127"? – Solomon Ucko – 2018-10-11T01:02:21.397

I don't know. I was on the fence about the necessity of the r flag. I figured 1 byte doesn't really matter, but if pushed I could shave it. – LambdaBeta – 2018-10-11T01:03:39.050

For the read command? – Solomon Ucko – 2018-10-11T01:12:16.733

Because without it a \ in the input will be interpreted instead of ignored, so they'd have to be escaped. – LambdaBeta – 2018-10-11T01:15:24.020

Maybe add a note about that? – Solomon Ucko – 2018-10-11T01:17:08.423

I did, in the description for the read command. – LambdaBeta – 2018-10-11T01:17:58.763

It's just not obvious from the title or first few lines. – Solomon Ucko – 2018-10-11T01:19:01.203

2

Perl 6, 79 68 bytes

{reduce &^h o&[~],comb 0 x$^n~$^s~$n.fmt("%.{$n-$s.comb%-$n}b"): $n}

Try it online!

Explanation

{
  reduce         # Reduce with
    &^h o&[~],   # composition of string concat and hash function
    comb         # Split string
      0 x$^n     # Zero repeated n times
      ~$^s       # Append input string s
      ~$n.fmt("  # Append n formatted
        %.       # with leading zeroes,
        {$n             # field width n for final chunk
         -$s.comb%-$n}  # -(len(s)%-n) for padding,
        b")      # as binary number
      :          # Method call with colon syntax
      $n         # Split into substrings of length n
}

nwellnhof

Posted 2018-10-10T16:49:15.173

Reputation: 10 037

2

Pyth, 24 bytes

Since Pyth doesn't allow H to be used for a function name, I use y instead.

uy+GH+c.[E=`ZQQ.[ZQ.BQ*Z

Try it online! Example is with the "every second character" version of H.

Steven H.

Posted 2018-10-10T16:49:15.173

Reputation: 2 841

1

Clean, 143 bytes

import StdEnv
r=['0':r]
$n h s=foldl(\a b=h(a++b))(r%(1,n))([(s++r)%(i,i+n-1)\\i<-[0,n..length s]]++[['0'+toChar((n>>(n-p))rem 2)\\p<-[1..n]]])

Try it online!

Οurous

Posted 2018-10-10T16:49:15.173

Reputation: 7 916

1

Python 2, 126 113 bytes

lambda n,H,s:reduce(lambda x,y:H(x+y),re.findall('.'*n,'0'*n+s+'0'*(n-len(s)%n))+[bin(n)[2:].zfill(n)])
import re

Try it online!

-13 thanks to Triggernometry.

Yeah, this is an abomination, why can't I just use a built-in to split a string into chunks...? :-(

Erik the Outgolfer

Posted 2018-10-10T16:49:15.173

Reputation: 38 134

https://codegolf.stackexchange.com/a/173952/55696 A while loop is the best builtin I could hope for. 104 bytes – Steven H. – 2018-10-12T23:51:30.303

@StevenH. Yeah, especially if you're actually focusing on the golfing itself. >_> – Erik the Outgolfer – 2018-10-13T11:31:06.310

'0'*~-n instead of '0'*(len(s)%n) is shorter (and actually correct for shorter inputs). – nwellnhof – 2018-10-14T02:01:37.460

@nwellnhof Yeah, but it's definitely not the same thing. – Erik the Outgolfer – 2018-10-14T08:40:19.313

Maybe I wasn't clear enough. Your solution gives the wrong answer for strings like Programming Puzz (16 chars). Replacing '0'*(len(s)%n) with '0'*~-n fixes that and saves 7 bytes. – nwellnhof – 2018-10-14T09:37:24.010

@nwellnhof Ah, yep, applied a little patch over there, I'll have to test your suggestion a bit though. – Erik the Outgolfer – 2018-10-14T09:45:58.957

1

Python 2, 106 102 bytes

For once, the function outgolfs the lambda. -4 bytes for simple syntax manipulation, thanks to Jo King.

def f(n,H,s):
 x='0'*n;s+='0'*(n-len(s)%n)+bin(n)[2:].zfill(n)
 while s:x=H(x+s[:n]);s=s[n:]
 return x

Try it online!

Steven H.

Posted 2018-10-10T16:49:15.173

Reputation: 2 841

Shouldn't the result be 'Pe011', not 'e011'? – Triggernometry – 2018-10-16T16:06:46.430

That it should. Fixed! – Steven H. – 2018-10-29T22:20:52.003

Use semi-colons instead of newlines. -4 bytes

– Jo King – 2018-10-29T22:23:39.460

I didn't realize that worked for while loops as well, thanks! – Steven H. – 2018-10-29T22:25:50.377

1

Japt, 27 bytes

òV ú'0 pV¤ùTV)rÈ+Y gOvW}VçT

Try it!

I haven't found any capability for Japt to take functions directly as an input, so this takes a string which is interpreted as Japt code and expects it to define a function. Specifically, OvW takes the third input and interprets it as Japt, then g calls it. Replacing that with OxW allows input as a Javascript function instead, or if the function were (somehow) already stored in W it could just be W and save 2 bytes. The link above has the worked example of \$\mathcal{H}\$ that takes characters at odd indexes, while this one is the "multiply char-codes and take the 5 highest digits" example.

Due to the way Japt takes inputs, \$s\$ will be U, \$n\$ will be V, and \$\mathcal{H}\$ will be W

Explanation:

òV                             Split U into segments of length V
   ú'0                         Right-pad the short segment with "0" to the same length as the others
       p     )                 Add an extra element:
        V¤                       V as a base-2 string
          ùTV                    Left-pad with "0" until it is V digits long
              r                Reduce...
                        VçT          ...Starting with "0" repeated V times...
               È       }                                                  ...By applying:
                +Y               Combine with the previous result
                   gOvW          And run W as Japt code

Kamil Drakari

Posted 2018-10-10T16:49:15.173

Reputation: 3 461

0

GolfScript, 47 bytes

~:π'0'*:s\+s+π/);[sπ2base{48+}%+0π->]+{+1$~}*\;

Try it online!

wastl

Posted 2018-10-10T16:49:15.173

Reputation: 3 089

0

oK, 41 bytes

{(x#48)(y@,)/(0N,x)#z,,/$((x+x!-#z)#2)\x}

Try it online!

{                                       } /x is n, y is H, z is s.
                          (x+x!-#z)       /number of padding 0's needed + x
                         (         #2)\x  /binary(x) with this length
                      ,/$                 /to string
                    z,                    /append to z
             (0N,x)#                      /split into groups of length x
       (y@,)/                             /foldl of y(concat(left, right))...
 (x#48)                                   /...with "0"*x as the first left string

zgrep

Posted 2018-10-10T16:49:15.173

Reputation: 1 291