Choose 3 strings and output the exact minimum Hamming distance between any pair

5

Consider $3$ binary strings of length $n$ chosen independently and uniformly at random. We are interested in computing the exact expected minimum Hamming distance between any pair. The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.

Task

Given an input n between 2 and 10, inclusive, consider $3$ binary strings of length $n$ chosen independently and uniformly at random. Output the exact expected minimum Hamming distance between any pair.

Output should be given as a fraction.

Your code should run in TIO without timing out.

Anush

Posted 6 years ago

Reputation: 3 202

1Challenges should be self-contained: Please add the definition of Hamming Distance. – AlienAtSystem – 6 years ago

I got slightly lost at the end, in the task section. You say "we sample" and then "compute the exact expected minimum hamming distance between any pair", but if we sampled, then there is no actual expectation, right? We just compute the Hamming distances? And then we output the minimum of the three? – RGS – 6 years ago

@RGS Expected here means the mean average. I'll change the wording to make it clearer. Thank you. – Anush – 6 years ago

Can you clarify what you mean by between any pair? Is that any of the three pairs once the three strings have been chosen? – Luis Mendo – 6 years ago

4Could you please provide some test cases? – lolad – 6 years ago

Also, do you actually have to generate the strings? Can you just choose a random value when comparing two characters? – lolad – 6 years ago

@LuisMendo Yes that's right – Anush – 6 years ago

@Anush Then please make it clear in the challenge text. People are not expected o read all comments – Luis Mendo – 6 years ago

10.375, 0.75, 1.125, 1.5234375, 1.93505859375, ...? – Jonathan Allan – 6 years ago

Does "Output should be given as a fraction" mean decimals & floats are not allowed? (i.e. we should output a pair of integers) – Jonathan Allan – 6 years ago

@JonathanAllan. The answer has to be exact. The easiest way is a pair of integers but you could also output a possibly recurring decimal if you prefer. – Anush – 6 years ago

1The denominator will be a power of two, so the decimal will terminate. – Christian Sievers – 6 years ago

I suggest the time restriction harder to miss in the spec, and tagging the challenge restricted-time. I suspect that the answers so far are too brute to meet the requirement. – xnor – 6 years ago

I think the time limit is enough to stop fully brute forcing over the 2^30 triples of strings, but should allow setting one string to all zeroes, which doesn't affect the distribution, and brute forcing over the 2^20 values of the other two. I'd be interested to see faster methods. – xnor – 6 years ago

1@xnor My code may have three loops, but it only iterates over the n+3 \choose 3 ways to write n as a sum of four nonnegative integers. For n=10 that is 286 cases. – Christian Sievers – 6 years ago

@ChristianSievers I missed that. What a nice method! I added a formula to your post. – xnor – 6 years ago

Answers

7

Pari/GP, 92 90 bytes

2 bytes saved thanks to @Nicolas!

n->sum(a=0,n,sum(b=0,n-a,sum(c=0,n-a-b,n!/a!/b!/c!/(n-a-b-c)!*min(a+b,min(a+c,b+c)))))/4^n

Try it online!

f(n) = \frac{1}{4^n} \sum_{\substack{a+b+c+d=n\\a,b,c,d\geq 0}} \binom{n}{a,b,c,d}\min(a+b,a+c,b+c)

A more direct brute force approach may be shorter in some languages, but I'm not sure if that's true for Pari. Anyway, it's a shame that Pari's min only accepts two arguments.

I can avoid the min and be about 6 times faster, but the compensating additional code makes it longer. Maybe in another language this is the shorter way. And maybe it is possible to simplify further and get rid of the innermost sum.

n->sum(a=0,n,sum(b=a,n-a,sum(c=b,n-a-b,n!/a!/b!/c!/(n-a-b-c)!*(a+b)*6/(1+(a==b)+(b==c))!)))/4^n

Try it online!

Christian Sievers

Posted 6 years ago

Reputation: 6 366

This is a really elegant mathematical solution! – Anush – 6 years ago

Would it be acceptable to place the f(n)= in the header section and save 5 bytes ? Asking as a new code-golfer. – Nicolas – 6 years ago

@Nicolas The remaining part would be a term with n as free variable which is not okay, but I can rewrite it as f=n->... and don't count f=, then the remaining part states the function without binding it to a name. Thanks for reminding me! – Christian Sievers – 6 years ago

2

Jelly, 31 bytes

Ḷṗ4S⁼¥Ƈɓ!×Ṗṁ4SƝṂƲ}:!P$}ð€S,4*$}

A monadic Link accepting an integer which yields a pair of integers, [numerator, denominator] (not reduced to simplest form).

Try it online! Or see all of n=2 to 10 reduced to simplest form.

How?

Uses the same approach as Christian Sievers' Pari/GP answer

Ḷṗ4S⁼¥Ƈɓ!×Ṗṁ4SƝṂƲ}:!P$}ð€S,4*$} - Link: integer, n
Ḷ                               - lowered range -> [0..n-1]
 ṗ4                             - Cartesian power 4 (all length 4 tuples using alphabet [0..n-1]
      Ƈ                         - filter keep those for which:
     ¥                          -   last two links as a dyad:
   S                            -     sum
    ⁼                           -     equals n?
       ɓ               ð€       - for each tuple do f(n, tuple):
        !                       -   factorial = n*(n-1)*...*1
                 }              -   use right argument:
                Ʋ               -     last four links as a monad:
          Ṗ                     -       pop i.e. (a,b,c,d)->(a,b,c)
           ṁ4                   -       mould like 4 -> (a,b,c,a)
              Ɲ                 -       for neighbours:
             S                  -         sum -> (a+b,b+c,c+a)
               Ṃ                -       minimum
         ×                      -   multiply
                      }         -   use right argument:
                     $          -     last two links as a monad:
                   !            -       factorial -> (a!,b!,c!,d!)
                    P           -       product -> a!×b!×c!×d!
                  :             -   integer division
                         S      - sum
                              } - use right argument:
                             $  -   last two links as a monad:
                           4    -     four
                            *   -     exponentiate -> 4^n
                          ,     - pair

Jonathan Allan

Posted 6 years ago

Reputation: 67 804

Very nice. Maybe I should have made the range 2..30 :) – Anush – 6 years ago

1This does n=30 in ~18 seconds. It's much slower than the Pari/GP because I create all n^4 (a,b,c,d) tuples and filter to those that sum to n (because it's golf). – Jonathan Allan – 6 years ago

2

05AB1E, 26 bytes

Ý4ãʒOQ}ε¨Ćü+ßI!*y!P÷}O4Im‚

Port of @JonathanAllan's Jelly answer, which uses the same formula as @ChristianSievers's Pari/GP answer.
Just like his answer, the output is not reduced to its simplest form.

Try it online or verify all test cases.

Explanation:

Ý                 # Push a list in the range [0, (implicit) input-integer]
                  # (NOTE: Jonathan's Jelly answer uses the range [0,input) instead of
                  #  [0,input], but that doesn't pose an issue, since the mapped value will
                  #  result in 0, not affecting the total sum)
 4ã               # Create all possible quartets by using the cartesian product with 4
   ʒ              # Filter these quartets by:
    O             #  Take the sum
     Q            #  And check if it's equal to the (implicit) input-integer
   }ε             # After the filter: map each remaining quartet [a,b,c,d] to:
     ¨            #  Remove the last item d: [a,b,c]
      Ć           #  Enclose the list, appending its own head: [a,b,c,a]
       ü+         #  Sum each consecutive pair: [a+b,b+c,c+a]
         ß        #  Pop and push the minimum of this
          I!      #  Push the input, and take its factorial
            *     #  Multiply both by each other
             y!   #  Take the factorial of each value in the quartet: [a!,b!,c!,d!]
               P  #  Take the product of this: a!*b!*c!*d!
                ÷ #  Integer-divide the two: (min(a+b,b+c,c+a)*n!) // (a!*b!*c!*d!)
    }O            # After the map: take the sum of all values in the list
       I          # Push the input-integer again
      4 m         # And take 4 to the power this
         ‚        # And finally pair both together
                  # (after which this pair is output implicitly as result)

Kevin Cruijssen

Posted 6 years ago

Reputation: 67 575