Estimate the mean minimum Hamming distance

Task

Inputs \$b \leq 100\$ and \$n \geq 2\$. Consider \$n\$ binary strings, each of length \$b\$ sampled uniformly and independently. We would like to compute the expected minimum Hamming distance between any pair. If \$n = 2\$ the answer is always \$b/2\$.

Correctness

Your code should ideally be within \$\pm0.5\$ of the correct mean. However, as I don't know what the correct mean is, for values of \$n\$ which are not too large I have computed an estimate and your code should be within \$\pm0.5\$ of my estimate (I believe my estimate is within \$\pm0.1\$ of the correct mean). For larger values than my independent tests will allow, your answer should explain why it is correct.

\$b = 99, n = 2^{16}.\$ Estimated avg. \$19.61\$.
\$b = 89, n = 2^{16}.\$ Estimated avg. \$16.3\$.
\$b = 79, n = 2^{16}.\$ Estimated avg. \$13.09\$.
\$b = 69, n = 2^{16}.\$ Estimated avg. \$10.03\$.
\$b = 59, n = 2^{16}.\$ Estimated avg. \$7.03\$.
\$b = 99, n = 2^{15}.\$ Estimated avg. \$20.67\$.
\$b = 89, n = 2^{15}.\$ Estimated avg. \$17.26\$.
\$b = 79, n = 2^{15}.\$ Estimated avg. \$13.99\$.
\$b = 69, n = 2^{15}.\$ Estimated avg. \$10.73\$.
\$b = 59, n = 2^{15}.\$ Estimated avg. \$7.74\$.
\$b = 99, n = 2^{14}.\$ Estimated avg. \$21.65\$.
\$b = 89, n = 2^{14}.\$ Estimated avg. \$18.23\$.
\$b = 79, n = 2^{14}.\$ Estimated avg. \$14.83\$.
\$b = 69, n = 2^{14}.\$ Estimated avg. \$11.57\$.
\$b = 59, n = 2^{14}.\$ Estimated avg. \$8.46\$.

Score

Your base score will be the number of the examples above that your code gets right within \$\pm0.5\$. On top of this you should add \$x > 16\$ for the largest \$n = 2^x\$ for which your code gives the right answer within \$\pm0.5\$ for all of \$b = 59, 69, 79, 89, 99\$ (and also all smaller values of \$x\$). That is, if your code can achieve this for \$n=2^{18}\$ then you should add \$18\$ to your score. You only get this extra score if you can also solve all the instances for all smaller \$n\$ as well. The number of strings \$n\$ will always be a power of two.

Time limits

For a single \$n, b\$ pair, your code should run on TIO without timing out.

Rougher approximation answers for larger values of \$n\$

For larger values of \$n\$ my independent estimates will not be as accurate but may serve as a helpful guide nonetheless. I will add them here as I can compute them.

\$b = 99, n = 2^{17}.\$ Rough estimate \$18.7\$.
\$b = 99, n = 2^{18}.\$ Rough estimate \$17.9\$.
\$b = 99, n = 2^{19}.\$ Rough estimate \$16.8\$.

Anush

Posted 2020-02-07T22:19:07.883

Reputation: 3 202

4I thought this was posted on the Sandbox earlier today. – S.S. Anne – 2020-02-08T01:19:06.573

@S.S.Anne Yes I posted it there first. I look forward to the first answers! – Anush – 2020-02-08T06:38:39.603

1The time limit is probably unnecessary as it both limits users to tio languages and needlessly disqualifies solutions that take a long time, but do solve the problem. I understand wanting to prevent brute forcing, but I’d argue that should also be valid – ATaco – 2020-02-08T07:44:53.657

1@ATaco I understand that point of view but it completely changes the challenge. Brute force solutions are valid here of course. It's just a question of what score you get. The time limit/TIO is also crucial as otherwise you will just get a higher score by having a more powerful computer or leaving it to run for longer. – Anush – 2020-02-08T07:48:44.597

Using TIO for timing is not a good idea. Had you left this in the sandbox longer, I'd have suggested you switching to requiring that submissions pass your examples within a certain time limit which you would verify (like [tag:fastest-code]). What you currently have makes it extremely difficult to determine what the score of any submission should actually be.

– FryAmTheEggman – 2020-02-08T19:03:11.707

@FryAmTheEggman I understand the complications all too well. [tag:fastest-code] is highly non-ideal as people can't test their code properly for speed themselves. Using TIO is non-ideal because it's not super reliable timing-wise. For this challenge where the exact timing is not so critical I decided that on balance it was the best option. Clearly if you have to report an exact time TIO is no good. It would be awesome if there were a server we could use for timing. – Anush – 2020-02-08T19:10:40.877

Estimate the mean minimum Hamming distance

Answers

Python 3, score = big(?)