13
1
The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.
Let P
be a binary string of length n
and T
be a binary string of length 2n-1
. We can compute the n
Hamming distances between P
and every n
-length substring of T
in order from left to right and put them into an array (or list).
Example Hamming distance sequence
Let P = 101
and T = 01100
. The sequence of Hamming distances you get from this pair is 2,2,1
.
Definition of closeness
Now let's consider two such sequences of Hamming distances. Say x = (0, 2, 2, 3, 0)
and y = (2, 1, 4, 4, 2)
as examples. We say that x
and y
are close
if y <= x <= 2*y
or if x <= y <= 2*x
. Here the scalar multiplication and inequality are taken elementwise. That is to say, for two sequences A
and B
, A <= B iff A[i] <= B[i]
for all indices i
.
Note that sequences of Hamming distances form a partial order under this way of comparing them. In other words, many pairs of sequences are neither greater than or equal nor less than or equal to each other. For example (1,2)
and (2,1)
.
So using the example above, (0, 2, 2, 3, 0) <= 2*(2, 1, 4, 4, 2) = (4, 2, 8, 8, 4)
but (0, 2, 2, 3, 0)
is not bigger than (2, 1, 4, 4, 2)
. Also (2, 1, 4, 4, 2)
is not smaller than or equal to 2*(0, 2, 2, 3, 0) = (0, 4, 4, 6, 0)
. As a result x
and y
are not close to each other.
Task
For increasing n
starting at n=1
, consider all possible pairs of binary strings P
of length n
and T
of length 2n-1
. There are 2^(n+2n-1)
such pairs and hence that many sequences of Hamming distances. However many of those sequences will be identical . The task is to find the size of the largest set of Hamming distance sequences so that no two sequences are close to each other.
Your code should output one number per value of n
.
Score
Your score is broadly speaking the highest n
your code reaches on my machine in 5 minutes (but read on). The timing is for the total running time, not the time just for that n
.
In order to give scores for non-optimal answers, because finding optimal answers is likely to be hard, we will need a slightly subtle scoring system. Your score is the highest value of n
for which no one else has posted a higher correct answer for any size which is smaller than equal to this. For example, if you output 2, 4, 21
and someone else outputs 2, 5, 15
then you would only score 1
as someone else has a better answer for n = 2
. If you output 2, 5, 21
then you would score 3
no matter what anyone else outputs because those answers are all optimal. Clearly if you have all optimum answers then you will get the score for the highest n
you post. However, even if your answer is not the optimum, you can still get the score if no one else can beat it.
Example answers and worked example
(This answers are as yet unchecked. Independent verification would be gratefully received.)
Thanks to ETHproductions:
- n = 1 gives 2.
- n = 2 gives 5.
- n = 3 gives 21.
Let's look at n = 2
in more detail. In this case the full list of Hamming distance sequences (represented by tuples here) is:
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
We can see that (0,0)
is not close to any other tuple. In fact if we take (0, 0)
, (0, 1)
, (1, 0)
, (2, 1)
, (1,2)
then none of those tuples are close to any of the others. This gives a score of 5
for n = 2
.
For n = 3
the full list of distinct Hamming distance sequences is:
[(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 2, 1), (0, 2, 2), (0, 2, 3), (0, 3, 0), (0, 3, 1), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 0), (1, 2, 1), (1, 2, 2), (1, 2, 3), (1, 3, 0), (1, 3, 1), (1, 3, 2), (2, 0, 1), (2, 0, 2), (2, 0, 3), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 1, 3), (2, 2, 0), (2, 2, 1), (2, 2, 2), (2, 2, 3), (2, 3, 1), (2, 3, 2), (2, 3, 3), (3, 0, 2), (3, 0, 3), (3, 1, 0), (3, 1, 1), (3, 1, 2), (3, 2, 0), (3, 2, 1), (3, 2, 2), (3, 3, 2), (3, 3, 3)]
Of those 48
sequences, we can pick out a set of size 21
so that no pair in that set are close to each other.
Languages and libraries
You can use any available language and libraries you like. Where feasible, it would be good to be able to run your code so please include a full explanation for how to run/compile your code in Linux if at all possible.
My Machine The timings will be run on my 64-bit machine. This is a standard ubuntu install with 8GB RAM, AMD FX-8350 Eight-Core Processor and Radeon HD 4250. This also means I need to be able to run your code.
Leading answer
- Score of 4 for 2, 5, 21, 83, 361 by Christian Sievers. C++
- Score of 5 for 2, 5, 21, 83, 372 by fəˈnɛtɪk. Javascript
After looking at your question, it shows some similarities to spies, revised on hackerrank, which is a NP-complete problem
– fəˈnɛtɪk – 2017-03-15T11:40:37.850@fəˈnɛtɪk Great! Note that my question doesn't require optimal solutions to get a good score. – None – 2017-03-15T12:45:09.493
@fəˈnɛtɪk Can you also confirm the answers for 1,2,3 in the question? – None – 2017-03-15T17:20:07.553
@fəˈnɛtɪk I very much doubt it is NP-hard. You would have to encode Set Packing or another NP-complete problem into a single integer with only a polynomial change in problem size. – None – 2017-03-15T18:27:37.283
297 unique hamming arrays for 4, 2040 unique arrays for 5 – fəˈnɛtɪk – 2017-03-16T15:54:20.247
The graphic representation of the possible sequences is a n-dimensional hypercube with side length n – fəˈnɛtɪk – 2017-03-16T16:19:11.530
bash:
echo -e "2\n5\n21\n81"; yes 356
;-) - Starting from the fourth entry, these are just lower bounds... – Christian Sievers – 2017-03-17T16:56:20.347@ChristianSievers Nice try my friend :) – None – 2017-03-17T16:59:46.480
Not good enough. So I do
s/81/83/
, but there is a lot of work to do before I can change the 356. – Christian Sievers – 2017-03-17T19:59:05.120@ChristianSievers Do you have code you could post as an answer? – None – 2017-03-17T20:00:53.533
I'll later continue to work on that – Christian Sievers – 2017-03-17T20:06:24.670