Count the number of Hamming distance sequences

9

1

The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.

Let P be a binary string of length n and T be a binary string of length 2n-1. We can compute the n Hamming distances between P and every n-length substring of T in order from left to right and put them into an array (or list).

Example Hamming distance sequence

Let P = 101 and T = 01100. The sequence of Hamming distances you get from this pair is 2,2,1.

Task

For increasing n starting at n=1, consider all possible pairs of binary strings P of length n and T of length 2n-1. There are 2**(n+2n-1) such pairs and hence that many sequences of Hamming distances. However many of those sequences will be identical . The task is to find how many are distinct for each n.

Your code should output one number per value of n.

Score

Your score is the highest n your code reaches on my machine in 5 minutes. The timing is for the total running time, not the time just for that n.

Who wins

The person with the highest score wins. If two or more people end up with the same score then it is the first answer that wins.

Example answers

For n from 1 to 8 the optimal answers are 2, 9, 48, 297, 2040, 15425, 125232, 1070553.

Languages and libraries

You can use any available language and libraries you like. Where feasible, it would be good to be able to run your code so please include a full explanation for how to run/compile your code in Linux if at all possible.

My Machine The timings will be run on my 64-bit machine. This is a standard ubuntu install with 8GB RAM, AMD FX-8350 Eight-Core Processor and Radeon HD 4250. This also means I need to be able to run your code.

Leading answers

  • 11 in C++ by feersum. 25 seconds.
  • 11 in C++ by Andrew Epstein. 176 seconds.
  • 10 in Javascript by Neil. 54 seconds.
  • 9 in Haskell by nimi. 4 minutes and 59 seconds.
  • 8 in Javascript by fəˈnɛtɪk. 10 seconds.

user9206

Posted 2017-03-24T17:24:08.363

Reputation:

.. any available free* languages? – Stewie Griffin – 2017-03-24T17:37:04.197

[tag:fastest-code]? not [tag:fastest-algorithm]? You know, people could go with language with a darn fast interpreter and make a significant difference in time, but time complexity is always same, so It somewhat makes it fair. – Matthew Roh – 2017-03-24T17:43:51.913

Related. – Martin Ender – 2017-03-24T17:52:17.800

4@SIGSEGV fastest-code leaves more space for optimizations through both code level optimizations and a good algorithm. So I do think that faster-code is better than faster-algorithm. – Dada – 2017-03-24T17:52:28.367

Answers

3

C++11 (should get to 11 or 12)

At the moment this is single-threaded.

To compile:

g++ -std=c++11 -O2 -march=native feersum.cpp
#include <iostream>
#include <unordered_set>
#include <vector>
#include <unordered_map>
#include <string.h>

using seq = uint64_t;
using bitvec = uint32_t;
using seqset = std::unordered_set<seq>;
using std::vector;

#define popcount __builtin_popcount
#define MAX_N_BITS 4

bitvec leading(bitvec v, int n) {
    return v & ((1U << n) - 1);
}
bitvec trailing(bitvec v, int n, int total) {
    return v >> (total - n);
}

bitvec maxP(int n) {
    return 1 << (n - 1);  // ~P, ~T symmetry
}

void prefixes(int n, int pre, int P, seqset& p) {
    p.clear();
    for (bitvec pref = 0; pref < (1U << pre); pref++) {
        seq s = 0;
        for (int i = 0; i < pre; i++) {
            seq ham = popcount(trailing(pref, pre - i, pre) ^ leading(P, pre - i));
            s |= ham << i * MAX_N_BITS;
        }
        p.insert(s);
    }
}



vector<seqset> suffixes(int n, int suf, int off) {
    vector<seqset> S(maxP(n));
    for (bitvec P = 0; P < maxP(n); P++) {
        for (bitvec suff = 0; suff < (1U << suf); suff++) {
            seq s = 0;
            for (int i = 0; i < suf; i++) {
                seq ham = popcount(leading(suff, i + 1) ^ trailing(P, i + 1, n));
                s |= ham << (off + i) * MAX_N_BITS;
            }
            S[P].insert(s);
        }
    }
    return S;
}



template<typename T> 
void mids(int n, int npre, int nsuf, int mid, bitvec P, T& S, const seqset& pre) {
    seq mask = (1ULL << (npre + 1) * MAX_N_BITS) - 1;
    for(bitvec m = 0; m < 1U << mid; m++) {
        int pc = popcount(P ^ m);
        if(pc * 2 > n) continue; // symmetry of T -> ~T : replaces x with n - x
        seq s = (seq)pc << npre * MAX_N_BITS;
        for(int i = 0; i < npre; i++)
            s |= (seq)popcount(trailing(P, n - npre + i, n) ^ leading(m, n - npre + i)) << i * MAX_N_BITS;
        for(int i = 0; i < nsuf; i++)
            s |= (seq)popcount(leading(P, mid - 1 - i) ^ trailing(m, mid - 1 - i, mid)) << (npre + 1 + i) * MAX_N_BITS;
        for(seq a: pre)
            S[(s + a) & mask].insert(P | (s + a) << n);
    }
}

uint64_t f(int n) {
    if (n >= 1 << MAX_N_BITS) {
        std::cerr << "n too big";
        exit(1);
    }
    int tlen = 2*n - 1;
    int mid = n;
    int npre = (tlen - mid) / 2;
    if(n>6) --npre;
    int nsuf = tlen - npre - mid;
    seqset preset;
    auto sufs = suffixes(n, nsuf, npre + 1);
    std::unordered_map<seq, std::unordered_set<uint64_t>> premid;
    for(bitvec P = 0; P < maxP(n); P++) {
        prefixes(n, npre, P, preset);
        mids(n, npre, nsuf, mid, P, premid, preset);
    }
    uint64_t ans = 0;
    using counter = uint8_t;
    vector<counter> found((size_t)1 << nsuf * MAX_N_BITS);
    counter iz = 0;
    for(auto& z: premid) {
        ++iz;
        if(!iz) {
            memset(&found[0], 0, found.size() * sizeof(counter));
            ++iz;
        }
        for(auto& pair: z.second) {
            seq s = pair >> n;
            uint64_t count = 0;
            bitvec P = pair & (maxP(n) - 1);
            for(seq t: sufs[P]) {
                seq suff = (s + t) >> (npre + 1) * MAX_N_BITS;
                if (found[suff] != iz) {
                    ++count;
                    found[suff] = iz;
                }
            }
            int middle = int(s >> npre * MAX_N_BITS) & ~(~0U << MAX_N_BITS);
            ans += count << (middle * 2 != n);
        }
    }

    return ans;
}

int main() {
    for (int i = 1; ; i++)
        std::cout << i << ' ' << f(i) << std::endl;
}

feersum

Posted 2017-03-24T17:24:08.363

Reputation: 29 566

Get's to 11 in less than 30 seconds! – None – 2017-04-02T11:52:31.650

In case it's of interest: feersum.cpp:111:61: warning: shifting a negative signed value is undefined [-Wshift-negative-value] int middle = int(s >> npre * MAX_N_BITS) & ~(~0 << MAX_N_BITS); – None – 2017-04-02T16:52:22.617

5

Haskell, score 9

import Data.Bits
import Data.List
import qualified Data.IntSet as S

main = mapM_ (print . S.size . S.fromList . hd) [1..9]

hd :: Int -> [Int]
hd n = [foldl' (\s e->s*m+e) 0 [popCount $ xor p (shiftR t i .&. m)|i<-[(0::Int)..n-1]] | let m=2^n-1,p<-[(0::Int)..2^n-1],t<-[(0::Int)..2^(2*n-1)-1]]

Compile with -O3. It takes 6min35s on my 6 year old laptop hardware to run up to n=9, so maybe it's under 5min on the reference hardware.

> time ./113785
2
9
48
297
2040
15425
125232
1070553
9530752

real  6m35.763s
user  6m27.690s
sys   0m5.025s

nimi

Posted 2017-03-24T17:24:08.363

Reputation: 34 639

16 year laptop? Damn, that's some outdated tech! – Matthew Roh – 2017-03-25T17:43:23.093

@SIGSEGV: maybe it's outdated, but besides counting the number of Hamming distance sequences it does its job quite well. – nimi – 2017-03-25T22:31:25.880

4

JavaScript, score 10

findHamming = m => { 
    if (m < 2) return 2;
    let popcnt = x => x && (x & 1) + popcnt(x >> 1);
    let a = [...Array(1 << m)].map((_,i) => popcnt(i));
    let t = 0;
    let n = (1 << m) - 1;
    for (let c = 0; c <= m; c++) {
        for (let g = 0; g <= c; g++) {
            let s = new Set;
            for (let i = 1 << m; i--; ) {
                for (let j = 1 << (m - 1); j--; ) {
                    if (a[i ^ j + j] != c) continue;
                    for (let k = 1 << m - 1; k--; ) {
                        if (a[i ^ k] != g) continue;
                        let f = j << m | k;
                        let h = 0;
                        for (l = m - 1; --l; ) h = h * (m + 1) + a[i ^ f >> l & n];
                        s.add(h);
                    }
                }
            }
            t += s.size * (g < c ? 2 : 1);
        }
    }
    return t;
};
let d = Date.now(); for (let m = 1; m < 11; m++) console.log(m, findHamming(m), Date.now() - d);

Explanation: Calculating n=10 is difficult because there are over two billion pairs and over 26 billion potential sequences. In order to speed things up I split the calculation up into 121 bins. Because the sequences are invariant under bitwise complement, I can assume without loss of generality that the middle bit of T is zero. This means that I can determine the first and last elements of the sequence independently from the the top n-1 and bottom n-1 bits of T. Each bin corresponds to a different pair of first and last elements; I then loop through all the possible sets of top and bottom bits that correspond to each bin and calculate the remaining elements of the sequence, finally counting the unique sequences for each bin. It then remains to total all 121 bins. Originally taking 45 hours, this now completed in a little under three and a half minutes on my AMD FX-8120. Edit: Thanks to @ChristianSievers for a 50% speedup. Full output:

1 2 0
2 9 1
3 48 1
4 297 2
5 2040 7
6 15425 45
7 125232 391
8 1070553 1844
9 9530752 15364
10 86526701 153699

Neil

Posted 2017-03-24T17:24:08.363

Reputation: 95 035

Your code gives no output currently. – felipa – 2017-03-27T18:08:54.887

@felipa Not sure what you mean. It's an anonymous function, so you call it (perhaps by assigning it to a variable first and then calling the variable as if it was a function) and pass it n as a parameter. (Sorry for the bad choice of parameter name there.) – Neil – 2017-03-27T18:25:32.190

The question asks for code that prints out answer for n up to the highest value it can get to in 5 minutes. "Your code should output one number per value of n." – felipa – 2017-03-27T18:35:05.107

It would be great if your code worked up from n = 1 and outputted the timing at each stage. From the question " The timing is for the total running time, not the time just for that n." – None – 2017-03-28T07:50:32.923

1@Lembik Added timing code and also worked around bug for n=1 (don't know offhand why it hangs). – Neil – 2017-03-28T08:16:06.790

Is the timing in hundredths of a second? – None – 2017-03-28T08:47:14.167

@Lembik Sorry, I should have said, it's in milliseconds. – Neil – 2017-03-28T08:50:48.993

Great! There is another symmetry in the construction of Hamming distance sequences: when P and T are reversed, the resulting sequence is also reversed, so the set to count is closed under reversal. I'm not sure about the best way to use this fact, but it gives an easy way to improve your program: let g only go up to c, and add twice the bin size unless c=g. (Some extra work should also be able to speed up the case c=g, but that might not be worth the effort.) – Christian Sievers – 2017-03-28T20:02:17.783

@ChristianSievers Very nice! I'm slightly disappointed that I only get a 50% speedup though. – Neil – 2017-03-28T20:53:33.330

Huh? We've reduced 121 bins (in the n=10 case) to 66. What did you expect? – Christian Sievers – 2017-03-28T21:02:05.703

I expected a (121-66)/66*100=83% speedup. – Neil – 2017-03-28T21:20:53.720

(actual speedup 58%, misread my screen sorry, still less than 83%) – Neil – 2017-03-28T21:22:18.130

I see. More than the expected fraction of 1/(m+1) fall into the c=g case, putting more pressure on the set. Also, how long does the computation of a take? – Christian Sievers – 2017-03-28T22:15:31.890

@ChristianSievers Indeed, for n=10 the c=g case accounts for about 15% of the distinct sequences. (I don't know how many unique sequences is accounts for. Also, a is trivial to compute; it takes a second for n=20.) – Neil – 2017-03-28T23:27:14.063

4

C++, score 10 11

This is a translation of @Neil's answer into C++, with some simple parallelization. n=9 completes in 0.4 seconds, n=10 in 4.5 seconds, and n=11 in approximately 1 minute on my 2015 Macbook Pro. Also, thanks to @ChristianSievers. Due to his comments on @Neil's answer, I noticed some additional symmetries. From the original 121 buckets (for n=10), to 66 buckets when accounting for reversal, I've gotten down to just 21 buckets.

#include <iostream>
#include <cstdint>
#include <unordered_set>
#include <thread>
#include <future>
#include <vector>

using namespace std;

constexpr uint32_t popcnt( uint32_t v ) {
    uint32_t c = v - ( ( v >> 1 ) & 0x55555555 );
    c = ( ( c >> 2 ) & 0x33333333 ) + ( c & 0x33333333 );
    c = ( ( c >> 4 ) + c ) & 0x0F0F0F0F;
    c = ( ( c >> 8 ) + c ) & 0x00FF00FF;
    c = ( ( c >> 16 ) + c ) & 0x0000FFFF;
    return c;
}

template<uint32_t N>
struct A {
    constexpr A() : arr() {
        for( auto i = 0; i != N; ++i ) {
            arr[i] = popcnt( i );
        }
    }
    uint8_t arr[N];
};

uint32_t n = ( 1 << M ) - 1;
constexpr auto a = A < 1 << M > ();

uint32_t work( uint32_t c, uint32_t g, uint32_t mult ) {
    unordered_set<uint64_t> s;
    // Empirically derived "optimal" value
    s.reserve( static_cast<uint32_t>( pow( 5, M ) ) );

    for( int i = ( 1 << M ) - 1; i >= 0; i-- ) {
        for( uint32_t j = 1 << ( M - 1 ); j--; ) {
            if( a.arr[i ^ j + j] != c ) {
                continue;
            }

            for( uint32_t k = 1 << ( M - 1 ); k--; ) {
                if( a.arr[i ^ k] != g ) {
                    continue;
                }

                uint64_t f = j << M | k;
                uint64_t h = 0;

                for( uint32_t l = M - 1; --l; ) {
                    h = h * ( M + 1 ) + a.arr[i ^ ( f >> l & n )];
                }

                s.insert( h );
            }
        }
    }

    return s.size() * mult;

}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();

    if( M == 1 ) {
        auto t2 = std::chrono::high_resolution_clock::now();
        auto seconds = chrono::duration_cast<chrono::milliseconds>( t2 - t1 ).count() / 1000.0;
        cout << M << ": " << 2 << ", " << seconds << endl;
        return 0;
    }

    uint64_t t = 0;
    vector<future<uint32_t>> my_vector;

    if( ( M & 1 ) == 0 ) {
        for( uint32_t c = 0; c <= M / 2; ++c ) {
            for( uint32_t g = c; g <= M / 2; ++g ) {
                uint32_t mult = 8;

                if( c == M / 2 && g == M / 2 ) {
                    mult = 1;
                } else if( g == c || g == M / 2 ) {
                    mult = 4;
                }

                my_vector.push_back( async( work, c, g, mult ) );
            }

        }

        for( auto && f : my_vector ) {
            t += f.get();
        }

    } else {
        for( uint32_t c = 0; c <= ( M - 1 ) / 2; ++c ) {
            for( uint32_t g = c; g <= M - c; ++g ) {
                uint32_t mult = 4;

                if( g == c || g + c == M ) {
                    mult = 2;
                }

                my_vector.push_back( async( work, c, g, mult ) );
            }

        }

        for( auto && f : my_vector ) {
            t += f.get();
        }

    }

    auto t2 = std::chrono::high_resolution_clock::now();
    auto seconds = chrono::duration_cast<chrono::milliseconds>( t2 - t1 ).count() / 1000.0;
    cout << M << ": " << t << ", " << seconds << endl;
    return 0;
}

Use the following script to execute the code:

#!/usr/bin/env bash

for i in {1..10}
do
    clang++ -std=c++14 -march=native -mtune=native -Ofast -fno-exceptions -DM=$i hamming3.cpp -o hamming
    ./hamming
done

The output was as follows: (The format is M: result, seconds)

1: 2, 0
2: 9, 0
3: 48, 0
4: 297, 0
5: 2040, 0
6: 15425, 0.001
7: 125232, 0.004
8: 1070553, 0.029
9: 9530752, 0.419
10: 86526701, 4.459
11: 800164636, 58.865

n=12 took 42 minutes to calculate on a single thread, and gave a result of 7368225813.

Andrew Epstein

Posted 2017-03-24T17:24:08.363

Reputation: 341

How would you compile this in ubuntu using clang? – felipa – 2017-03-27T17:56:22.120

@felipa I think the answer is sudo apt-get install libiomp-dev. – None – 2017-03-27T18:17:39.973

It would be great if your code worked up from n = 1 and outputted the timing at each stage. From the question " The timing is for the total running time, not the time just for that n." – None – 2017-03-28T07:50:21.313

Rather than reimplementing it you could probably just use __builtin_popcount. – Neil – 2017-03-28T08:52:50.473

@Lembik: I'll make the changes later today. @Neil: The popcnt function only gets evaluated at compile time, and I don't know how to use __builtin_popcount in a constexpr context. I could go with the naïve implementation and it wouldn't affect the run time. – Andrew Epstein – 2017-03-28T12:57:00.503

Unfortunately it uses too much RAM for n = 11 for me to test. – None – 2017-03-28T16:35:06.033

I was thinking you could use __builtin_popcount() instead of a.arr[], as it should compile to a single instruction anyway, which avoids hitting memory, so might actually be faster. – Neil – 2017-03-28T20:31:27.340

@Neil: I tried out your suggestion. It did indeed compile to a single instruction, but profiling with valgrind indicated that using a.arr[] was faster. I suspect that due to the table's small size, it is able to remain in L1 cache. – Andrew Epstein – 2017-03-29T04:39:10.767

Huh, I hadn't realised how big L1 cache is these days. Also, nice going for calculating n=12. – Neil – 2017-03-29T07:40:57.430

How much RAM did calculating n = 11 and 12 take for you? – None – 2017-03-29T08:18:27.350

Could you change the code so that it works up from n = 1? At the moment I have to run it for each n separately and add the times. – None – 2017-03-29T08:23:18.050

@Lembik: n=11 on a single thread took 1534 MB at peak, n=11 on multiple threads took 7166 MB at peak, and n=12 on a single thread took 11521 MB at peak. – Andrew Epstein – 2017-03-30T02:22:13.213

I need -pthread as in g++ -Wall -pthread -std=c++14 -DM=10 hamming.cpp -Ofast -o hamming. Do you really mean if( a.arr[i ^ j + j] != c ) { . I ask as i ^ j + j means i ^ (j + j) (so feersum tells me). – None – 2017-03-30T09:18:48.563

That's very impressive! I see that by flipping the bits in P only (or T only, but we already fixed one bit there), we get a distance sequence where every entry e is replaced by n-e, so the (c,g) bin is bijective to the (n-c,n-g) one. But what is your other symmetry in the even n case? – Christian Sievers – 2017-03-30T09:50:01.087

BTW, I wonder if it's worth while not running through all possible j and k and checking if they have the correct hamming distance, but instead have lists of all numbers of a given hamming weight and bitwise xor only those to i. – Christian Sievers – 2017-03-30T10:22:01.307

Oh I see! Flipping every second bit in P and T gives a distance sequence where every second entry changes to n-e, establishing a correspondance between the (c,g) and the (c,n-g) bin when n is even. – Christian Sievers – 2017-03-30T11:22:55.733

@ChristianSievers: Honestly I didn't even think about it at that level. I just printed the size of the 121 bins, and then grouped them together by identical values. Then, I just looked at the minimal (c,g) pairs and identified a pattern. – Andrew Epstein – 2017-03-31T13:00:58.523

2

JavaScript 2,9,48,297,2040,15425,125232,1070553,9530752

Run in console:

console.time("test");
h=(w,x,y,z)=>{retVal=0;while(x||y){if(x%2!=y%2)retVal++;x>>=1;y>>=1}return retVal*Math.pow(w+1,z)};
sum=(x,y)=>x+y;
for(input=1;input<10;input++){
  hammings=new Array(Math.pow(input+1,input));
  for(i=1<<(input-1);i<1<<input;i++){
    for(j=0;j<1<<(2*input);j++){
      hamming=0;
      for(k=0;k<input;k++){
        hamming+=(h(input,(j>>k)%(1<<input),i,k));
      }
      hammings[hamming]=1;
    }
  }
  console.log(hammings.reduce(sum));
}
console.timeEnd("test");

Try it online!

Or as a Stack Snippet:

h=(w,x,y,z)=>{retVal=0;while(x||y){if(x%2!=y%2)retVal++;x>>=1;y>>=1}return retVal*Math.pow(w+1,z)};

function findHamming() {
  console.time("test");
  input = parseInt(document.getElementById("input").value);
  hammings=new Array(Math.pow(input+1,input));
  for(i=1<<(input-1);i<1<<input;i++){
    for(j=0;j<1<<(2*input);j++){
      hamming=0;
      for(k=0;k<input;k++){
        hamming+=(h(input,(j>>k)%(1<<input),i,k));
      }
      hammings[hamming]=1;
    }
  }
  document.getElementById("output").innerHTML = hammings.reduce((x,y)=>x+y);
  console.timeEnd("test");
}
<input type="text" id="input" value="6">
<button onclick="findHamming()"> Run </button>
<pre id="output">

The code preinitializes the array to make adding 1s to the array much faster

The code finds all the hamming distance sequences and treating them as numbers base (input+1), uses them to place 1s in an array. As a result, this generates an array with the n 1s where n is the number of unique hamming distance sequences. Finally, the number of 1s is counted using array.reduce() to sum all of the values in the array.

This code will not be able to run for input of 10 as it hits memory limits

This code runs in O(2^2n) time because that's how many elements it generates.

fəˈnɛtɪk

Posted 2017-03-24T17:24:08.363

Reputation: 4 166

1Unsurprisingly, trying to create a 26*10^9 element array doesn't work – fəˈnɛtɪk – 2017-03-24T18:55:59.437

n = 9 takes 5 minutes and 30 seconds for me using node.js so is just too slow. – None – 2017-03-24T20:38:54.780

@Lembik n = 8 originally took 24 seconds on my PC, but I was able to optimise the code so that n = 8 took 6 seconds. I then tried n = 9 and that took 100 seconds. – Neil – 2017-03-24T21:37:56.590

@Neil You should submit an answer ! – None – 2017-03-24T21:49:34.443

It would be great if your code worked up from n = 1 and outputted the timing at each stage. From the question " The timing is for the total running time, not the time just for that n." – None – 2017-03-28T07:50:42.420

What unit is the time in? – None – 2017-03-28T16:29:32.433

Time is in milliseconds – fəˈnɛtɪk – 2017-03-28T21:45:38.810