Finding approximate correlations

14

2

Consider a binary string S of length n. Indexing from 1, we can compute the Hamming distances between S[1..i+1] and S[n-i..n] for all i in order from 0 to n-1. The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. For example,

S = 01010

gives

[0, 2, 0, 4, 0].

This is because 0 matches 0, 01 has Hamming distance two to 10, 010 matches 010, 0101 has Hamming distance four to 1010 and finally 01010 matches itself.

We are only interested in outputs where the Hamming distance is at most 1, however. So in this task we will report a Y if the Hamming distance is at most one and an N otherwise. So in our example above we would get

[Y, N, Y, N, Y]

Define f(n) to be the number of distinct arrays of Ys and Ns one gets when iterating over all 2^n different possible bit strings S of length n.

Task

For increasing n starting at 1, your code should output f(n).

Example answers

For n = 1..24, the correct answers are:

1, 1, 2, 4, 6, 8, 14, 18, 27, 36, 52, 65, 93, 113, 150, 188, 241, 279, 377, 427, 540, 632, 768, 870

Scoring

Your code should iterate up from n = 1 giving the answer for each n in turn. I will time the entire run, killing it after two minutes.

Your score is the highest n you get to in that time.

In the case of a tie, the first answer wins.

Where will my code be tested?

I will run your code on my (slightly old) Windows 7 laptop under cygwin. As a result, please give any assistance you can to help make this easy.

My laptop has 8GB of RAM and an Intel i7 5600U@2.6 GHz (Broadwell) CPU with 2 cores and 4 threads. The instruction set includes SSE4.2, AVX, AVX2, FMA3 and TSX.

Leading entries per language

  • n = 40 in Rust using CryptoMiniSat, by Anders Kaseorg. (In Lubuntu guest VM under Vbox.)
  • n = 35 in C++ using the BuDDy library, by Christian Seviers. (In Lubuntu guest VM under Vbox.)
  • n = 34 in Clingo by Anders Kaseorg. (In Lubuntu guest VM under Vbox.)
  • n = 31 in Rust by Anders Kaseorg.
  • n = 29 in Clojure by NikoNyrh.
  • n = 29 in C by bartavelle.
  • n = 27 in Haskell by bartavelle
  • n = 24 in Pari/gp by alephalpha.
  • n = 22 in Python 2 + pypy by me.
  • n = 21 in Mathematica by alephalpha. (Self reported)

Future bounties

I will now give a bounty of 200 points for any answer that gets up to n = 80 on my machine in two minutes.

user9206

Posted 2017-06-03T19:19:07.807

Reputation:

Do you know of some trick that will allow someone to find a faster algorithm than a naive brute force? If not this challenge is "please implement this in x86" (or maybe if we know your GPU...). – Jonathan Allan – 2017-06-04T07:15:53.843

@JonathanAllan It is certainly possible to speed up a very naive approach. Exactly how fast you can get I am not sure. Interestingly, if we changed the question so that you get a Y if the Hamming distance is at most 0 and an N otherwise, then there is a known closed form formula. – None – 2017-06-04T08:16:08.907

@Lembik Do we measure CPU time or real time? – flawr – 2017-06-11T19:32:40.857

@flawr I am measuring real time but running it a few times and taking the minimum to eliminate oddities. – None – 2017-06-11T19:57:06.877

Answers

9

Rust + CryptoMiniSat, n ≈ 41

src/main.rs

extern crate cryptominisat;
extern crate itertools;

use std::iter::once;
use cryptominisat::{Lbool, Lit, Solver};
use itertools::Itertools;

fn make_solver(n: usize) -> (Solver, Vec<Lit>) {
    let mut solver = Solver::new();
    let s: Vec<Lit> = (1..n).map(|_| solver.new_var()).collect();
    let d: Vec<Vec<Lit>> = (1..n - 1)
        .map(|k| {
                 (0..n - k)
                     .map(|i| (if i == 0 { s[k - 1] } else { solver.new_var() }))
                     .collect()
             })
        .collect();
    let a: Vec<Lit> = (1..n - 1).map(|_| solver.new_var()).collect();
    for k in 1..n - 1 {
        for i in 1..n - k {
            solver.add_xor_literal_clause(&[s[i - 1], s[k + i - 1], d[k - 1][i]], true);
        }
        for t in (0..n - k).combinations(2) {
            solver.add_clause(&t.iter()
                                   .map(|&i| d[k - 1][i])
                                   .chain(once(!a[k - 1]))
                                   .collect::<Vec<_>>()
                                   [..]);
        }
        for t in (0..n - k).combinations(n - k - 1) {
            solver.add_clause(&t.iter()
                                   .map(|&i| !d[k - 1][i])
                                   .chain(once(a[k - 1]))
                                   .collect::<Vec<_>>()
                                   [..]);
        }
    }
    (solver, a)
}

fn search(n: usize,
          solver: &mut Solver,
          a: &Vec<Lit>,
          assumptions: &mut Vec<Lit>,
          k: usize)
          -> usize {
    match solver.solve_with_assumptions(assumptions) {
        Lbool::True => search_sat(n, solver, a, assumptions, k),
        Lbool::False => 0,
        Lbool::Undef => panic!(),
    }
}

fn search_sat(n: usize,
              solver: &mut Solver,
              a: &Vec<Lit>,
              assumptions: &mut Vec<Lit>,
              k: usize)
              -> usize {
    if k >= n - 1 {
        1
    } else {
        let s = solver.is_true(a[k - 1]);
        assumptions.push(if s { a[k - 1] } else { !a[k - 1] });
        let c = search_sat(n, solver, a, assumptions, k + 1);
        assumptions.pop();
        assumptions.push(if s { !a[k - 1] } else { a[k - 1] });
        let c1 = search(n, solver, a, assumptions, k + 1);
        assumptions.pop();
        c + c1
    }
}

fn f(n: usize) -> usize {
    let (mut solver, proj) = make_solver(n);
    search(n, &mut solver, &proj, &mut vec![], 1)
}

fn main() {
    for n in 1.. {
        println!("{}: {}", n, f(n));
    }
}

Cargo.toml

[package]
name = "correlations-cms"
version = "0.1.0"
authors = ["Anders Kaseorg <andersk@mit.edu>"]

[dependencies]
cryptominisat = "5.0.1"
itertools = "0.6.0"

How it works

This does a recursive search through the tree of all partial assignments to prefixes of the Y/N array, using a SAT solver to check at each step whether the current partial assignment is consistent and prune the search if not. CryptoMiniSat is the right SAT solver for this job due to its special optimizations for XOR clauses.

The three families of constraints are

SiSk + iDki, for 1 ≤ kn − 2, 0 ≤ i ≤ nk;
Dki1Dki2 ∨ ¬Ak, for 1 ≤ kn − 2, 0 ≤ i1 < i2nk;
¬Dki1 ∨ ⋯ ∨ ¬Dkink − 1Ak, for 1 ≤ kn − 2, 0 ≤ i1 < ⋯ < ink − 1nk;

except that, as an optimization, S0 is forced to false, so that Dk0 is simply equal to Sk.

Anders Kaseorg

Posted 2017-06-03T19:19:07.807

Reputation: 29 242

2Woohoooooo ! :) – None – 2017-06-12T11:38:06.187

I am still trying to compile this in Windows (using cygwin + gcc). I cloned cryptominisat and compiled it. But I still don't know how to compile the rust code. When I do cargo build I get --- stderr CMake Error: Could not create named generator Visual Studio 14 2015 Win64 – None – 2017-06-12T14:17:32.013

Even though I can't run your code yet, I will award you the bounty as other people can. I may just give in and use a VM. – None – 2017-06-12T14:52:43.830

It would be great if you could explain fully how you can use a SAT solver to solve this problem. I think people would be very interested. – None – 2017-06-12T17:25:44.713

@Lembik I added a quick explanation of how I’m using SAT. The build process should be simply cargo build --release. There’s no need to separately clone and install CryptoMiniSat: the Rust bindings will ignore that and use a vendored copy anyway. – Anders Kaseorg – 2017-06-12T19:57:07.590

Thanks. The problem is that cargo build fails to compile it in cygwin. If there is some other way to get it to compile in Windows I will happily try it. – None – 2017-06-12T19:59:54.877

@Lembik Hmm, obviously I haven’t tried this on Windows, but a comment on a similar-looking issue for a different project suggests that you might need the Windows version of CMake instead of the Cygwin version? (Or maybe vice versa?)

– Anders Kaseorg – 2017-06-12T20:13:25.573

@Lembik Here are standalone rust installers for windows 32 bit(i686) and 64 bit(x86_64)

– rahnema1 – 2017-06-12T20:25:06.760

2@rahnema1 Thanks, but it sounds like the issue is with the CMake build system of the embedded C++ library in the cryptominisat crate, not with Rust itself. – Anders Kaseorg – 2017-06-12T21:26:32.713

Great job! I had completely different sets of clauses for the Y and N cases that I kept adding and removing in my attempt. No wonder it didn't work out. – Christian Sievers – 2017-06-12T21:54:04.570

@rahnema1 This is the problem https://bpaste.net/show/4f22dd76bb6d . I have cygwin + gcc + cygwin's cmake installed. If there is some other to compile this code in Windows I would be happy to try it.

– None – 2017-06-13T06:39:18.610

1@Lembik I'm getting a 404 from that paste. – Mego – 2017-06-14T09:17:57.253

@Mego I have posted a question to https://stackoverflow.com/questions/44520572/failed-to-run-custom-build-command-for-cryptominisat-on-windows which lists all my woes.

– None – 2017-06-14T09:19:42.940

I gave up on windows in the end (although @feersum managed to get it to compile with some effort). I still really like your answer. Am thinking of posing https://oeis.org/A005434 as fastest code as there doesn't seem to be an obvious closed form answer.. although I suspect it is much easier.

– None – 2017-06-15T17:43:45.990

I wonder if this solution needs the backtracking. Alternatively, just find a solution, count it, add a clause to forbid its combination of A literals, repeat. – Christian Sievers – 2017-06-16T09:50:37.210

1@ChristianSievers Good question. That works but it seems to be a bit slower (2× or so). I’m not sure why it shouldn’t be just as good, so maybe CryptoMiniSat just hasn’t been well optimized for that kind of incremental workload. – Anders Kaseorg – 2017-06-16T10:08:01.130

9

Rust, n ≈ 30 or 31 or 32

On my laptop (two cores, i5-6200U), this gets through n = 1, …, 31 in 53 seconds, using about 2.5 GiB of memory, or through n = 1, …, 32 in 105 seconds, using about 5 GiB of memory. Compile with cargo build --release and run target/release/correlations.

src/main.rs

extern crate rayon;

type S = u32;
const S_BITS: u32 = 32;

fn cat(mut a: Vec<S>, mut b: Vec<S>) -> Vec<S> {
    if a.capacity() >= b.capacity() {
        a.append(&mut b);
        a
    } else {
        b.append(&mut a);
        b
    }
}

fn search(n: u32, i: u32, ss: Vec<S>) -> u32 {
    if ss.is_empty() {
        0
    } else if 2 * i + 1 > n {
        search_end(n, i, ss)
    } else if 2 * i + 1 == n {
        search2(n, i, ss.into_iter().flat_map(|s| vec![s, s | 1 << i]))
    } else {
        search2(n,
                i,
                ss.into_iter()
                    .flat_map(|s| {
                                  vec![s,
                                       s | 1 << i,
                                       s | 1 << n - i - 1,
                                       s | 1 << i | 1 << n - i - 1]
                              }))
    }
}

fn search2<SS: Iterator<Item = S>>(n: u32, i: u32, ss: SS) -> u32 {
    let (shift, mask) = (n - i - 1, !(!(0 as S) << i + 1));
    let close = |s: S| {
        let x = (s ^ s >> shift) & mask;
        x & x.wrapping_sub(1) == 0
    };
    let (ssy, ssn) = ss.partition(|&s| close(s));
    let (cy, cn) = rayon::join(|| search(n, i + 1, ssy), || search(n, i + 1, ssn));
    cy + cn
}

fn search_end(n: u32, i: u32, ss: Vec<S>) -> u32 {
    if i >= n - 1 { 1 } else { search_end2(n, i, ss) }
}

fn search_end2(n: u32, i: u32, mut ss: Vec<S>) -> u32 {
    let (shift, mask) = (n - i - 1, !(!(0 as S) << i + 1));
    let close = |s: S| {
        let x = (s ^ s >> shift) & mask;
        x & x.wrapping_sub(1) == 0
    };
    match ss.iter().position(|&s| close(s)) {
        Some(0) => {
            match ss.iter().position(|&s| !close(s)) {
                Some(p) => {
                    let (ssy, ssn) = ss.drain(p..).partition(|&s| close(s));
                    let (cy, cn) = rayon::join(|| search_end(n, i + 1, cat(ss, ssy)),
                                               || search_end(n, i + 1, ssn));
                    cy + cn
                }
                None => search_end(n, i + 1, ss),
            }
        }
        Some(p) => {
            let (ssy, ssn) = ss.drain(p..).partition(|&s| close(s));
            let (cy, cn) = rayon::join(|| search_end(n, i + 1, ssy),
                                       || search_end(n, i + 1, cat(ss, ssn)));
            cy + cn
        }
        None => search_end(n, i + 1, ss),
    }
}

fn main() {
    for n in 1..S_BITS + 1 {
        println!("{}: {}", n, search(n, 1, vec![0, 1]));
    }
}

Cargo.toml

[package]
name = "correlations"
version = "0.1.0"
authors = ["Anders Kaseorg <andersk@mit.edu>"]

[dependencies]
rayon = "0.7.0"

Try it online!

I also have a slightly slower variant using very much less memory.

Anders Kaseorg

Posted 2017-06-03T19:19:07.807

Reputation: 29 242

What optimisations have you used? – None – 2017-06-05T05:46:03.217

1@Lembik The biggest optimization, besides doing everything with bitwise arithmetic in a compiled language, is to use only as much nondeterminism as needed to nail down a prefix of the Y/N array. I do a recursive search on possible prefixes of the Y/N array, taking along a vector of possible strings achieving that prefix, but only the strings whose unexamined middle is filled with zeros. That said, this is still an exponential search, and these optimizations only speed it up by polynomial factors. – Anders Kaseorg – 2017-06-05T06:15:54.727

It's a nice answer. Thank you. I am hoping someone will dig into the combinatorics to get a significant speed up. – None – 2017-06-05T13:34:01.910

@Lembik I’ve fixed a memory wasting bug, done more micro-optimization, and added parallelism. Please retest when you get a chance—I’m hoping to increase my score by 1 or 2. Do you have combinatorial ideas in mind for larger speedups? I haven’t come up with anything. – Anders Kaseorg – 2017-06-06T03:40:45.643

It doesn't run for me. I get 1: 1 2: 1 thread 'main' panicked at 'attempt to subtract with overflow', src\main.rs:40 – None – 2017-06-06T06:37:12.600

@Lembik Ah, that’s a bug, which I’ve now corrected. However, it also means that you’re compiling without optimization, since Rust only does overflow checking in debug mode. Please make sure you’re building with --release for benchmarking. – Anders Kaseorg – 2017-06-06T06:50:34.273

I was benchmarking with --release, never fear. Re: combinatorics... if we change the bound from 1 to 0 (that is look for exact matches) we get https://oeis.org/A005434 . This has a closed form formula. I was hoping this could be extended somehow.

– None – 2017-06-06T07:01:42.743

1@Lembik There is no formula given at the OEIS entry. (The Mathematica code there also seem to use brute-force.) If you know of one, you might want to tell them about it. – Christian Sievers – 2017-06-10T20:33:36.570

@ChristianSievers Hmmm... maybe I am wrong! – None – 2017-06-12T09:03:13.373

6

C++ using the BuDDy library

A different approach: have a binary formula (as binary decision diagram) that takes the bits of S as input and is true iff that gives some fixed values of Y or N at certain selected positions. If that formula is not constant false, select a free position and recurse, trying both Y and N. If there is no free position, we have found a possible output value. If the formula is constant false, backtrack.

This works relatively reasonable because there are so few possible values so that we can often backtrack early. I tried a similar idea with a SAT solver, but that was less successful.

#include<vector>
#include<iostream>
#include<bdd.h>

// does vars[0..i-1] differ from vars[n-i..n-1] in at least two positions?
bdd cond(int i, int n, const std::vector<bdd>& vars){
  bdd x1 { bddfalse };
  bdd xs { bddfalse };
  for(int k=0; k<i; ++k){
    bdd d { vars[k] ^ vars[n-i+k] };
    xs |= d & x1;
    x1 |= d;
  }
  return xs;
}

void expand(int i, int n, int &c, const std::vector<bdd>& conds, bdd x){
  if (x==bddfalse)
    return;
  if (i==n-2){
    ++c;
    return;
  }

  expand(i+1,n,c,conds, x & conds[2*i]);
  x &= conds[2*i+1];
  expand(i+1,n,c,conds, x);
}

int count(int n){
  if (n==1)   // handle trivial case
    return 1;
  bdd_setvarnum(n-1);
  std::vector<bdd> vars {};
  vars.push_back(bddtrue); // assume first bit is 1
  for(int i=0; i<n-1; ++i)
    if (i%2==0)            // vars in mixed order
      vars.push_back(bdd_ithvar(i/2));
    else
      vars.push_back(bdd_ithvar(n-2-i/2));
  std::vector<bdd> conds {};
  for(int i=n-1; i>1; --i){ // handle long blocks first
    bdd cnd { cond(i,n,vars) };
    conds.push_back( cnd );
    conds.push_back( !cnd );
  }
  int c=0;
  expand(0,n,c,conds,bddtrue);
  return c;
}

int main(void){
  bdd_init(20000000,1000000);
  bdd_gbc_hook(nullptr); // comment out to see GC messages
  for(int n=1; ; ++n){
    std::cout << n << " " << count(n) << "\n" ;
  }
}

To compile with debian 8 (jessie), install libbdd-dev and do g++ -std=c++11 -O3 -o hb hb.cpp -lbdd. It might be useful to increase the first argument to bdd_init even more.

Christian Sievers

Posted 2017-06-03T19:19:07.807

Reputation: 6 366

This looks interesting. What do you get to with this? – None – 2017-06-11T18:24:50.917

@Lembik I get 31 in 100s on very old hardware that won't let me answer faster – Christian Sievers – 2017-06-11T18:50:48.837

Any help you can give on how to compile this on Windows (e.g. using cygwin) gratefully received. – None – 2017-06-11T18:53:46.533

@Lembik I don't know about Windws but https://github.com/fd00/yacp/tree/master/buddy seems helpful w.r.t. cygwin

– Christian Sievers – 2017-06-11T19:25:39.780

I tried this on my i7 6700K, although this uses just a single thread it would run up-to 29, 30 and 31 in 9, 16 and 21 seconds! OTOH 35, 36 and 37 were completed in 1.5, 3.5 and 11.5 minutes. I guess if compilation is an issue you could run this inside a VM. – NikoNyrh – 2017-06-11T20:04:09.117

1Wow, okay, you’ve got me convinced that I need to add this library to my toolkit. Well done! – Anders Kaseorg – 2017-06-12T08:52:56.827

I like the idea of a SAT based solution. Which solver did you try? – None – 2017-06-12T09:52:07.177

@AndersKaseorg The BuDDy authors seem to have done a fine job, but I can't say anything about how it compares to other BDD libraries. It was just the first one that seemed to work for me. Maybe the important thing is just the BDD technique. – Christian Sievers – 2017-06-12T10:07:20.293

@Lembik I used picosat. – Christian Sievers – 2017-06-12T10:08:24.713

4

Clingo, n ≈ 30 or 31 34

I was a little surprised to see five lines of Clingo code overtake my brute-force Rust solution and come really close to Christian’s BuDDy solution—it looks like it would beat that too with a higher time limit.

corr.lp

{s(2..n)}.
d(K,I) :- K=1..n-2, I=1..n-K, s(I), not s(K+I).
d(K,I) :- K=1..n-2, I=1..n-K, not s(I), s(K+I).
a(K) :- K=1..n-2, {d(K,1..n-K)} 1.
#show a/1.

corr.sh

#!/bin/bash
for ((n=1;;n++)); do
    echo "$n $(clingo corr.lp --project -q -n 0 -c n=$n | sed -n 's/Models *: //p')"
done

plot

Anders Kaseorg

Posted 2017-06-03T19:19:07.807

Reputation: 29 242

This is great! It looks from your graph that the BuDDy solution suddenly gets worse. Any idea why? – None – 2017-06-12T08:53:55.187

@Lembik I haven’t investigated BuDDy enough to be sure, but maybe it runs out of cache at that point? – Anders Kaseorg – 2017-06-12T08:59:52.270

Wow! I think a higher first value to bdd_init might help, or allowing to increase the node table more by calling bdd_setmaxincrease with a value much above the default of 50000. - Are you using the changed version of my program? – Christian Sievers – 2017-06-12T09:51:41.893

@ChristianSievers I’m using your current version.

– Anders Kaseorg – 2017-06-12T10:00:09.577

2I do love your graph. – None – 2017-06-12T17:22:59.863

1You get a shocking performance boost by using the option --configuration=crafty (jumpy and trendy give similar results). – Christian Sievers – 2019-11-10T23:10:18.657

@ChristianSievers How much of a boost? – Anush – 2020-01-06T12:55:19.180

@Anush I think it was just enough to make me wonder which program will be the fastest. – Christian Sievers – 2020-01-07T01:55:18.967

2

Pari/GP, 23

By default, Pari/GP limits its stack size to 8 MB. The first line of the code, default(parisize, "4g"), sets this limit to 4 GB. If it still gives a stackoverflow, you can set it to 8 GB.

default(parisize, "4g")
f(n) = #vecsort([[2 > hammingweight(bitxor(s >> (n-i) , s % 2^i)) | i <- [2..n-1]] | s <- [0..2^(n-1)]], , 8)
for(n = 1, 100, print(n " -> " f(n)))

alephalpha

Posted 2017-06-03T19:19:07.807

Reputation: 23 988

Reaches 22 and then gives a stackoverflow. – None – 2017-06-04T17:29:08.393

Gets to 24 now. – None – 2017-06-05T08:39:25.887

2

Clojure, 29 in 75 38 seconds, 30 in 80 and 31 in 165

Runtimes from Intel i7 6700K, memory usage is less than 200 MB.

project.clj (uses com.climate/claypoole for multithreading):

(defproject tests "0.0.1-SNAPSHOT"
  :description "FIXME: write description"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [com.climate/claypoole "1.1.4"]]
  :javac-options ["-target" "1.6" "-source" "1.6" "-Xlint:-options"]
  :aot [tests.core]
  :main tests.core)

Source code:

(ns tests.core
  (:require [com.climate.claypoole :as cp]
            [clojure.set])
  (:gen-class))

(defn f [N]
  (let [n-threads   (.. Runtime getRuntime availableProcessors)
        mask-offset (- 31 N)
        half-N      (quot N 2)
        mid-idx     (bit-shift-left 1 half-N)
        end-idx     (bit-shift-left 1 (dec N))
        lower-half  (bit-shift-right 0x7FFFFFFF mask-offset)
        step        (bit-shift-left 1 12)
        bitcount
          (fn [n]
            (loop [i 0 result 0]
              (if (= i N)
                result
                (recur
                  (inc i)
                  (-> n
                      (bit-xor (bit-shift-right n i))
                      (bit-and (bit-shift-right 0x7FFFFFFF (+ mask-offset i)))
                      Integer/bitCount
                      (< 2)
                      (if (+ result (bit-shift-left 1 i))
                          result))))))]
    (->>
      (cp/upfor n-threads [start (range 0 end-idx step)]
        (->> (for [i      (range start (min (+ start step) end-idx))
                   :when  (<= (Integer/bitCount (bit-shift-right i mid-idx))
                              (Integer/bitCount (bit-and         i lower-half)))]
               (bitcount i))
             (into #{})))
      (reduce clojure.set/union)
      count)))

(defn -main [n]
  (let [n-iters 5]
    (println "Calculating f(n) from 1 to" n "(inclusive)" n-iters "times")
    (doseq [i (range n-iters)]
      (->> n read-string inc (range 1) (map f) doall println time)))
  (shutdown-agents)
  (System/exit 0))

A brute-force solution, each thread goes over a subset of the range (2^12 items) and builds a set of integer values which indicate detected patterns. These are then "unioned" together and thus the distinct count is calculated. I hope the code isn't too tricky to follow eventhough it uses threading macros a lot. My main runs the test a few times to get JVM warmed up.

Update: Iterating over only half of the integers, gets the same result due to symmetry. Also skipping numbers with higher bit count on lower half of the number as they produce duplicates as well.

Pre-built uberjar (v1) (3.7 MB):

$ wget https://s3-eu-west-1.amazonaws.com/nikonyrh-public/misc/so-124424-v2.jar
$ java -jar so-124424-v2.jar 29
Calculating f(n) from 1 to 29 (inclusive) 5 times
(1 1 2 4 6 8 14 18 27 36 52 65 93 113 150 188 241 279 377 427 540 632 768 870 1082 1210 1455 1656 1974)
"Elapsed time: 41341.863703 msecs"
(1 1 2 4 6 8 14 18 27 36 52 65 93 113 150 188 241 279 377 427 540 632 768 870 1082 1210 1455 1656 1974)
"Elapsed time: 37752.118265 msecs"
(1 1 2 4 6 8 14 18 27 36 52 65 93 113 150 188 241 279 377 427 540 632 768 870 1082 1210 1455 1656 1974)
"Elapsed time: 38568.406528 msecs"
[ctrl+c]

Results on different hardwares, expected runtime is O(n * 2^n)?

i7-6700K  desktop: 1 to 29 in  38 seconds
i7-6820HQ laptop:  1 to 29 in  43 seconds
i5-3570K  desktop: 1 to 29 in 114 seconds

You can easily make this single-threaded and avoid that 3rd party dependency by using the standard for:

(for [start (range 0 end-idx step)]
  ... )

Well the built-in pmap also exists but claypoole has more features and tunability.

NikoNyrh

Posted 2017-06-03T19:19:07.807

Reputation: 2 361

Yeah, it makes it trivial to distribute. Would you have time to re-evaluate my solution, I'm quite sure you'd get it up-to 30 now. I do not have further optimizations in sight. – NikoNyrh – 2017-06-06T15:32:37.043

Sadly it's a no for 30. Elapsed time: 217150.87386 msecs – None – 2017-06-06T18:14:31.220

Ahaa, thanks for giving it a try :D It might have been better to fit a curve on this and interpolate that at which decimal value 120 seconds is spent but even as it is this is a nice challgenge. – NikoNyrh – 2017-06-06T18:29:31.353

1

Mathematica, n=19

press alt+. to abort and the result will be printed

k = 0;
For[n = 1, n < 1000, n++,
Z = Table[HammingDistance[#[[;; i]], #[[-i ;;]]], {i, Length@#}] & /@
Tuples[{0, 1}, n];
Table[If[Z[[i, j]] < 2, Z[[i, j]] = 0, Z[[i, j]] = 1], {i, 
Length@Z}, {j, n}];
k = Length@Union@Z]
Print["f(", n, ")=", k]

J42161217

Posted 2017-06-03T19:19:07.807

Reputation: 15 931

I can't run this so could you explain how it avoids taking exponential time? 2^241 is a very big number! – None – 2017-06-04T03:57:59.357

Can you show the output of the code? – None – 2017-06-04T04:17:10.890

1I meant f(n)... fixed – J42161217 – 2017-06-04T07:41:11.113

1

Mathematica, 21

f[n_] := Length@
     DeleteDuplicates@
      Transpose@
       Table[2 > Tr@IntegerDigits[#, 2] & /@ 
         BitXor[BitShiftRight[#, n - i], Mod[#, 2^i]], {i, 1, 
         n - 1}] &@Range[0, 2^(n - 1)];
Do[Print[n -> f@n], {n, Infinity}]

For comparison, Jenny_mathy's answer gives n = 19 on my computer.

The slowest part is Tr@IntegerDigits[#, 2] &. It is a shame that Mathematica doesn't have a built-in for Hamming weight.


If you want to test my code, you can download a free trial of Mathematica.

alephalpha

Posted 2017-06-03T19:19:07.807

Reputation: 23 988

1

Haskell, (unofficial n=20)

This is just the naive approach - so far without any optimizations. I wondered how well it would fare against other languages.

How to use it (assuming you have haskell platform installed):

  • Paste the code in one file approx_corr.hs (or any other name, modify following steps accordingly)
  • Navigate to the file and execute ghc approx_corr.hs
  • Run approx_corr.exe
  • Enter the maximal n
  • The result of each computation is displayed, as well as the cumulative real time (in ms) up to that point.

Code:

import Data.List
import Data.Time
import Data.Time.Clock.POSIX

num2bin :: Int -> Int -> [Int]
num2bin 0 _ = []
num2bin n k| k >= 2^(n-1) = 1 : num2bin (n-1)( k-2^(n-1))
           | otherwise  = 0: num2bin (n-1) k

genBinNum :: Int -> [[Int]]
genBinNum n = map (num2bin n) [0..2^n-1]

pairs :: [a] -> [([a],[a])]
pairs xs = zip (prefixes xs) (suffixes xs)
   where prefixes = tail . init . inits 
         suffixes = map reverse . prefixes . reverse 

hammingDist :: (Num b, Eq a) => ([a],[a]) -> b     
hammingDist (a,b) = sum $ zipWith (\u v -> if u /= v then 1 else 0) a b

f :: Int -> Int
f n = length $ nub $ map (map ((<=1).hammingDist) . pairs) $ genBinNum n
--f n = sum [1..n]

--time in milliseconds
getTime = getCurrentTime >>= pure . (1000*) . utcTimeToPOSIXSeconds >>= pure . round


main :: IO()
main = do 
    maxns <- getLine 
    let maxn = (read maxns)::Int
    t0 <- getTime 
    loop 1 maxn t0
     where loop n maxn t0|n==maxn = return ()
           loop n maxn t0
             = do 
                 putStrLn $ "fun eval: " ++ (show n) ++ ", " ++ (show $ (f n)) 
                 t <- getTime
                 putStrLn $ "time: " ++ show (t-t0); 
                 loop (n+1) maxn t0

flawr

Posted 2017-06-03T19:19:07.807

Reputation: 40 560

The code appears not to give output as it runs. This makes it a little hard to test. – None – 2017-06-12T08:48:53.997

Strange, does it compile without error? What happens if you try to compile the program main = putStrLn "Hello World!" ? – flawr – 2017-06-12T08:58:29.867

The Data.Bits module might be useful. For your main loop, you could use something like main = do maxn <- getmax; t0 <- gettime; loop 1 where loop n|n==maxn = return () and loop n = do printresult n (f n); t <- gettime; printtime (t-t0); loop (n+1). getmax could for example use getArgs to use the program arguments. – Christian Sievers – 2017-06-12T10:23:22.417

@ChristianSievers Thanks a lot!!! I asked this question at stackoverflow, I think it would be great if you could add that there too!

– flawr – 2017-06-12T11:52:36.533

I don't see how to answer there. You have a similar loop there already, and I didn't say anything about getting the time: that you already had here. – Christian Sievers – 2017-06-12T14:34:26.847

Anyway, you helped me a lot finding a solution, thank you very much=) – flawr – 2017-06-12T19:51:45.170

You're welcome. Since loop is local to main, it has access to main's bindings, so loop doesn't need maxn and and t0 as parameters. The "do some action or nothing according to some condition" idiom is so common that there are the functions when and unless for it in the Control.Monad module. The problem with the output can be fixed by copying the line about buffering from the other haskell solution. – Christian Sievers – 2017-06-12T22:15:56.037

1

A Haskell solution, using popCount and manually managed parallelism

Compile: ghc -rtsopts -threaded -O2 -fllvm -Wall foo.hs

(drop the -llvm if it doesn't work)

Run : ./foo +RTS -N

module Main (main) where

import Data.Bits
import Data.Word
import Data.List
import qualified Data.IntSet as S 
import System.IO
import Control.Monad
import Control.Concurrent
import Control.Exception.Base (evaluate)

pairs' :: Int -> Word64 -> Int
pairs' n s = fromIntegral $ foldl' (.|.) 0 $ map mk [1..n]
  where mk d = let mask = 1 `shiftL` d - 1 
                   pc = popCount $! xor (s `shiftR` (n - d)) (s .&. mask)
               in  if pc <= 1 
                     then mask + 1 
                     else 0 

mkSet :: Int -> Word64 -> Word64 -> S.IntSet
mkSet n a b = S.fromList $ map (pairs' n) [a .. b]

f :: Int -> IO Int
f n 
   | n < 4 = return $ S.size $ mkSet n 0 mxbound
   | otherwise = do
        mvs <- replicateM 4 newEmptyMVar
        forM_ (zip mvs cpairs) $ \(mv,(mi,ma)) -> forkIO $ do
          evaluate (mkSet n mi ma) >>= putMVar mv
        set <- foldl' S.union S.empty <$> mapM readMVar mvs
        return $! S.size set
   where
     mxbound = 1 `shiftL` (n - 1)
     bounds = [0,1 `shiftL` (n - 3) .. mxbound]
     cpairs = zip bounds (drop 1 bounds)

main :: IO()
main = do
    hSetBuffering stdout LineBuffering
    mapM_ (f >=> print) [1..]

bartavelle

Posted 2017-06-03T19:19:07.807

Reputation: 1 261

There is a buffering problem it seems in that I don't get any output at all if I run it from the cygwim command line. – None – 2017-06-12T11:18:23.437

I just updated my solution, but I don't know if it will help much. – bartavelle – 2017-06-12T11:22:25.403

@Lembik Unsure if that is obvious, but that should be compiled with -O3, and might be faster with -O3 -fllvm ... – bartavelle – 2017-06-12T11:26:42.593

(And all build files should be removed before recompiling, if not source code change happened) – bartavelle – 2017-06-12T11:28:56.660

@Lembik : I introduced parallelism. It should be a bit faster. – bartavelle – 2017-06-12T12:17:31.643

Still gets up to 1455. – None – 2017-06-12T13:14:13.533

@Lembik did you run it with +RTS -N and does it use all the cores? – bartavelle – 2017-06-12T13:19:54.060

It's a lot slower with +RTS -N – None – 2017-06-12T13:25:43.920

@Lembik, too bad, I suppose the runtime doesn't work so well under Cygwin :( thanks anyway for trying it! – bartavelle – 2017-06-12T13:26:59.030

@Lembik How many threads does it use? If 4, maybe it's better to use only 2 by using -N2 – Christian Sievers – 2017-06-12T13:40:00.067

1

A C version, using builtin popcount

Works better with clang -O3, but also works if you only have gcc.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned long pairs(unsigned int n, unsigned long s) { 
  unsigned long result = 0;

  for(int d=1;d<=n;d++) { 
    unsigned long mx = 1 << d;
    unsigned long mask = mx - 1;

    unsigned long diff = (s >> (n - d)) ^ (s & mask);
    if (__builtin_popcountl(diff) <= 1)
      result |= mx;
  } 
  return result;

}

unsigned long f(unsigned long  n) { 
  unsigned long max = 1 << (n - 1);
#define BLEN (max / 2)
  unsigned char * buf = malloc(BLEN);
  memset(buf, 0, BLEN);
  unsigned long long * bufll = (void *) buf;

  for(unsigned long i=0;i<=max;i++) { 
    unsigned int r = pairs(n, i);
    buf[r / 8] |= 1 << (r % 8);
  } 

  unsigned long result = 0;

  for(unsigned long i=0;i<= max / 2 / sizeof(unsigned long long); i++) { 
    result += __builtin_popcountll(bufll[i]);
  } 

  free(buf);

  return result;
}

int main(int argc, char ** argv) { 
  unsigned int n = 1;

  while(1) { 
    printf("%d %ld\n", n, f(n));
    n++;
  } 
  return 0;
}

bartavelle

Posted 2017-06-03T19:19:07.807

Reputation: 1 261

It gets to 24 very quickly and then ends. You need to increase the limit. – None – 2017-06-12T13:16:07.813

Oh god, I forgot to remove the benchmark code! I'll remove the two offending lines :/ – bartavelle – 2017-06-12T13:18:31.327

@Lembik should be fixed now – bartavelle – 2017-06-12T13:20:57.657

0

Python 2 + pypy, n = 22

Here is a really simple Python solution as a sort of baseline benchmark.

import itertools
def hamming(A, B):
    n = len(A)
    assert(len(B) == n)
    return n-sum([A[i] == B[i] for i in xrange(n)])

def prefsufflist(P):
    n = len(P)
    return [hamming(P[:i], P[n-i:n]) for i in xrange(1,n+1)]

bound = 1
for n in xrange(1,25):
    booleans = set()
    for P in itertools.product([0,1], repeat = n):
        booleans.add(tuple(int(HD <= bound) for HD in prefsufflist(P)))
    print "n = ", n, len(booleans)

user9206

Posted 2017-06-03T19:19:07.807

Reputation: