Delete some bits and count

26

6

Consider all 2^n different binary strings of length n and assume n > 2. You are allowed to delete exactly b < n/2 bits from each of the binary strings, leaving strings of length n-b remaining. The number of distinct strings remaining depends on which bits you delete. Assuming your aim is to leave as few remaining different strings as possible, this challenge is to write code to compute how few can you leave as a function of n.

Example, n=3 and b = 1. You can leave only the two strings 11 and 00.

For n=9 and b = 1,2,3,4 we have 70,18,6,2

For n=8 and b = 1,2,3 we have 40,10,4

For n=7 and b = 1,2,3 we have 20,6,2

For n=6 and b = 1,2 we have 12,4

For n=5 and b = 1,2 we have 6,2

This question was originally posed by me in 2014 in a different form on MO.

Input and output

Your code should take in an integern and output a single integer for each value of b starting at b = 0 and increasing.

Score

Your score is the largest n for which your code completes for all b < n/2 in under a minute on my Linux based PC. In case of tie breaks, the largest b your code gets to for the joint largest n wins. In case of tie breaks on that criterion too, the fastest code for the largest values of n and b decides. If the times are within a second or two of each other, the first posted answer wins.

Languages and libraries

You can use any language of library you like. Because I have to run your code, it would help if it was free (as in beer) and worked in Linux.

Anush

Posted 2018-05-28T08:12:21.583

Reputation: 3 202

I'm assuming b > 0 as additional input-requirement? Or would n=3 and b=0 simply output 2^n as result? – Kevin Cruijssen – 2018-05-28T09:12:00.117

@KevinCruijssen It should output 2^n indeed. – Anush – 2018-05-28T09:13:10.710

Also, you say the input is a single n and a single b, but the score is the largest n for which the code completes all b < n/2 in under a minute. Wouldn't it be better to have a single input n in that case, and output all results for 0 <= b < n/2? Or should we provide two programs/functions: one taking two inputs n and b, and one taking only input n and outputting all results in the range 0 <= b < n/2? – Kevin Cruijssen – 2018-05-28T09:18:08.367

@KevinCruijssen Thanks. I have applied your fix. – Anush – 2018-05-28T09:43:56.243

2Well, I had already upvoted your challenge, so can't do it again. :) Although I have no idea how to calculate this efficiently (efficient O algorithms were something I've always been bad at.. and one of the few subjects at IT college I had to redo a couple of times), it does seem like a very interesting challenge. I'm curious to see what answers people come up with. – Kevin Cruijssen – 2018-05-28T09:47:27.810

@KevinCruijssen Brute force. With some smart optimizations. – user202729 – 2018-05-28T10:05:43.957

2Is there a working example? It would be a good place to start, both in terms of correctness, but also for comparison of speed. – maxb – 2018-05-28T10:39:31.793

For each b the first few n seem always [2,4,6,10]? – l4m2 – 2018-05-28T10:43:44.560

but there are so many lists that match [2,4,6,10,18] on OEIS – l4m2 – 2018-05-28T10:53:12.630

@maxb Hopefully the examples in the question cover correctness. I don't have sample code, sorry. – Anush – 2018-05-28T14:52:21.447

I had to read over this several times. Am I correct to assume that for n = 4 and b = 1 that the answer would be 4? (000, 111, 011, and 100) – Centijo – 2018-05-30T20:13:02.477

@Centijo Yes that works. – Anush – 2018-05-30T20:38:34.127

"the first posted answer wins" Shouldn't the tie breaker be fastest completion of the highest completed? For example, if two answers both can complete up to n=12;b=4 and timeout on n=12;b=5, shouldn't the win go to the code that completes n=12;b=4 the fastest? – mypetlion – 2018-05-31T17:26:50.147

@mypetlion You are right. I have changed it now. – Anush – 2018-05-31T19:06:27.737

Out of curiosity: what's the motivation for considering such a problem? You mention that you asked about it some time ago on MO. Does this problem have some applications in practice or theory? It looks really interesting. – Radek – 2018-06-12T17:57:17.840

@Radek Just recreational math I am afraid. – Anush – 2018-06-12T18:00:36.143

How many cores does your PC have? In other words: is it worth considering a concurrent solution? – Udo Borkowski – 2018-06-14T10:24:53.017

@UdoBorkowski Yes! I has 4 real cores. – Anush – 2018-06-14T10:46:25.160

Answers

6

Python 2.7 / Gurobi n=9

This solution is very straight usage of Gurobi's ILP solver for the boolean Mixed-Integer Problems (MIP).

The only trick is to take out symmetry in 1's complement to halve the problem sizes.

Using Gurobi LLC's limited time "free" licence we are restricted to 2000 constraints, but solving 10 del 1 is well outside the 60 second time-limit anyway on my laptop.

from gurobipy import *
from itertools import combinations

def mincover(n,d):
    bs = pow(2,n-1-d)
    m = Model()
    m.Params.outputFlag = 0
    b = {}
    for i in range(bs):
      b[i] = m.addVar(vtype=GRB.BINARY, name="b%d" % i)
    m.update()
    for row in range(pow(2,n-1)):
      x = {}
      for i in combinations(range(n), n-d):
        v = 0
        for j in range(n-d):
          if row & pow(2,i[j]):
            v += pow(2,j)
        if v >= bs:
          v = 2*bs-1-v
        x[v] = 1
      m.addConstr(quicksum(b[i] for i in x.keys()) >= 1)
    m.setObjective(quicksum(b[i] for i in range(bs) ), GRB.MINIMIZE)
    m.optimize()
    return int(round(2*m.objVal,0))

for n in range(4,10):
    for d in range((n//2)+1):
        print n, d, mincover(n,d)

UPDATE+CORR: 10,2 has optimal solution size 31 (see e.g.) Gurobi shows no symmetric solution of size 30 exists (returns problem infeasible) .. [my attempt to show asymmetric feasibility at 30 remained inconclusive after 9.5hrs runtime] e.g. bit patterns of integers 0 7 13 14 25 28 35 36 49 56 63 64 95 106 118 128 147 159 170 182 195 196 200 207 225 231 240 243 249 252 255 or 0 7 13 14 19 25 28 35 36 49 56 63 64 95 106 118 128 159 170 182 195 196 200 207 225 231 240 243 249 252 255

jayprich

Posted 2018-05-28T08:12:21.583

Reputation: 391

You broke the "fastest claimed infinite bounty" record? – user202729 – 2018-06-10T01:15:05.373

I don't see any bounty here, what do you mean? – jayprich – 2018-06-10T07:16:06.197

@user202729 Yes.. I set it too low. I should have set it at n = 10 :) – Anush – 2018-06-10T08:20:33.460

Actually solving it at n=9 is not an easy thing. That's why OP use an existing library (which is supposed to be better than a hand-written solution, like mine). – user202729 – 2018-06-10T10:42:47.313

Does your remark about symmetry mean that you assume that (at least) some optimal solution has a set of remaining binary strings that is closed under bitwise complement? That seems plausible, but not obvious, and the question on MO claims that some (this?) symmetry does not hold. – Christian Sievers – 2018-06-12T22:56:20.793

1Thanks @ChristianSievers I see MO claim that 10,2 has only asymmetric optima which I cannot refute nor verify. If I remove the symmetry assumption shortcut which works up to n=9 it turns out Gurobi can still solve up to n=9 in the time required. – jayprich – 2018-06-13T17:23:18.453

@Christian - I have now confirmed 10,2 has optimum size of 31 – jayprich – 2018-06-15T00:24:37.160

Why are you sure that no asymmetric solution of size 30 can exist? And shouldn't the all-1 pattern (255) appear among the numbers? – Christian Sievers – 2018-06-15T00:48:50.003

Your skepticism is entirely justified .. working on fixing both those errors. – jayprich – 2018-06-15T14:26:22.743

I updated the bounty by the way. – Anush – 2018-09-11T11:06:39.997

3

C++, n=6

Brute force with some small optimizations.

#include<cassert>
#include<iostream>
#include<vector>

// ===========
/** Helper struct to print binary representation.
`std::cout<<bin(str,len)` prints (str:len) == the bitstring 
represented by last (len) bits of (str).
*/
struct bin{
    int str,len;
    bin(int str,int len):str(str),len(len){}
};
std::ostream& operator<<(std::ostream& str,bin a){
    if(a.len)
        return str<<bin(a.str>>1,a.len-1)<<char('0'+(a.str&1));
    else if(a.str)
        return str<<"...";
    else
        return str;
}
// ===========

/// A patten of (len) bits of ones.
int constexpr pat1(int len){
    return (1<<len)-1;
}

// TODO benchmark: make (res) global variable?

/**Append all distinct (subseqs+(sfx:sfxlen)) of (str:len) 
with length (sublen) to (res).
*/
void subseqs_(
    int str,int len,int sublen,
    int sfx,int sfxlen,
    std::vector<int>& res
){
    // std::cout<<"subseqs_ : str = "<<bin(str,len)<<", "
    // "sublen = "<<sublen<<", sfx = "<<bin(sfx,sfxlen)<<'\n';

    assert(len>=0);

    if(sublen==0){ // todo remove some branches can improve perf?
        res.push_back(sfx);
        return;
    }else if(sublen==len){
        res.push_back(str<<sfxlen|sfx);
        return;
    }else if(sublen>len){
        return;
    }

    if(str==0){
        res.push_back(sfx);
        return;
    }

    int nTrail0=0;
    for(int ncut;str&&nTrail0<sublen;

        ++nTrail0,
        ncut=__builtin_ctz(~str)+1, // cut away a bit'0' of str
        // plus some '1' bits
        str>>=ncut,
        len-=ncut
    ){
        ncut=__builtin_ctz(str)+1; // cut away a bit'1' of str
        subseqs_(str>>ncut,len-ncut,sublen-nTrail0-1,
            sfx|1<<(sfxlen+nTrail0),sfxlen+nTrail0+1,
            res
        ); // (sublen+sfxlen) is const. TODO global var?
    }

    if(nTrail0+len>=sublen) // this cannot happen if len<0
        res.push_back(sfx);
}

std::vector<int> subseqs(int str,int len,int sublen){
    assert(sublen<=len);
    std::vector<int> res;
    if(__builtin_popcount(str)*2>len){ // too many '1's, flip [todo benchmark]
        subseqs_(pat1(len)^str,len,sublen,0,0,res);
        int const p1sublen=pat1(sublen);
        for(int& r:res)r^=p1sublen;
    }else{
        subseqs_(str,len,sublen,0,0,res);
    }
    return res;
}

// ==========

/** Append all distinct (supersequences+(sfx:sfxlen)) of (str:len)
with length (suplen) to (res).
Define (a) to be a "supersequence" of (b) iff (b) is a subsequence of (a).
*/
void supseqs_(
    int str,int len,int suplen,
    int sfx,int sfxlen,
    std::vector<int>& res
){
    assert(suplen>=len);

    if(suplen==0){
        res.push_back(sfx);
        return;
    }else if(suplen==len){
        res.push_back(str<<sfxlen|sfx);
        return;
    }

    int nTrail0; // of (str)
    if(str==0){
        res.push_back(sfx);
        // it's possible that the supersequence is '0000..00'
        nTrail0=len;
    }else{
        // str != 0 -> str contains a '1' bit ->
        // supersequence cannot be '0000..00'
        nTrail0=__builtin_ctz(str);
    }
    // todo try `nTrail0=__builtin_ctz(str|1<<len)`, eliminates a branch
    // and conditional statement

    for(int nsupTrail0=0;nsupTrail0<nTrail0;++nsupTrail0){
        // (nsupTrail0+1) last bits of supersequence matches with 
        // nsupTrail0 last bits of str.
        supseqs_(str>>nsupTrail0,len-nsupTrail0,suplen-1-nsupTrail0,
            sfx|1<<(nsupTrail0+sfxlen),sfxlen+nsupTrail0+1,
            res);
    }

    int const strMatch=str?nTrail0+1:len; 
    // either '1000..00' or (in case str is '0000..00') the whole (str)

    for(int nsupTrail0=suplen+strMatch-len;nsupTrail0-->nTrail0;){
        // because (len-strMatch)<=(suplen-1-nsupTrail0),
        // (nsupTrail0<suplen+strMatch-len).

        // (nsupTrail0+1) last bits of supersequence matches with
        // (strMatch) last bits of str.
        supseqs_(str>>strMatch,len-strMatch,suplen-1-nsupTrail0,
            sfx|1<<(nsupTrail0+sfxlen),sfxlen+nsupTrail0+1,
            res);
    }

    // todo try pulling constants out of loops
}

// ==========

int n,b;
std::vector<char> done;
unsigned min_undone=0;

int result;
void backtrack(int nchoice){
    assert(!done[min_undone]);
    ++nchoice;
    std::vector<int> supers_s;
    for(int s:subseqs(min_undone,n,n-b)){
        // obviously (s) is not chosen. Try choosing (s)
        supers_s.clear();
        supseqs_(s,n-b,n,0,0,supers_s);
        for(unsigned i=0;i<supers_s.size();){
            int& x=supers_s[i];
            if(!done[x]){
                done[x]=true;
                ++i;
            }else{
                x=supers_s.back();
                supers_s.pop_back();
            }
        }

        unsigned old_min_undone=min_undone;
        while(true){
            if(min_undone==done.size()){
                // found !!!!
                result=std::min(result,nchoice);
                goto label1;
            }
            if(not done[min_undone])
                break;
            ++min_undone;
        }
        if(nchoice==result){
            // backtrack more will only give worse result
            goto label1;
        }

        // note that nchoice is already incremented
        backtrack(nchoice);

        label1: // undoes the effect of (above)
        for(int x:supers_s)
            done[x]=false;
        min_undone=old_min_undone;
    }
}

int main(){
    std::cin>>n>>b;

    done.resize(1<<n,0);
    result=1<<(n-b); // the actual result must be less than that

    backtrack(0);
    std::cout<<result<<'\n';
}

Run locally:

[user202729@archlinux golf]$ g++ -std=c++17 -O2 delbits.cpp -o delbits
[user202729@archlinux golf]$ time for i in $(seq 1 3); do ./delbits <<< "6 $i"; done
12
4
2

real    0m0.567s
user    0m0.562s
sys     0m0.003s
[user202729@archlinux golf]$ time ./delbits <<< '7 1'
^C

real    4m7.928s
user    4m7.388s
sys     0m0.173s
[user202729@archlinux golf]$ time for i in $(seq 2 3); do ./delbits <<< "7 $i"; done
6
2

real    0m0.040s
user    0m0.031s
sys     0m0.009s

user202729

Posted 2018-05-28T08:12:21.583

Reputation: 14 620

1Mostly to encourage others to post their code if it's faster than mine. – user202729 – 2018-05-31T13:56:30.853

Please?... (note: This is an instance of a set cover problem.) – user202729 – 2018-06-01T23:43:22.583

1I'm working on it. I just can't come up with any smart way of doing it. If nobody else posts an answer, I'll put mine up that can only go as high as n=4 so far. – mypetlion – 2018-06-01T23:58:18.783