Find the largest independent set in a high-dimensional lattice-like graph

16

2

For a given positive integer n, consider all binary strings of length 2n-1. For a given string S, let L be an array of length n which contains the count of the number of 1s in each substring of length n of S. For example, if n=3 and S = 01010 then L=[1,2,1]. We call L the counting array of S.

We say that two strings S1 and S2 of the same length match if their respective counting arrays L1 and L2 have the property that L1[i] <= 2*L2[i] and L2[i] <= 2*L1[i] for all i.

Task

For increasing n starting at n=1, the task is to find the size of the largest set of strings, each of length 2n-1 so that no two strings match.

Your code should output one number per value of n.

Score

Your score is the highest n for which no one else has posted a higher correct answer for any of your answers. Clearly if you have all optimum answers then you will get the score for the highest n you post. However, even if your answer is not the optimum, you can still get the score if no one else can beat it.

Example answers

For n=1,2,3,4 I get 2,4,10,16.

Languages and libraries

You can use any available language and libraries you like. Where feasible, it would be good to be able to run your code so please include a full explanation for how to run/compile your code in linux if at all possible.

Leading entries

  • 5 by Martin Büttner in Mathematica
  • 6 by Reto Koradi in C++. Values are 2, 4, 10, 16, 31, 47, 75, 111, 164, 232, 328, 445, 606, 814, 1086. The first 5 are known to be optimal.
  • 7 by Peter Taylor in Java. Values are 2, 4, 10, 16, 31, 47, 76, 111, 166, 235.
  • 9 by joriki in Java. Values are 2, 4, 10, 16, 31, 47, 76, 112, 168.

user9206

Posted 2015-08-07T07:59:09.067

Reputation:

3I think it's more natural to understand the inequality when notated as L1[i]/2 <= L2[i] <= 2*L1[i]. – orlp – 2015-08-07T09:58:46.987

1Also, matching is not an equivalence relationship. match(A, B) and match(B, C) does not imply match(A, C) (same for the inverse). Example: [1] and [2] match, [2] and [3] match, but [1] and [3] do not. Similarly, [1,3] and [3,1] do not match, [3, 1] and [2, 3] do not match, but [1, 3] and [2, 3] match. – orlp – 2015-08-07T10:15:10.883

Answers

7

2, 4, 10, 16, 31, 47, 76, 112, 168

For each n, this Java code determines the possible counting arrays and then finds non-matching sets of increasing size, for each size starting with a random set and improving it by randomized steepest descent. In each step, one of the elements of the set is randomly uniformly selected and replaced by another counting array randomly uniformly selected among the ones not in use. The step is accepted if it does not increase the number of matches. This latter prescription appears to be crucial; accepting steps only if they reduce the number of matches isn't nearly as effective. The steps leaving the number of matches invariant allow the search space to be explored, and eventually some space may open up to avoid one of the matches. After 2^24 steps without improvement, the previous size is output for the present value of n, and n is incremented.

The results up to n = 9 are 2, 4, 10, 16, 31, 47, 76, 112, 168, improving on the previous results for n = 8 by one and for n = 9 by two. For higher values of n, the limit of 2^24 steps may have to be increased.

I also tried simulated annealing (with the number of matches as the energy), but without improvement over steepest descent.

Code

save as Question54354.java
compile with javac Question54354.java
run with java Question54354

import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class Question54354 {
    static class Array {
        int [] arr;

        public Array (int [] arr) {
            this.arr = arr;
        }

        public int hashCode () {
            return Arrays.hashCode (arr);
        }

        public boolean equals (Object o) {
            return Arrays.equals (((Array) o).arr,arr);
        }
    }

    static int [] indices;
    static int [] [] counts;
    static boolean [] used;

    static Random random = new Random (0);

    static boolean match (int [] c1,int [] c2) {
        for (int k = 0;k < c1.length;k++)
            if (c1 [k] > 2 * c2 [k] || c2 [k] > 2 * c1 [k])
                return false;
        return true;
    }

    static int matches (int i) {
        int result = 0;
        for (int j = 0;j < indices.length;j++)
            if (j != i && match (counts [indices [i]],counts [indices [j]]))
                result++;
        return result;
    }

    static void randomize (int i) {
        do
            indices [i] = random.nextInt (counts.length);
        while (used [indices [i]]);
    }

    public static void main (String [] args) {
        for (int n = 1,length = 1;;n++,length += 2) {
            int [] lookup = new int [1 << n];
            for (int string = 0;string < 1 << n;string++)
                for (int bit = 1;bit < 1 << n;bit <<= 1)
                    if ((string & bit) != 0)
                        lookup [string]++;
            Set<Array> arrays = new HashSet<Array> ();
            for (int string = 0;string < 1 << length;string++) {
                int [] count = new int [n];
                for (int i = 0;i < n;i++)
                    count [i] = lookup [(string >> i) & ((1 << n) - 1)];
                arrays.add (new Array (count));
            }
            counts = new int [arrays.size ()] [];
            int j = 0;
            for (Array array : arrays)
                counts [j++] = array.arr;
            used = new boolean [counts.length];

            int m;
            outer:
            for (m = 1;m <= counts.length;m++) {
                indices = new int [m];
                for (;;) {
                    Arrays.fill (used,false);
                    for (int i = 0;i < m;i++) {
                        randomize (i);
                        used [indices [i]] = true;
                    }
                    int matches = 0;
                    for (int i = 0;i < m;i++)
                        matches += matches (i);
                    matches /= 2;
                    int stagnation = 0;
                    while (matches != 0) {
                        int k = random.nextInt (m);
                        int oldMatches = matches (k);
                        int oldIndex = indices [k];
                        randomize (k);
                        int newMatches = matches (k);
                        if (newMatches <= oldMatches) {
                            if (newMatches < oldMatches) {
                                matches += newMatches - oldMatches;
                                stagnation = 0;
                            }
                            used [oldIndex] = false;
                            used [indices [k]] = true;
                        }
                        else
                            indices [k] = oldIndex;

                        if (++stagnation == 0x1000000)
                            break outer;
                    }
                    break;
                }
            }
            System.out.println (n + " : " + (m - 1));
        }
    }
}

joriki

Posted 2015-08-07T07:59:09.067

Reputation: 186

1A very nice improvement! – None – 2015-08-13T10:32:28.663

11

2, 4, 10, 16, 31, 47, 76, 111, 166, 235

Notes

If we consider a graph G with vertices 0 to n and edges joining two numbers which match, then the tensor power G^n has vertices (x_0, ..., x_{n-1}) forming the Cartesian power {0, ..., n}^n and edges between matching tuples. The graph of interest is the subgraph of G^n induced by those vertices which correspond to the possible "counting arrays".

So the first subtask is to generate those vertices. The naïve approach enumerates over 2^{2n-1} strings, or on the order of 4^n. But if we instead look at the array of first differences of the counting arrays we find that there are only 3^n possibilities, and from the first differences we can deduce the range of possible initial values by requiring that no element in the zeroth differences is less than 0 or greater than n.

We then want to find the maximum independent set. I'm using one theorem and two heuristics:

  • Theorem: the maximum independent set of a disjoint union of graphs is the union of their maximum independent sets. So if we break a graph down into unconnected components we can simplify the problem.
  • Heuristic: assume that (n, n, ..., n) will be in a maximum independent set. There's a fairly large clique of vertices {m, m+1, ..., n}^n where m is the smallest integer which matches n; (n, n, ..., n) is guaranteed to have no matches outside that clique.
  • Heuristic: take the greedy approach of selecting the vertex of lowest degree.

On my computer this finds 111 for n=8 in 16 seconds, 166 for n=9 in about 8 minutes, and 235 for n=10 in about 2 hours.

Code

Save as PPCG54354.java, compile as javac PPCG54354.java, and run as java PPCG54354.

import java.util.*;

public class PPCG54354 {
    public static void main(String[] args) {
        for (int n = 1; n < 20; n++) {
            long start = System.nanoTime();

            Set<Vertex> constructive = new HashSet<Vertex>();
            for (int i = 0; i < (int)Math.pow(3, n-1); i++) {
                int min = 0, max = 1, diffs[] = new int[n-1];
                for (int j = i, k = 0; k < n-1; j /= 3, k++) {
                    int delta = (j % 3) - 1;
                    if (delta == -1) min++;
                    if (delta != 1) max++;
                    diffs[k] = delta;
                }

                for (; min <= max; min++) constructive.add(new Vertex(min, diffs));
            }

            // Heuristic: favour (n, n, ..., n)
            Vertex max = new Vertex(n, new int[n-1]);
            Iterator<Vertex> it = constructive.iterator();
            while (it.hasNext()) {
                Vertex v = it.next();
                if (v.matches(max) && !v.equals(max)) it.remove();
            }

            Set<Vertex> ind = independentSet(constructive, n);
            System.out.println(ind.size() + " after " + ((System.nanoTime() - start) / 1000000000L) + " secs");
        }
    }

    private static Set<Vertex> independentSet(Set<Vertex> vertices, int dim) {
        if (vertices.size() < 2) return vertices;

        for (int idx = 0; idx < dim; idx++) {
            Set<Set<Vertex>> p = connectedComponents(vertices, idx);
            if (p.size() > 1) {
                Set<Vertex> ind = new HashSet<Vertex>();
                for (Set<Vertex> part : connectedComponents(vertices, idx)) {
                    ind.addAll(independentSet(part, dim));
                }
                return ind;
            }
        }

        // Greedy
        int minMatches = Integer.MAX_VALUE;
        Vertex minV = null;
        for (Vertex v0 : vertices) {
            int numMatches = 0;
            for (Vertex vi : vertices) if (v0.matches(vi)) numMatches++;
            if (numMatches < minMatches) {
                minMatches = numMatches;
                minV = v0;
            }
        }

        Set<Vertex> nonmatch = new HashSet<Vertex>();
        for (Vertex vi : vertices) if (!minV.matches(vi)) nonmatch.add(vi);
        Set<Vertex> ind = independentSet(nonmatch, dim);
        ind.add(minV);
        return ind;
    }

    // Separates out a set of vertices which form connected components when projected into the idx axis.
    private static Set<Set<Vertex>> connectedComponents(Set<Vertex> vertices, final int idx) {
        List<Vertex> sorted = new ArrayList<Vertex>(vertices);
        Collections.sort(sorted, new Comparator<Vertex>() {
                public int compare(Vertex a, Vertex b) {
                    return a.x[idx] - b.x[idx];
                }
            });

        Set<Set<Vertex>> connectedComponents = new HashSet<Set<Vertex>>();
        Set<Vertex> current = new HashSet<Vertex>();
        int currentVal = 0;
        for (Vertex v : sorted) {
            if (!match(currentVal, v.x[idx]) && !current.isEmpty()) {
                connectedComponents.add(current);
                current = new HashSet<Vertex>();
            }

            current.add(v);
            currentVal = v.x[idx];
        }

        if (!current.isEmpty()) connectedComponents.add(current);
        return connectedComponents;
    }

    private static boolean match(int a, int b) {
        return a <= 2 * b && b <= 2 * a;
    }

    private static class Vertex {
        final int[] x;
        private final int h;

        Vertex(int[] x) {
            this.x = x.clone();

            int _h = 0;
            for (int xi : x) _h = _h * 37 + xi;
            h = _h;
        }

        Vertex(int x0, int[] diffs) {
            x = new int[diffs.length + 1];
            x[0] = x0;
            for (int i = 0; i < diffs.length; i++) x[i+1] = x[i] + diffs[i];

            int _h = 0;
            for (int xi : x) _h = _h * 37 + xi;
            h = _h;
        }

        public boolean matches(Vertex v) {
            if (v == this) return true;
            if (x.length != v.x.length) throw new IllegalArgumentException("v");
            for (int i = 0; i < x.length; i++) {
                if (!match(x[i], v.x[i])) return false;
            }
            return true;
        }

        @Override
        public int hashCode() {
            return h;
        }

        @Override
        public boolean equals(Object obj) {
            return (obj instanceof Vertex) && equals((Vertex)obj);
        }

        public boolean equals(Vertex v) {
            if (v == this) return true;
            if (x.length != v.x.length) return false;
            for (int i = 0; i < x.length; i++) {
                if (x[i] != v.x[i]) return false;
            }
            return true;
        }

        @Override
        public String toString() {
            if (x.length == 0) return "e";

            StringBuilder sb = new StringBuilder(x.length);
            for (int xi : x) sb.append(xi < 10 ? (char)('0' + xi) : (char)('A' + xi - 10));
            return sb.toString();
        }
    }
}

Peter Taylor

Posted 2015-08-07T07:59:09.067

Reputation: 41 901

10

Mathematica, n = 5, 31 strings

I just wrote a brute force solution using Mathematica's built-ins to verify Lembik's example answers, but it can handle n = 5 as well:

n = 5;
s = Tuples[{0, 1}, 2 n - 1];
l = Total /@ Partition[#, n, 1] & /@ s
g = Graph[l, 
  Cases[Join @@ Outer[UndirectedEdge, l, l, 1], 
   a_ <-> b_ /; 
    a != b && And @@ Thread[a <= 2 b] && And @@ Thread[b <= 2 a]]]
set = First@FindIndependentVertexSet[g]
Length@set

As a bonus, this code produces a visualisation of the problem as a graph where each edge indicates two matching strings.

Here is the graph for n = 3:

enter image description here

Martin Ender

Posted 2015-08-07T07:59:09.067

Reputation: 184 808

2At first I thought the graph was nicely symmetric, but then I saw the slightly offset point on the left. Can not unsee :( – orlp – 2015-08-07T10:24:44.750

3

C++ (heuristic): 2, 4, 10, 16, 31, 47, 75, 111, 164, 232, 328, 445, 606, 814, 1086

This is slightly behind Peter Taylor's result, being 1 to 3 lower for n=7, 9 and 10. The advantage is that it's much faster, so I can run it for higher values of n. And it can be understood without any fancy math. ;)

The current code is dimensioned to run up to n=15. Run times increase roughly by a factor of 4 for each increase in n. For example, it was 0.008 second for n=7, 0.07 seconds for n=9, 1.34 seconds for n=11, 27 seconds for n=13, and 9 minutes for n=15.

There are two key observations I used:

  • Instead of operating on the values themselves, the heuristic operates on counting arrays. To do this, a list of all unique counting arrays is generated first.
  • Using counting arrays with small values is more beneficial, since they eliminate less of the solution space. This is based on each count c excluding the range of c / 2 to 2 * c from other values. For smaller values of c, this range is smaller, which means that fewer values are excluded by using it.

Generate Unique Counting Arrays

I went brute force on this one, iterating through all values, generating the count array for each of them, and uniquifying the resulting list. This could certainly be done more efficiently, but it is good enough for the kinds of values we're operating with.

This is extremely quick for the small values. For the larger values, the overhead does become substantial. For example, for n=15, it uses about 75% of the whole runtime. This would definitely be an area to look at when trying to go much higher than n=15. Even just the memory usage for building a list of the counting arrays for all values would start to become problematic.

The number of unique counting arrays is about 6% of the number of values for n=15. This relative count becomes smaller as n becomes larger.

Greedy Selection of Counting Array Values

The main part of the algorithm selects counting array values from the generated list using a simple greedy approach.

Based on the theory that using counting arrays with small counts is beneficial, the counting arrays are sorted by the sum of their counts.

They are then checked in order, and a value is selected if it is compatible with all previously used values. So this involves one single linear pass through the unique counting arrays, where each candidate is compared to the values that were previously selected.

I have some ideas on how the heuristic could potentially be improved. But this seemed like a reasonable starting point, and the results looked quite good.

Code

This is not highly optimized. I had a more elaborate data structure at some point, but it would have needed more work to generalize it beyond n=8, and the difference in performance did not seem very substantial.

#include <cstdint>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <sstream>
#include <iostream>

typedef uint32_t Value;

class Counter {
public:
    static void setN(int n);

    Counter();
    Counter(Value val);

    bool operator==(const Counter& rhs) const;
    bool operator<(const Counter& rhs) const;

    bool collides(const Counter& other) const;

private:
    static const int FIELD_BITS = 4;
    static const uint64_t FIELD_MASK = 0x0f;

    static int m_n;
    static Value m_valMask;

    uint64_t fieldSum() const;

    uint64_t m_fields;
};

void Counter::setN(int n) {
    m_n = n;
    m_valMask = (static_cast<Value>(1) << n) - 1;
}

Counter::Counter()
  : m_fields(0) {
}

Counter::Counter(Value val) {
    m_fields = 0;
    for (int k = 0; k < m_n; ++k) {
        m_fields <<= FIELD_BITS;
        m_fields |= __builtin_popcount(val & m_valMask);
        val >>= 1;
    }
}

bool Counter::operator==(const Counter& rhs) const {
    return m_fields == rhs.m_fields;
}

bool Counter::operator<(const Counter& rhs) const {
    uint64_t lhsSum = fieldSum();
    uint64_t rhsSum = rhs.fieldSum();
    if (lhsSum < rhsSum) {
        return true;
    }
    if (lhsSum > rhsSum) {
        return false;
    }

    return m_fields < rhs.m_fields;
}

bool Counter::collides(const Counter& other) const {
    uint64_t fields1 = m_fields;
    uint64_t fields2 = other.m_fields;

    for (int k = 0; k < m_n; ++k) {
        uint64_t c1 = fields1 & FIELD_MASK;
        uint64_t c2 = fields2 & FIELD_MASK;

        if (c1 > 2 * c2 || c2 > 2 * c1) {
            return false;
        }

        fields1 >>= FIELD_BITS;
        fields2 >>= FIELD_BITS;
    }

    return true;
}

int Counter::m_n = 0;
Value Counter::m_valMask = 0;

uint64_t Counter::fieldSum() const {
    uint64_t fields = m_fields;
    uint64_t sum = 0;
    for (int k = 0; k < m_n; ++k) {
        sum += fields & FIELD_MASK;
        fields >>= FIELD_BITS;
    }

    return sum;
}

typedef std::vector<Counter> Counters;

int main(int argc, char* argv[]) {
    int n = 0;
    std::istringstream strm(argv[1]);
    strm >> n;

    Counter::setN(n);

    int nBit = 2 * n - 1;
    Value maxVal = static_cast<Value>(1) << nBit;

    Counters allCounters;

    for (Value val = 0; val < maxVal; ++val) {
        Counter counter(val);
        allCounters.push_back(counter);
    }

    std::sort(allCounters.begin(), allCounters.end());

    Counters::iterator uniqEnd =
        std::unique(allCounters.begin(), allCounters.end());
    allCounters.resize(std::distance(allCounters.begin(), uniqEnd));

    Counters solCounters;
    int nSol = 0;

    for (Value idx = 0; idx < allCounters.size(); ++idx) {
        const Counter& counter = allCounters[idx];

        bool valid = true;
        for (int iSol = 0; iSol < nSol; ++iSol) {
            if (solCounters[iSol].collides(counter)) {
                valid = false;
                break;
            }
        }

        if (valid) {
            solCounters.push_back(counter);
            ++nSol;
        }
    }

    std::cout << "result: " << nSol << std::endl;

    return 0;
}

Reto Koradi

Posted 2015-08-07T07:59:09.067

Reputation: 4 870

I had recursive solutions based on similar code that are guaranteed to find the maximum. But it took a while for n=4 already. It might have finished for n=5 with some patience. You must have used a better backtracking strategy to even make it to n=7. Was yours heuristic, or did it explore the whole solution space? I'm contemplating some ideas on how to make this better, either by fine tuning the sorting order, or by maybe not being purely greedy. – Reto Koradi – 2015-08-09T23:03:01.077

My understanding is that there is no backtracking in Peter Taylor's answer. It is purely greedy. The main tricks are that he reduces the number of counting arrays to 3^n and the two heuristics he describes. – None – 2015-08-10T04:33:42.053

@Lembik My comment was in response to a comment that was deleted. The number of counting arrays should be the same, since I build that based on actual values, and reduce it to only the unique ones. I updated a backtracking version of the algorithm. Even though it does not terminate within a reasonable time, it finds 76 for n=7 quickly. But trying it for n=9, it was still stuck at 164 when I stopped it after 20 minutes. So extending this with a limited form of simple backtracking does not look generally promising. – Reto Koradi – 2015-08-10T04:56:47.540