Counting orthogonal circulant matrices

8

3

Two rows of a matrix are orthogonal if their inner product equals zero. Call a matrix with all rows pairwise orthogonal an orthogonal matrix. A circulant matrix is one where each row vector is rotated one element to the right relative to the preceding row vector. We will only be interested in matrices where the entries are either -1 or 1.

Task

Write code to count as many different n/2 by n orthogonal, circulant matrices as possible in 2 minutes (for even n).

Input

The code has no input. It can try any even values of n it likes. For example, the code could try all n that are multiplies of 4 starting from the smallest and also try n = 2.

Output

The number of orthogonal circulant matrices you have found. There should also be a simple switch in your code to enable it to output the matrices themselves.

Score

The number of circulant matrices you have found.

Hints

Orthogonal n/2 by n circulant matrices only exist when n is a multiple of 4 or n is less than 4.

An example orthogonal circulant matrix is:

-1  1 -1 -1
-1 -1  1 -1

Tips for a naive approach

The most naive approach is just to iterate over all possible matrices. This can be sped up using the following two observations.

  • To test orthogonality of a circulant matrix we need only compare each row to the first one. This is implemented in the sample code.

  • We can iterate over Lyndon words and then if we find an orthogonal matrix multiply by the number of possible rotations. This is idea as yet untested so may be buggy.

Sample code

This is a very simple and naive python answer. I ran it using timeout 120.

import itertools
def check_orthogonal(row):
    for i in xrange(1,int(n/2)):
        if (sum(row[j]*row[(j+i) % n] for j in xrange(n)) != 0):
            return False
    return True

counter = 0
for n in xrange(4,33,4):
    for row in itertools.product([-1,1],repeat = n):
        if check_orthogonal(row):
            counter +=1
            print "Counter is ", counter, ". n = ", n

Correctness tests

For n = 4,8,12,16,20,24,28, the number of distinct matrices you should get is 12,40,144,128,80,192,560, respectively.

Levels of awesomeness

Judging by the sample code, I hereby present two levels of awesomeness that any answer can aspire to achieve.

  • Silver level awesomeness is achieved by getting a score or 1156.

  • Gold level of awesomeness is to get higher than that.

Languages and libraries

You can use any language or library you like (that wasn't designed for this challenge). However, for the purposes of scoring I will run your code on my machine so please provide clear instructions for how to run it on Ubuntu.

My Machine The timings will be run on my machine. This is a standard Ubuntu install on an 8GB AMD FX-8350 Eight-Core Processor. This also means I need to be able to run your code.

Leading answers

  • 332 by flawr in Octave

  • 404 by R.T. in Python

  • 744 by Sample Solution using pypy

  • 1156 by Thomas Kwa using Java. Silver level awesomeness!

  • 1588 by Reimer Behrends in OCaml. Gold level awesomeness!

user9206

Posted 2016-01-11T17:31:48.137

Reputation:

Can you actually find one for every n that is a multiple of four? – flawr – 2016-01-11T21:43:26.430

@flawr Now that's a great question! I have no idea but I would love to know. – None – 2016-01-11T21:50:52.680

From what I've seen now (if my script is correct) they seem to be pretty rare. As far as I know the circulant hadamard matrices (square matrices with entries in {-1,1}) are conjectured to exist only for n=1 and 4. – flawr – 2016-01-11T21:52:40.420

@flawr Yes on both counts (I wanted something that was rare to make the challenge interesting). But I am unaware of anything at all being known about n/2 by n circulant matrices. – None – 2016-01-11T21:56:05.997

Why are libraries for primality testing and factoring prohibited? – feersum – 2016-01-12T02:33:11.973

@feersum That was a copy and paste error! Fixed now. – None – 2016-01-12T07:23:32.297

1I have a solution using bitmasking that I think will work for n=32. – lirtosiast – 2016-01-15T18:18:31.360

@ThomasKwa That sounds great. If it finishes n=28 in time that would already be amazing. However...Unfortunately there are no solutions for n=32 :) I feel the Lyndon words idea should work and that would be fast enough to get somewhere with n=36 too. – None – 2016-01-15T19:38:06.393

@Lembik Really? That's too bad. n=28 finishes in about 10 seconds, and should be further optimizable, so I should be able to get through a small portion of n=36. I don't know if Lyndon words can be generared in less than ~1000 cycles each, which would be about the breakeven point. – lirtosiast – 2016-01-15T23:39:51.673

Answers

5

OCaml, 1588 (n = 36)

This solution uses the usual bit pattern approach to represent vectors of -1s and 1s. The scalar product is as usual calculated by taking the xor of two bit vectors and subtracting n/2. Vectors are orthogonal iff their xor has exactly n/2 bits set.

Lyndon words are not per se useful as a normalized representation for this, as they exclude any pattern that is a rotation of itself. They are also relatively expensive to compute. Therefore, this code uses a somewhat simpler normal form, which requires that the longest consecutive sequence of zeroes after rotation (or one of them, if there are multiples) must occupy the most significant bits. It follows that the least significant bit is always 1.

Observe also that any candidate vector must have at least n/4 ones (and at most 3n/4). We therefore only consider vectors with n/4 ... n/2 bits set, since we can derive others via complement and rotation (in practice, all such vectors seem to have between n/2-2 and n/2+2 ones, but that seems to also be difficult to prove).

We build these normal forms from the least significant bit up, observing the constraint that any remaining runs of zeroes (called "gaps" in the code) must follow our normal form requirement. In particular, as long as at least one more 1-bit has to be placed, there must be room for the current gap and another than that is at least as big as the current gap or any other gap observed so far.

We also observe that the list of results is small. Therefore, we do not try to avoid duplicates during the discovery process, but simply record the results in per-worker sets and calculate the union of these sets at the end.

It is worth noting that the runtime cost of the algorithm still grows exponentially and at a rate that is comparable to that of the brute-force version; what this buys us is essentially a reduction by a constant factor, and comes at the cost of an algorithm that is more difficult to parallelize than the brute-force version.

Output for n up to 40:

 4: 12
 8: 40
12: 144
16: 128
20: 80
24: 192
28: 560
32: 0
36: 432
40: 640

The program is written in OCaml, to be compiled with:

ocamlopt -inline 100 -nodynlink -o orthcirc unix.cmxa bigarray.cmxa orthcirc.ml

Run ./orthcirc -help to see what options the program supports.

On architectures that support it, -fno-PIC may offer some small additional performance gain.

This is written for OCaml 4.02.3, but may also work with older versions (as long as they aren't too old).


UPDATE: This new version offers better parallelization. Note that it uses p * (n/4 + 1) worker threads per instance of the problem, and some of them will still run considerably shorter than others. The value of p must be a power of 2. The speedup on 4-8 cores is minimal (perhaps around 10%), but it scales better to a large number of cores for large n.

let max_n = ref 40
let min_n = ref 4
let seq_mode = ref false
let show_res = ref false
let fanout = ref 8

let bitcount16 n =
  let b2 n = match n land 3 with 0 -> 0 | 1 | 2 -> 1 | _ -> 2 in
  let b4 n = (b2 n) + (b2 (n lsr 2)) in
  let b8 n = (b4 n) + (b4 (n lsr 4)) in
  (b8 n) + (b8 (n lsr 8))

let bitcount_data =
  let open Bigarray in
  let tmp = Array1.create int8_signed c_layout 65536 in
    for i = 0 to 65535 do
      Array1.set tmp i (bitcount16 i)
    done;
    tmp

let bitcount n =
  let open Bigarray in
  let bc n = Array1.unsafe_get bitcount_data (n land 65535) in
  (bc n) + (bc (n lsr 16)) + (bc (n lsr 32)) + (bc (n lsr 48))

module IntSet = Set.Make (struct
  type t = int
  let compare = Pervasives.compare
end)

let worker_results = ref IntSet.empty

let test_row vec row mask n =
  bitcount ((vec lxor (vec lsr row) lxor (vec lsl (n-row))) land mask) * 2 = n

let record vec len n =
  let m = (1 lsl n) - 1 in
  let rec test_orth_circ ?(row=2) vec m n =
    if 2 * row >= n then true
    else if not (test_row vec row m n) then false
    else test_orth_circ ~row:(row+1) vec m n
  in if test_row vec 1 m n &&
        test_orth_circ vec m n then
  begin
    for i = 0 to n - 1 do
      let v = ((vec lsr i) lor (vec lsl (n - i))) land m in
      worker_results := IntSet.add v !worker_results;
      worker_results := IntSet.add (v lxor m) !worker_results
    done
  end

let show vec n =
  for i = 0 to n / 2 - 1 do
    let vec' = (vec lsr i) lor (vec lsl (n - i)) in
    for j = 0 to n-1 do
      match (vec' lsr (n-j)) land 1 with
      | 0 -> Printf.printf "  1"
      | _ -> Printf.printf " -1"
    done; Printf.printf "\n"
  done; Printf.printf "\n"; flush stdout

let rec build_normalized ~prefix ~plen ~gap ~maxgap ~maxlen ~bits ~fn =
  if bits = 0 then
    fn prefix plen maxlen
  else begin
    let room = maxlen - gap - plen - bits in
    if room >= gap && room >= maxgap then begin
      build_normalized
        ~prefix:(prefix lor (1 lsl (plen + gap)))
        ~plen:(plen + gap + 1)
        ~gap:0
        ~maxgap:(if gap > maxgap then gap else maxgap)
        ~maxlen
        ~bits:(bits - 1)
        ~fn;
      if room > gap + 1 && room > maxgap then
        build_normalized ~prefix ~plen ~gap:(gap + 1) ~maxgap ~maxlen ~bits ~fn
    end
  end

let rec log2 = function
| 0 -> -1
| n -> 1 + (log2 (n lsr 1))

let rec test_gap n pat =
  if n land pat = 0 then true
  else if pat land 1 = 0 then test_gap n (pat lsr 1)
  else false

let rec test_gaps n maxlen len =
  let fill k = (1 lsl k) -1 in
  if len = 0 then []
  else if test_gap n ((fill maxlen) lxor (fill (maxlen-len))) then
    len :: (test_gaps n maxlen (len-1))
  else test_gaps n maxlen (len-1)

let rec longest_gap n len =
  List.fold_left max 0 (test_gaps n len len)

let start_search low lowbits maxlen bits fn =
  let bits = bits - (bitcount low) in
  let plen = log2 low + 1 in
  let gap = lowbits - plen in
  let maxgap = longest_gap low lowbits in
  worker_results := IntSet.empty;
  if bits >= 0 then
    build_normalized ~prefix:low ~plen ~gap ~maxgap ~maxlen ~bits ~fn;
  !worker_results

let spawn f x =
  let open Unix in
  let safe_fork () = try fork() with _ -> -1 in
  let input, output = pipe () in
  let pid = if !seq_mode then -1 else safe_fork() in
  match pid with
  | -1 -> (* seq_mode selected or fork() failed *)
     close input; close output; (fun () -> f x)
  | 0 -> (* child process *)
    close input;
    let to_parent = out_channel_of_descr output in
    Marshal.to_channel to_parent (f x) [];
    close_out to_parent; exit 0
  | pid -> (* parent process *)
    close output;
    let from_child = in_channel_of_descr input in
    (fun () -> 
      ignore (waitpid [] pid);
      let result = Marshal.from_channel from_child in
      close_in from_child; result)

let worker1 (n, k) =
  start_search 1 1 n k record

let worker2 (n, k, p) =
  start_search (p * 2 + 1) (log2 !fanout + 1) n k record

let spawn_workers n =
  let queue = Queue.create () in
  if n = 4 || n = 8 then begin
    for i = n / 4 to n / 2 do
      Queue.add (spawn worker1 (n, i)) queue
    done
  end else begin
    for i = n / 2 downto n / 4 do
      for p = 0 to !fanout - 1 do
        Queue.add (spawn worker2 (n, i, p)) queue
      done
    done
  end;
  Queue.fold (fun acc w -> IntSet.union acc (w())) IntSet.empty queue

let main () =
  if !max_n > 60 then begin
    print_endline "error: cannot handle n > 60";
    exit 1
  end;
  min_n := max !min_n 4;
  if bitcount !fanout <> 1 then begin
    print_endline "error: number of threads must be a power of 2";
    exit 1;
  end;
  for n = !min_n to !max_n do
    if n mod 4 = 0 then
    let result = spawn_workers n in
    Printf.printf "%2d: %d\n" n (IntSet.cardinal result);
    if !show_res then
      IntSet.iter (fun v -> show v n) result;
    flush stdout
  done

let () =
  let args =[("-m", Arg.Set_int min_n, "min size of the n by n/2 matrix");
             ("-n", Arg.Set_int max_n, "max size of the n by n/2 matrix");
             ("-p", Arg.Set_int fanout, "parallel fanout");
             ("-seq", Arg.Set seq_mode, "run in single-threaded mode");
             ("-show", Arg.Set show_res, "display list of results") ] in
  let usage = ("Usage: " ^
               (Filename.basename Sys.argv.(0)) ^
               " [-n size] [-seq] [-show]") in
  let error _ = Arg.usage args usage; exit 1 in
  Arg.parse args error usage;
  main ()

Reimer Behrends

Posted 2016-01-11T17:31:48.137

Reputation: 899

This is great and welcome back! Having said that... how about a Nim answer too? :) – None – 2016-01-18T08:49:58.093

Your code only gets to 36: 432 on my machine in time. – None – 2016-01-18T22:28:13.667

Huh. It runs to 40 in just under two minutes on my laptop with a quadcore processor (2.5GHz Intel Core i7), so I was fairly confident it would make it to 40 on yours, too. I'll update accordingly. About your other question, while I have a brute-force Nim implementation, that one still runs twice as slow (because of the brute-force part) and isn't too different from Thomas Kwa's code (just a bit more reduction of the search space), so it's not like you're missing anything exciting. – Reimer Behrends – 2016-01-18T22:56:08.190

Am I right that your code needs a 64 bit system? I just tested it on a 32 bit one where it always outputs 0. – None – 2016-01-19T14:19:55.250

1Correct, as it stores the vectors as ints. I could also force a 64-bit representation (or even big integers), but that would be enormously slower on a 32-bit system. – Reimer Behrends – 2016-01-19T16:53:41.573

Just a note that it turns out there are in fact an infinite number of matrices. However, it's not at all clear how to construct them! http://math.stackexchange.com/questions/1617517/explicit-construction-of-n-2-by-n-circulant-partial-hadamard-matrices

– None – 2016-01-19T22:07:46.290

The first version of your code took just over 4 minutes and this new version just over 3. That's an impressive improvement! – None – 2016-01-20T20:49:34.767

It's just better parallelization, the algorithm is the same. It may be giving you a better speedup because you actually have eight separate cores while my computer has four hyperthreaded ones. The old version did tail off into running just a small number of processes at the same time, while the new version distributes the load more evenly across workers (I mostly wrote it so that I could throw it at 64 cores without wasting most of them). That said, I'm still baffled why it runs quite a bit faster on my seemingly weaker laptop. – Reimer Behrends – 2016-01-20T21:05:30.493

3

Java, 1156 matrices

This uses fairly naive bitmasking, and takes under 15 seconds for n=28 on my machine.

Circulant matrices are determined by their first rows. Therefore, I represent the first rows of the matrices as bit vectors: 1 and 0 represent -1 and 1. Two rows are orthogonal when the number of set bits when they're xor'd together is n/2.

import java.util.Arrays;

class Main {

    static void dispmatrix(long y,int N)
    {
        int[][] arr = new int[N/2][N];
        for(int row=0; row < N/2; row++)
        {
            for(int col=0; col < N; col++)
            {
                arr[row][col] = (int) ((y >>> (N+row-col)) & 1L);
            }
        }
        System.out.println(Arrays.deepToString(arr));
    }

  public static void main(String[] args) {

    int count = 0;
    boolean good;
    boolean display = false;
    long y;
    for(int N=4; N <= 28 ;N += 4)
    {
    long mask = (1L << N) - 1;
    for(long x=0; x < (1L<<N); x++)
    {
        good = true;
        y = x + (x << N);

        for(int row = 1; row < N/2; row++)
        {
            good &= N/2 == Long.bitCount((y ^ (y >>> row)) & mask);
        }

        if(good)
        {
            if(display) dispmatrix(y,N);
            count++;
        }

    }
    System.out.println(count);
    }
  }
}

I can't get Eclipse to work right now, so this was tested on repl.it.

Here are the number of first rows that are orthogonal to the first r rows afterwards for n=28:

[268435456, 80233200, 23557248, 7060320, 2083424, 640304, 177408, 53088, 14896, 4144, 2128, 1008, 1008, 560]

Optimizations:

  • Since orthogonality doesn't change with a cyclic rotation of both rows, we just need to check that the first row is orthogonal to the rest.
  • Rather than manually do an N-bit cyclic shift N/2 times, I store the bits to be shifted in at the top of the long, and then use a single and with the lower N bits to extract the needed ones.

Possible further optimizations:

  • Generate Lyndon words. According to my calculations this only makes sense if Lyndon words can be generated in less than ~1000 cycles each.
  • If Lyndon words take too long, we can still only check the bit vectors that start in 00, and use their rotations (and rotations of the NOT) when we find an orthogonal matrix. We can do this because 0101...01 and 1010...10 are not possible first rows, and all the others contain either a 00 or a 11.
  • Branch maybe halfway through when the matrix is probably orthogonal. I don't know how much branching will cost, though, so I'll need to test.
  • If the above works, start with a row other than row 1?
  • Maybe there's some mathematical way to eliminate some possibilities. I don't know what that would be.
  • Write in C++, of course.

lirtosiast

Posted 2016-01-11T17:31:48.137

Reputation: 20 331

This is great. Thank you! I look forward to some of your optimizations so we can see numbers for n=36 too . – None – 2016-01-16T07:50:20.817

Oh and you reached Silver level awesomeness! :) – None – 2016-01-16T07:52:45.533

2

Python (404 matrices on i5-5300U)

Mainly posting this as a starting point for others to improve on, this can be cleaned up a lot, parallelized, etc.

import numpy
import itertools
import time

def findCirculantOrthogonalRows(n, t1, timeLimit):
  validRows = []
  testMatrix = numpy.zeros((n//2, n))
  identityMatrixScaled = n*numpy.identity(n//2)
  outOfTime = False
  for startingRowTuple in itertools.product([1, -1], repeat=n):
     for offset in range(n//2):
       for index in range(n):
         testMatrix[offset][index] = startingRowTuple[(index-offset) % n]

     if(numpy.array_equal(identityMatrixScaled, testMatrix.dot(testMatrix.transpose()))):
      validRows.append(startingRowTuple)

    if(time.clock() - t1 >= timeLimit):
      outOfTime = True
      break

  return (validRows, outOfTime)

n = 4
validFirstRows = []
t1 = time.clock()
timeLimit = 120
fullOutput = True

while(True):
  print('calling with', n)
  (moreRows, outOfTime) = findCirculantOrthogonalRows(n, t1, timeLimit)

  if(len(moreRows) > 0):
    validFirstRows.extend(moreRows)

  if(outOfTime == True):
    break

  n += 4

print('Found', len(validFirstRows), 'circulant orthogonal matrices in', timeLimit, 'seconds')

if(fullOutput):
  counter = 1
  for r in validFirstRows:
    n = len(r)
    matrix = numpy.zeros((n//2, n))
    for offset in range(n//2):
      for index in range(n):
        matrix[offset][index] = r[(index-offset) % n]
    print('matrix #', counter, ':\n', matrix)
    counter += 1
    input()

R.T.

Posted 2016-01-11T17:31:48.137

Reputation: 501

I added some sample code. The first obvious improvement is to iterate over Lyndon words but I am not sure how to code that. – None – 2016-01-12T12:33:22.353

2

Matlab/Octave, 381 / 328 matrices

Also just the naive approach, trying every possible combination.

counter = 0;
%ok: 2,4,8
%none: 
tic
for n=[2,4,8,12,16,20];
    m=n/2;
    N=2^n-1;
    for k=1:N

        k/N; %remove ; in order to see progress
        v=(dec2bin(k,n)-'0')*2-1;               %first row

        z=toeplitz([v(1) fliplr(v(m+2:n))], v); %create circulante matrix
        w = z * z.';
        if norm(w - diag(diag(w))) < eps
            counter = counter+1;
            %n    %remove % in order to see the matrices
            %z
        end
        if toc>120;
            break;
        end
    end
end
counter

flawr

Posted 2016-01-11T17:31:48.137

Reputation: 40 560

In octave the code prints a huge amount to the screen which might be slowing it down. "ans = ...." – None – 2016-01-12T07:28:53.913

Oh right, I forgot to add that: The most indented lines there is an n and a z, these two can be commented out with a single %. And then you can add a ; after the counter = counter+1 and the k/N which will suppress the output. – flawr – 2016-01-12T09:44:42.970