Calculate the Hafnian as quickly as possible

12

2

The challenge is to write the fastest code possible for computing the Hafnian of a matrix.

The Hafnian of a symmetric 2n-by-2n matrix A is defined as:

Here S2n represents the set of all permutations of the integers from 1 to 2n, that is [1, 2n].

The wikipedia link also gives a different looking formula which may be of interest (and even faster methods exist if you look further on the web). The same wiki page talks about adjacency matrices but your code should work for other matrices as well. You can assume the values will all be integers but not that that they are all positive.

There is also a faster algorithm but it seems hard to understand. and Christian Sievers was the first to implement it (in Haskell).

In this question matrices are all square and symmetric with even dimension.

Reference implementation (note this is using the slowest possible method).

Here is some example python code from Mr. Xcoder.

from itertools import permutations
from math import factorial

def hafnian(matrix):
    my_sum = 0
    n = len(matrix) // 2
    for sigma in permutations(range(n*2)):
        prod = 1
        for j in range(n):
            prod *= matrix[sigma[2*j]][sigma[2*j+1]]
        my_sum += prod
    return my_sum / (factorial(n) * 2 ** n)

print(hafnian([[-1, 1, 1, -1, 0, 0, 1, -1], [1, 0, 1, 0, -1, 0, -1, -1], [1, 1, -1, 1, -1, -1, 0, -1], [-1, 0, 1, -1, -1, 1, -1, 0], [0, -1, -1, -1, -1, 0, 0, -1], [0, 0, -1, 1, 0, 0, 1, 1], [1, -1, 0, -1, 0, 1, 1, 0], [-1, -1, -1, 0, -1, 1, 0, 1]]))
4

M = [[1, 1, 0, 0, 0, 0, 0, 1, 0, 0], [1, 1, -1, 0, -1, 1, 1, 1, 0, -1], [0, -1, -1, -1, 0, -1, -1, 0, -1, 1], [0, 0, -1, 1, -1, 1, -1, 0, 1, -1], [0, -1, 0, -1, -1, -1, -1, 1, -1, 1], [0, 1, -1, 1, -1, 1, -1, -1, 1, -1], [0, 1, -1, -1, -1, -1, 1, 0, 0, 0], [1, 1, 0, 0, 1, -1, 0, 1, 1, -1], [0, 0, -1, 1, -1, 1, 0, 1, 1, 1], [0, -1, 1, -1, 1, -1, 0, -1, 1, 1]]

print(hafnian(M))
-13

M = [[-1, 0, -1, -1, 0, -1, 0, 1, -1, 0, 0, 0], [0, 0, 0, 0, 0, -1, 0, 1, -1, -1, -1, -1], [-1, 0, 0, 1, 0, 0, 0, 1, -1, 1, -1, 0], [-1, 0, 1, -1, 1, -1, -1, -1, 0, -1, -1, -1], [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, -1, 0], [-1, -1, 0, -1, 0, 0, 1, 1, 1, 1, 1, 0], [0, 0, 0, -1, 0, 1, 1, -1, -1, 0, 1, 0], [1, 1, 1, -1, 0, 1, -1, 1, -1, -1, -1, -1], [-1, -1, -1, 0, 0, 1, -1, -1, -1, 1, -1, 0], [0, -1, 1, -1, 1, 1, 0, -1, 1, -1, 1, 1], [0, -1, -1, -1, -1, 1, 1, -1, -1, 1, 0, -1], [0, -1, 0, -1, 0, 0, 0, -1, 0, 1, -1, 1]]

print(hafnian(M))
13

M = [[-1, 1, 0, 1, 0, -1, 0, 0, -1, 1, -1, 1, 0, -1], [1, -1, 1, -1, 1, 1, -1, 0, -1, 1, 1, 0, 0, -1], [0, 1, 1, 1, -1, 1, -1, -1, 0, 0, -1, 0, -1, -1], [1, -1, 1, -1, 1, 0, 1, 1, -1, -1, 0, 0, 1, 1], [0, 1, -1, 1, 0, 1, 0, 1, -1, -1, 1, 1, 0, -1], [-1, 1, 1, 0, 1, 1, -1, 0, 1, -1, -1, -1, 1, -1], [0, -1, -1, 1, 0, -1, -1, -1, 0, 1, -1, 0, 1, -1], [0, 0, -1, 1, 1, 0, -1, 0, 0, -1, 0, 0, 0, 1], [-1, -1, 0, -1, -1, 1, 0, 0, 1, 1, 0, 1, -1, 0], [1, 1, 0, -1, -1, -1, 1, -1, 1, 1, 1, 0, 1, 0], [-1, 1, -1, 0, 1, -1, -1, 0, 0, 1, -1, 0, -1, 0], [1, 0, 0, 0, 1, -1, 0, 0, 1, 0, 0, 1, 1, 1], [0, 0, -1, 1, 0, 1, 1, 0, -1, 1, -1, 1, 1, -1], [-1, -1, -1, 1, -1, -1, -1, 1, 0, 0, 0, 1, -1, -1]]

print(hafnian(M))
83

The task

You should write code that, given an 2n by 2n matrix, outputs its Hafnian.

As I will need to test your code it would be helpful if you could give a simple way for me to give a matrix as input to your code, for example by reading from standard in. I will test your code in randomly chosen matrices with elements selected from {-1, 0, 1}. The purpose of testing like this is to reduce the chance the Hafnian will be a very large value.

Ideally your code will be able to read in matrices exactly as I have them in the examples in this question straight from standard in. That is the input would look like [[1,-1],[-1,-1]] for example. If you want to use another input format, please ask and I will do my best to accommodate.

Scores and ties

I will test your code on random matrices of increasing size and stop the first time your code takes more than 1 minute on my computer. The scoring matrices will be consistent for all submissions in order to ensure fairness.

If two people get the same score then the winner is the one which is fastest for that value of n. If those are within 1 second of each other then it is the one posted first.

Languages and libraries

You can use any available language and libraries you like but no pre-existing function to compute the Hafnian. Where feasible, it would be good to be able to run your code so please include a full explanation for how to run/compile your code in Linux if at all possible.`

My Machine The timings will be run on my 64-bit machine. This is a standard ubuntu install with 8GB RAM, AMD FX-8350 Eight-Core Processor and Radeon HD 4250. This also means I need to be able to run your code.

Call for answers in more languages

It would be great to get answers in your favorite super fast programming language. To start things off, how about fortran, nim and rust?

Leaderboard

  • 52 by miles using C++. 30 seconds.
  • 50 by ngn using C. 50 seconds.
  • 46 by Christian Sievers using Haskell. 40 seconds.
  • 40 by miles using Python 2 + pypy. 41 seconds.
  • 34 by ngn using Python 3 + pypy. 29 seconds.
  • 28 by Dennis using Python 3. 35 seconds. (Pypy is slower)

user9206

Posted 2018-03-02T15:16:01.223

Reputation:

Is there a limit for the absolute values of the matrix entries? Can we return a floating point approximation? Do we have to use arbitrary precision integers? – Dennis – 2018-03-02T19:36:22.790

@Dennis In practice I will only use -1,0,1 to test (chosen at random). I don't want it to be a big int challenge. In all honesty I don't know if we will hit the limits of 64 bit ints before the code gets too slow to run but my guess is that we won't. Currently we are nowhere near that. – None – 2018-03-02T19:37:57.483

If the entries are limited to -1,0,1, that should be mentioned on the question. Does our code have to work at all for other matrices? – Dennis – 2018-03-02T19:40:34.737

@Dennis An old version used to say that but I must have written over it. I would prefer it if the code weren't specialised for -1,0,1 entries but I suppose I can't stop that. – None – 2018-03-02T19:44:11.313

Do you have more test cases? perhaps for larger n? – miles – 2018-03-03T10:26:11.477

@miles Added a slightly bigger example. – None – 2018-03-03T10:36:00.167

I think that to be fair the randomly generated matrices should be the same for all submissions. – user202729 – 2018-03-03T11:29:30.983

@user202729 Yes they will be – None – 2018-03-03T13:07:41.473

Answers

14

Haskell

import Control.Parallel.Strategies
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector as VB

type Poly = V.Vector Int

type Matrix = VB.Vector ( VB.Vector Poly )

constpoly :: Int -> Int -> Poly
constpoly n c = V.generate (n+1) (\i -> if i==0 then c else 0)

add :: Poly -> Poly -> Poly
add = V.zipWith (+)

shiftmult :: Poly -> Poly -> Poly
shiftmult a b = V.generate (V.length a) 
                           (\i -> sum [ a!j * b!(i-1-j) | j<-[0..i-1] ])
  where (!) = V.unsafeIndex

x :: Matrix -> Int -> Int -> Int -> Poly -> Int
x  _    0  _ m p = m * V.last p
x mat n c m p =
  let mat' = VB.generate (2*n-2) $ \i ->
             VB.generate i       $ \j ->
                 shiftmult (mat!(2*n-1)!i) (mat!(2*n-2)!j) `add`
                 shiftmult (mat!(2*n-1)!j) (mat!(2*n-2)!i) `add`
                 (mat!i!j)
      p' = p `add` shiftmult (mat!(2*n-1)!(2*n-2)) p
      (!) = VB.unsafeIndex
      r = if c>0 then parTuple2 rseq rseq else r0
      (a,b) = (x mat (n-1) (c-1) m p, x mat' (n-1) (c-1) (-m) p')
              `using` r
  in a+b

haf :: [[Int]] -> Int
haf m = let n=length m `div` 2
        in x (VB.fromList $ map (VB.fromList . map (constpoly n)) m) 
             n  5  ((-1)^n)  (constpoly n 1) 

main = getContents >>= print . haf . read

This implements a variation of Algorithm 2 of Andreas Björklund: Counting Perfect Matchings as Fast as Ryser.

Compile using ghc with compile time options -O3 -threaded and use run time options +RTS -N for parallelization. Takes input from stdin.

Christian Sievers

Posted 2018-03-02T15:16:01.223

Reputation: 6 366

2Maybe note that parallel and vector must be installed? – H.PWiz – 2018-03-03T21:39:18.247

@H.PWiz Nobody complained here, but sure, noting it won't hurt. Well, now you did.

– Christian Sievers – 2018-03-03T22:18:47.753

@ChristianSievers I don't think they're complaining. The OP might not be familiar with Haskell, so explicitly stating what must be installed to be able to time the code is a good idea. – Dennis – 2018-03-03T23:09:17.303

@Dennis I didn't mean "you did complain" but "you did note it". And I didn't think of complaining as a negative thing. The OP is the same as in the challenge I linked to, so there should be no problem. – Christian Sievers – 2018-03-03T23:34:01.903

N = 40 in 7.5 seconds on TIO... Man, this is fast! – Dennis – 2018-03-04T03:37:05.710

This answer is particularly impressive. – None – 2018-03-04T10:00:39.440

It seems that you can get a minor speed-up just by extracting mat!(2*n-1) and mat!(2*n-2) into their own variables. I doubt it increase the matrix size though – H.PWiz – 2018-03-04T16:55:57.617

@H.PWiz Did you try? I don't expect any noticable improvement from such changes. (I tried several others.) – Christian Sievers – 2018-03-04T17:04:58.800

I did try, it was saving around 2 seconds consistently. Although now, there is no difference. Perhaps I was using different matrices by mistake – H.PWiz – 2018-03-04T17:14:29.380

Can this be made faster by compiling with LLVM? – Potato44 – 2018-03-11T20:19:35.600

@Potato44 I got no gain from using -fllvm, but depending on versions and maybe architecture you may have more luck. – Christian Sievers – 2018-03-11T23:06:25.233

6

Python 3

from functools import lru_cache

@lru_cache(maxsize = None)
def haf(matrix):
	n = len(matrix)
	if n == 2: return matrix[0][1]
	h = 0
	for j in range(1, n):
		if matrix[0][j] == 0: continue
		copy = list(matrix)
		del copy[:j+1:j]
		copy = list(zip(*copy))
		del copy[:j+1:j]
		h += matrix[0][j] * haf(tuple(copy))
	return h

print(haf(tuple(map(tuple, eval(open(0).read())))))

Try it online!

Dennis

Posted 2018-03-02T15:16:01.223

Reputation: 196 637

6

C++ (gcc)

#define T(x) ((x)*((x)-1)/2)
#define S 1
#define J (1<<S)
#define TYPE int

#include <iostream>
#include <vector>
#include <string>
#include <pthread.h>

using namespace std;

struct H {
    int s, w, t;
    TYPE *b, *g;
};

void *solve(void *a);
void hafnian(TYPE *b, int s, TYPE *g, int w, int t);

int n, m, ti = 0;
TYPE r[J] = {0};
pthread_t pool[J];

int main(void) {
    vector<int> a;
    string s;
    getline(cin, s);

    for (int i = 0; i < s.size(); i++)
        if (s[i] == '0' || s[i] == '1')
            a.push_back((s[i-1] == '-' ? -1 : 1)*(s[i] - '0'));

    for (n = 1; 4*n*n < a.size(); n++);
    m = n+1;

    TYPE z[T(2*n)*m] = {0}, g[m] = {0};

    for (int j = 1; j < 2*n; j++)
        for (int k = 0; k < j; k++)
            z[(T(j)+k)*m] = a[j*2*n+k];
    g[0] = 1;

    hafnian(z, 2*n, g, 1, -1);

    TYPE h = 0;
    for (int t = 0; t < ti; t++) {
        pthread_join(pool[t], NULL);
        h += r[t];
    }

    cout << h << endl;

    return 0;
}

void *solve(void *a) {
    H *p = reinterpret_cast<H*>(a);
    hafnian(p->b, p->s, p->g, p->w, p->t);
    delete[] p->b;
    delete[] p->g;
    delete p;
    return NULL;
}

void hafnian(TYPE *b, int s, TYPE *g, int w, int t) {
    if (t == -1 && (n < S || s/2 == n-S)) {
        H *p = new H;
        TYPE *c = new TYPE[T(s)*m], *e = new TYPE[m];
        copy(b, b+T(s)*m, c);
        copy(g, g+m, e);
        p->b = c;
        p->s = s;
        p->g = e;
        p->w = w;
        p->t = ti;
        pthread_create(pool+ti, NULL, solve, p);
        ti++;
    }
    else if (s > 0) {
        TYPE c[T(s-2)*m], e[m];
        copy(b, b+T(s-2)*m, c);
        hafnian(c, s-2, g, -w, t);
        copy(g, g+m, e);

        for (int u = 0; u < n; u++) {
            TYPE *d = e+u+1,
                  p = g[u], *x = b+(T(s)-1)*m;
            for (int v = 0; v < n-u; v++)
                d[v] += p*x[v];
        }

        for (int j = 1; j < s-2; j++)
            for (int k = 0; k < j; k++)
                for (int u = 0; u < n; u++) {
                    TYPE *d = c+(T(j)+k)*m+u+1,
                          p = b[(T(s-2)+j)*m+u], *x = b+(T(s-1)+k)*m,
                          q = b[(T(s-2)+k)*m+u], *y = b+(T(s-1)+j)*m;
                    for (int v = 0; v < n-u; v++)
                        d[v] += p*x[v] + q*y[v];
                }

        hafnian(c, s-2, e, w, t);
    }
    else
        r[t] += w*g[n];
}

Try it online! (13s for n = 24)

Based on the faster Python implementation in my other post. Edit the second line to #define S 3 on your 8-core machine and compile with g++ -pthread -march=native -O2 -ftree-vectorize.

The splits the work in half, so the value of S should be log2(#threads). The types can easily be changed between int, long, float, and double by modifying the value of #define TYPE.

miles

Posted 2018-03-02T15:16:01.223

Reputation: 15 654

This is the leading answer so far. Your code doesn't actually read in the input as specified as it can't cope with spaces. I had to do e.g. tr -d \ < matrix52.txt > matrix52s.txt – None – 2018-03-13T10:52:00.987

@Lembik Sorry, only used it against the spaceless matrix of size 24. Fixed now to work with spaces though. – miles – 2018-03-13T11:13:46.813

4

Python 3

This computes haf(A) as a memoised sum(A[i][j] * haf(A without rows and cols i and j)).

#!/usr/bin/env python3
import json,sys
a=json.loads(sys.stdin.read())
n=len(a)//2
b={0:1}
def haf(x):
 if x not in b:
  i=0
  while not x&(1<<i):i+=1
  x1=x&~(1<<i)
  b[x]=sum(a[i][j]*haf(x1&~(1<<j))for j in range(2*n)if x1&(1<<j)and a[i][j])
 return b[x]
print(haf((1<<2*n)-1))

ngn

Posted 2018-03-02T15:16:01.223

Reputation: 11 449

3

C

Another impl of Andreas Björklund's paper, which is much easier to understand if you also look at Christian Sievers's Haskell code. For the first few levels of the recursion, it distributes round-robin threads over available CPUs. The last level of the recursion, which accounts for half of the invocations, is optimised by hand.

Compile with: gcc -O3 -pthread -march=native; thanks @Dennis for a 2x speed-up

n=24 in 24s on TIO

#define _GNU_SOURCE
#include<sched.h>
#include<stdio.h>
#include<stdlib.h>
#include<memory.h>
#include<unistd.h>
#include<pthread.h>
#define W while
#define R return
#define S static
#define U (1<<31)
#define T(i)((i)*((i)-1)/2)
typedef int I;typedef long L;typedef char C;typedef void V;
I n,ncpu,icpu;
S V f(I*x,I*y,I*z){I i=n,*z1=z+n;W(i){I s=0,*x2=x,*y2=y+--i;W(y2>=y)s+=*x2++**y2--;*z1--+=s;}}
typedef struct{I m;V*a;V*p;pthread_barrier_t*bar;I r;}A;S V*(h1)(V*);
I h(I m,I a[][n+1],I*p){
 m-=2;I i,j,k=0,u=T(m),v=u+m,b[u][n+1],q[n+1];
 if(!m){I*x=a[v+m],*y=p+n-1,s=0;W(y>=p)s-=*x++**y--;R s;}
 memcpy(b,a,sizeof(b));memcpy(q,p,sizeof(q));f(a[v+m],p,q);
 for(i=1;i<m;i++)for(j=0;j<i;j++){f(a[u+i],a[v+j],b[k]);f(a[u+j],a[v+i],b[k]);k++;}
 if(2*n-m>8)R h(m,a,p)-h(m,b,q);
 pthread_barrier_t bar;pthread_barrier_init(&bar,0,2);pthread_t th;
 cpu_set_t cpus;CPU_ZERO(&cpus);CPU_SET(icpu++%ncpu,&cpus);
 pthread_attr_t attr;pthread_attr_init(&attr);
 pthread_attr_setaffinity_np(&attr,sizeof(cpu_set_t),&cpus);
 A arg={m,a,p,&bar};pthread_create(&th,&attr,h1,&arg);
 I r=h(m,b,q);pthread_barrier_wait(&bar);pthread_join(th,0);pthread_barrier_destroy(&bar);
 R arg.r-r;
}
S V*h1(V*x0){A*x=(A*)x0;x->r=h(x->m,x->a,x->p);pthread_barrier_wait(x->bar);R 0;}
I main(){
 ncpu=sysconf(_SC_NPROCESSORS_ONLN);
 S C s[200000];I i=0,j=0,k,l=0;W((k=read(0,s+l,sizeof(s)-l))>0)l+=k;
 n=1;W(s[i]!=']')n+=s[i++]==',';n/=2;
 I a[T(2*n)][n+1];memset(a,0,sizeof(a));k=0;
 for(i=0;i<2*n;i++)for(j=0;j<2*n;j++){
  W(s[k]!='-'&&(s[k]<'0'||s[k]>'9'))k++;
  I v=0,m=s[k]=='-';k+=m;W(k<l&&('0'<=s[k]&&s[k]<='9'))v=10*v+s[k++]-'0';
  if(i>j)*a[T(i)+j]=v*(1-2*m);
 }
 I p[n+1];memset(p,0,sizeof(p));*p=1;
 printf("%d\n",(1-2*(n&1))*h(2*n,a,p));
 R 0;
}

Algorithm:

The matrix, which is symmetric, is stored in lower-left triangular form. Triangle indices i,j correspond to linear index T(max(i,j))+min(i,j) where T is a macro for i*(i-1)/2. Matrix elements are polynomials of degree n. A polynomial is represented as an array of coefficients ordered from the constant term (p[0]) to xn's coefficient (p[n]). The initial -1,0,1 matrix values are first converted to const polynomials.

We perform a recursive step with two arguments: the half-matrix (i.e. triangle) a of polynomials and a separate polynomial p (referred to as beta in the paper). We reduce the size-m problem (initially m=2*n) recursively to two problems of size m-2 and return the difference of their hafnians. One of them is to use the same a without its last two rows, and the very same p. Another is to use the triangle b[i][j] = a[i][j] + shmul(a[m-1][i],a[m-2][j]) + shmul(a[m-1][j],a[m-2][i]) (where shmul is the shift-multiply operation on polynomials - it's like polynomial product as usual, additionally multiplied by the variable "x"; powers higher than x^n are ignored), and the separate polynomial q = p + shmul(p,a[m-1][m-2]). When recursion hits a size-0 a, we return the major coefficient of p: p[n].

The shift-and-multiply operation is implemented in function f(x,y,z). It modifies z in-place. Loosely speaking, it does z += shmul(x,y). This seems to be the most performance-critical part.

After the recursion has finished, we need to fix the sign of the result by multiplying by (-1)n.

ngn

Posted 2018-03-02T15:16:01.223

Reputation: 11 449

Could you show an explicit example of the input your code accepts? Say for a 2 by 2 matrix. Also, you seem to have golfed your code! (This is a fastest-code challenge, not a golf challenge.) – None – 2018-03-03T13:48:53.557

@Lembik For the record, as I said in chat, the input is in the same format as the examples - json (actually, it only reads the numbers and uses n=sqrt(len(input))/2). I usually write short code, even when golfing is not a requirement. – ngn – 2018-03-03T14:46:11.017

What's the largest size matrix this new code should support? – None – 2018-03-05T10:00:56.830

@Lembik the only hard-coded limit is the size of the input buffer, in which I left plenty of room for 100x100 matrix (n=50), so it should be practically unlimited – ngn – 2018-03-05T10:15:05.667

1-march=native will make a big difference here. At least on TIO, it nearly cuts the wall time in half. – Dennis – 2018-03-05T14:40:42.950

1Also, at least on TIO, the executable produced by gcc will be even faster. – Dennis – 2018-03-05T14:52:09.597

@Dennis I used -march=native when testing. – None – 2018-03-07T10:08:51.440

3

Python

This is pretty much a straight-forward, reference implementation of Algorithm 2 from the mentioned paper. The only changes were to only keep the current value of B, dropping the values of β by only updating g when iX, and truncated polynomial multiplication by only calculating the values up to degree n.

from itertools import chain,combinations

def powerset(s):
    return chain.from_iterable(combinations(s, k) for k in range(len(s)+1))

def padd(a, b):
    return [a[i]+b[i] for i in range(len(a))]

def pmul(a, b):
    n = len(a)
    c = [0]*n
    for i in range(n):
        for j in range(n):
            if i+j < n:
                c[i+j] += a[i]*b[j]
    return c

def hafnian(m):
    n = len(m) / 2
    z = [[[c]+[0]*n for c in r] for r in m]
    h = 0
    for x in powerset(range(1, n+1)):
        b = z
        g = [1] + [0]*n
        for i in range(1, n+1):
            if i in x:
                g = pmul(g, [1] + b[0][1][:n])
                b = [[padd(b[j+2][k+2], [0] + padd(pmul(b[0][j+2], b[1][k+2]), pmul(b[0][k+2], b[1][j+2]))[:n]) if j != k else 0 for k in range(2*n-2*i)] for j in range(2*n-2*i)]
            else:
                b = [r[2:] for r in b[2:]]
        h += (-1)**(n - len(x)) * g[n]
    return h

Try it online!

Here is a faster version with some of the easy optimizations.

def hafnian(m):
  n = len(m)/2
  z = [[0]*(n+1) for _ in range(n*(2*n-1))]
  for j in range(1, 2*n):
    for k in range(j):
      z[j*(j-1)/2+k][0] = m[j][k]
  return solve(z, 2*n, 1, [1] + [0]*n, n)

def solve(b, s, w, g, n):
  if s == 0:
    return w*g[n]
  c = [b[(j+1)*(j+2)/2+k+2][:] for j in range(1, s-2) for k in range(j)]
  h = solve(c, s-2, -w, g, n)
  e = g[:]
  for u in range(n):
    for v in range(n-u):
      e[u+v+1] += g[u]*b[0][v]
  for j in range(1, s-2):
    for k in range(j):
      for u in range(n):
        for v in range(n-u):
          c[j*(j-1)/2+k][u+v+1] += b[(j+1)*(j+2)/2][u]*b[(k+1)*(k+2)/2+1][v] + b[(k+1)*(k+2)/2][u]*b[(j+1)*(j+2)/2+1][v]
  return h + solve(c, s-2, w, e, n)

Try it online!

For added fun, here is a reference implementation in J.

miles

Posted 2018-03-02T15:16:01.223

Reputation: 15 654

This is pretty slow from all the list comprehensions and from computing equivalent values across the diagonal, so there's no need to benchmark this. – miles – 2018-03-07T06:11:06.680

Pretty awesome! – None – 2018-03-07T07:22:45.630

Very nice! I tried a similar thing with sympy which was shockingly slow and, while being correct for the small examples, returned - after a long time - a wrong result for the 24*24 matrix. I have no idea what's going on there. - By the description above Algorithm 2, the polynomial multiplication there is already meant to be truncated. – Christian Sievers – 2018-03-07T08:57:34.513

This code is actually strangely slow. Even for n = 20 it takes 22 seconds compared to essentially instant for ngn's python code. Are you sure it isn't doing too much work somewhere? – None – 2018-03-07T10:12:53.073

2In pmul, use for j in range(n-i): and avoid the if – Christian Sievers – 2018-03-07T12:02:20.903

It also helps to replace range (returns a list) with xrange (returns a generator), as its arguments are rather large. – Dennis – 2018-03-07T12:44:22.197

Just squeezes n = 28 with those improvement + pypy. – None – 2018-03-07T14:14:24.077

I still don't understand why this is so much slower than the other implementations of the same algorithm. Any ideas? – None – 2018-03-12T09:31:32.953

1@Lembik It computes the whole matrix; for a factor of about two replace j != k with j < k. It copies a submatrix in the else case which can be avoided when we handle and delete the last two instead of the first two rows and columns. And when it computes with x={1,2,4} and later with x={1,2,4,6} then it repeats its computations up to i=5. I replaced the structure of the two outer loops with first looping on i and then recursively assuming i in X and i not in X. - BTW, It might be interesting to look at the growth of the time needed compared to the other slower programs. – Christian Sievers – 2018-03-12T16:48:27.237

@Lembik I improved it a bit with some of the easy optimizations. You can cut the calculations in half by only using the values in either the upper or lower part across the diagonal since each submatrix will be symmetric. Using the lower triangle makes indexing easy since each row will start at a triangle number. As already implemented by others, recursion instead of iterating over 2^n binary numbers will keep track of previous states at each bit. And instead of counting the zeros in each bit, i just keep track of the sign and negate it accordingly at each recursive call. – miles – 2018-03-13T00:01:57.667

That's a huge improvement! – None – 2018-03-13T08:21:54.747

What are the dimension of c? I can see that c is a list of lists and there are sum_{i=1}^{s-2} i = (s-2)(s-1)/2 lists in it I think. But how long is each list? – Anush – 2018-04-24T20:44:05.810

Actually, would you be able (please) to write out c = [b[(j+1)*(j+2)/2+k+2][:] for j in range(1, s-2) for k in range(j)] as nested for loops? I am trying to convert your code to cython and am stuck on that line. – Anush – 2018-04-25T08:52:45.667

@Anush c is the concatenation of the rows of the lower half of a matrix, hence the triangle number of entries. Each entry represents a polynomial of degree at most n, so they are lists of length n+1. The code line should be equivalent to something like c=[]; for j in...: for k in...: c+= b[...][:] (no idea if that's the best way...) – Christian Sievers – 2018-04-27T13:37:16.737

@Anush Oh that was wrong, it should be c+=[b[...][:]] or maybe better c.append(b[...][:]) – Christian Sievers – 2018-04-27T16:34:16.463

@ChristianSievers. Thank you. One issue is that c is sometimes the empty list. That is not a list of lists at all. – Anush – 2018-04-27T16:35:47.333

@Anush Yes, when s=2, one step before the end of the recursion, c is a list of zero lists. That is as expected. – Christian Sievers – 2018-04-27T16:49:55.743

@Anush Christian explained it all. – miles – 2018-04-28T21:10:36.537

1

Octave

This is basically a copy of Dennis' entry, but optimized for Octave. The main optimization is done by using the full input matrix (and its transpose) and recursion using only matrix indices, rather than creating reduced matrices.

The main advantage is reduced copying of matrices. While Octave does not have a difference between pointers/references and values and functionally only does pass-by-value, it's a different story behind the scenes. There, copy-on-write (lazy copy) is used. That means, for the code a=1;b=a;b=b+1, the variable b is only copied to a new location at the last statement, when it is changed. Since matin and matransp are never changed, they will never by copied. The disadvantage is that the function spends more time calculating the correct indices. I may have to try different variations between numerical and logical indices to optimize this.

Important note: input matrix should be int32! Save the function in a file called haf.m

function h=haf(matin,indices,matransp,transp)

    if nargin-4
        indices=int32(1:length(matin));
        matransp=matin';
        transp=false;
    end
    if(transp)
        matrix=matransp;
    else
        matrix=matin;
    end
    ind1=indices(1);
    n=length(indices);
    if n==2
        h=matrix(ind1,indices(2));
        return
    end
    h=0*matrix(1); 
    for j=1:(n-1)
        indj=indices(j+1);
        k=matrix(ind1,indj);
        if logical(k)
            indicestemp=true(n,1);
            indicestemp(1:j:j+1)=false;
            h=h+k.*haf(matin,indices(indicestemp),matransp,~transp);
        end
    end
end

Example test script:

matrix = int32([0 0 1 -1 1 0 -1 -1 -1 0 -1 1 0 1 1 0 0 1 0 0 1 0 1 1;0 0 1 0 0 -1 -1 -1 -1 0 1 1 1 1 0 -1 -1 0 0 1 1 -1 0 0;-1 -1 0 1 0 1 -1 1 -1 1 0 0 1 -1 0 0 0 -1 0 -1 1 0 0 0;1 0 -1 0 1 1 0 1 1 0 0 0 1 0 0 0 1 -1 -1 -1 -1 1 0 -1;-1 0 0 -1 0 0 1 -1 0 1 -1 -1 -1 1 1 0 1 1 1 0 -1 1 -1 -1;0 1 -1 -1 0 0 1 -1 -1 -1 0 -1 1 0 0 0 -1 0 0 1 0 0 0 -1;1 1 1 0 -1 -1 0 -1 -1 0 1 1 -1 0 1 -1 0 0 1 -1 0 0 0 -1;1 1 -1 -1 1 1 1 0 0 1 0 1 0 0 0 0 1 0 1 0 -1 1 0 0;1 1 1 -1 0 1 1 0 0 -1 1 -1 1 1 1 0 -1 -1 -1 -1 0 1 1 -1;0 0 -1 0 -1 1 0 -1 1 0 1 0 0 0 0 0 1 -1 0 0 0 1 -1 -1;1 -1 0 0 1 0 -1 0 -1 -1 0 0 1 0 0 -1 0 -1 -1 -1 -1 -1 1 -1;-1 -1 0 0 1 1 -1 -1 1 0 0 0 -1 0 0 -1 0 -1 -1 0 1 -1 0 0;0 -1 -1 -1 1 -1 1 0 -1 0 -1 1 0 1 -1 -1 1 -1 1 0 1 -1 1 -1;-1 -1 1 0 -1 0 0 0 -1 0 0 0 -1 0 0 -1 1 -1 -1 0 1 0 -1 -1;-1 0 0 0 -1 0 -1 0 -1 0 0 0 1 0 0 1 1 1 1 -1 -1 0 -1 -1;0 1 0 0 0 0 1 0 0 0 1 1 1 1 -1 0 0 1 -1 -1 -1 0 -1 -1;0 1 0 -1 -1 1 0 -1 1 -1 0 0 -1 -1 -1 0 0 -1 1 0 0 -1 -1 1;-1 0 1 1 -1 0 0 0 1 1 1 1 1 1 -1 -1 1 0 1 1 -1 -1 -1 1;0 0 0 1 -1 0 -1 -1 1 0 1 1 -1 1 -1 1 -1 -1 0 1 1 0 0 -1;0 -1 1 1 0 -1 1 0 1 0 1 0 0 0 1 1 0 -1 -1 0 0 0 1 0;-1 -1 -1 1 1 0 0 1 0 0 1 -1 -1 -1 1 1 0 1 -1 0 0 0 0 0;0 1 0 -1 -1 0 0 -1 -1 -1 1 1 1 0 0 0 1 1 0 0 0 0 1 0;-1 0 0 0 1 0 0 0 -1 1 -1 0 -1 1 1 1 1 1 0 -1 0 -1 0 1;-1 0 0 1 1 1 1 0 1 1 1 0 1 1 1 1 -1 -1 1 0 0 0 -1 0])

tic
i=1;
while(toc<60)
    tic
    haf(matrix(1:i,1:i));
    i=i+1;
end

I have tried this out using TIO and MATLAB (I've actually never installed Octave). I imagine getting it to work is as simple as sudo apt-get install octave. The command octave will load the Octave GUI. If it's any more complicated than this, I will delete this answer until I've provided more detailed installation instructions.

Sanchises

Posted 2018-03-02T15:16:01.223

Reputation: 8 530

0

Recently Andreas Bjorklund, Brajesh Gupt and myself published a new algorithm for Hafnians of complex matrices: https://arxiv.org/pdf/1805.12498.pdf . For an n \times n matrix it scales like n^3 2^{n/2}.

If I understand correctly Andreas' original algorithm from https://arxiv.org/pdf/1107.4466.pdf it scales like n^4 2^{n/2} or n^3 log(n) 2^{n/2} if you used Fourier transforms to do polynomial multiplications.
Our algorithm is specifically taylored for complex matrices so it will not be as fast as the ones developed here for {-1,0,1} matrices. I wonder however if one can use some of the tricks you used to improve our implementation? Also if people are interested I would like to see how their implementation do when given complex numbers instead of integers. Finally, any comments, criticism, improvements, bugs, improvements are welcomed in our repository https://github.com/XanaduAI/hafnian/

Cheers!

Nicolás Quesada

Posted 2018-03-02T15:16:01.223

Reputation: 101

Welcome to the site! However answers to this question ought to contain code. This would be better left as a comment, (Which unfortunately you do not have the rep to make). – Post Rock Garf Hunter – 2018-06-04T16:12:49.777

Welcome to PPCG. While your answer may make a fine comment, the site is not for QA. This is a challenge site and answer to a challenge must have code and not an explanation of something else. Kindly update or delete (If you won't, mods will) – Muhammad Salman – 2018-06-04T16:34:12.147

Well, the code is on github, but I guess it makes nonsense to just copy-paste it here. – Nicolás Quesada – 2018-06-04T16:38:00.297

2If it fits in an answer, especially if you're one of the authors, I don't think there's anything wrong with posting a properly attributed, competitive solution that had been published elsewhere. – Dennis – 2018-06-04T17:06:40.573

@NicolásQuesada Answers on this site should be self-contained if possible, meaning we shouldn't have to go to another site to view your answer/code. – mbomb007 – 2018-06-04T17:52:50.123

I would love it to be contained but I don't think you can write a self contained routine that also does eigenvalue decomposition. Anyway my intention was not to annoy anyone but just to know how well one can do Complex hafnians. If this is bothering anyone feel free to delete my comment/reply. – Nicolás Quesada – 2018-06-04T17:57:54.353

@Dennis There is one problem though... There's nobody to benchmark the code anymore, as the OP has left the site :( – Kirill L. – 2018-06-04T18:30:49.777