Task

Given a Boolean matrix, find one (or optionally more) subset(s) of rows which have exactly one True in each column. You may use any algorithm, but must support very large matrices, like the last example.

One possible algorithm (Knuth's Algorithm X)

While there is no requirement to use this algorithm, it may be your best option.

If the matrix A has no columns, the current partial solution is a valid solution; terminate successfully.
Otherwise choose a column c.
Choose* a row r such that A_{r, c} = 1.
Include row r in the partial solution.
For each column j such that A_{r, j} = 1,
for each row i such that A_{i, j} = 1,
delete row i from matrix A.
delete column j from matrix A.
Repeat this algorithm recursively on the reduced matrix A.

* Step 3 is non-deterministic, and needs to be backtracked to in the case of a failure to find a row in a later invocation of step 3.

Input

Any desired representation of the minimum 2×2 matrix A, e.g. as a numeric or Boolean array

1 0 0 1 0 0 1
1 0 0 1 0 0 0
0 0 0 1 1 0 1
0 0 1 0 1 1 0
0 1 1 0 0 1 1
0 1 0 0 0 0 1

or as Universe + Set collection

U = {1, 2, 3, 4, 5, 6, 7}
S = {
    A = [1, 4, 7],
    B = [1, 4],
    C = [4, 5, 7],
    D = [3, 5, 6],
    E = [2, 3, 6, 7],
    F = [2, 7]
    }

or as 0 or 1 indexed sets; {{1, 4, 7}, {1, 4}, {4, 5, 7}, {3, 5, 6}, {2, 3, 6, 7}, {2, 7}}.

Output

Any desired representation of one (or optionally more/all) of the solutions, e.g. as numeric or Boolean array of the selected rows

1 0 0 1 0 0 0
0 0 1 0 1 1 0
0 1 0 0 0 0 1

or as Boolean list indicating selected rows {0, 1, 0, 1, 0, 1} or as numeric (0 or 1 indexed) list of selected rows {2, 4, 6} or as list of set names ['B', 'D', 'F'].

More examples

In:

Out: 1 3 or 4 or 1 0 1 0 or 0 0 0 1 or [[1,3],[4] etc.

In:

Out: 1 1 0 0 etc.

In:

0 1 0 1 1 0 1
1 1 0 0 1 1 1
0 1 0 0 1 0 0
1 1 1 0 0 0 1
0 0 0 1 1 1 0

Out: 0 0 0 1 1 etc.

In:

0 1 1
1 1 0

Out: Nothing or error or faulty solution, i.e. you do not have to handle inputs with no solution.

In: http://pastebin.com/raw/3GAup0fr

Out: 0 10 18 28 32 38 48 61 62 63 68 86 90 97 103 114 120 136 148 157 162 174 177 185 186 194 209 210 218 221 228 243 252 255 263 270 271 272 273 280 291 294 295 309 310 320 323 327 339 345 350 353 355 367 372 373 375 377 382 385 386 389 397 411 417 418 431 433 441 451 457 458 459 466 473 479 488 491 498 514 517

In: https://gist.github.com/angs/e24ac11a7d7c63d267a2279d416bc694

Out: 553 2162 2710 5460 7027 9534 10901 12281 12855 13590 14489 16883 19026 19592 19834 22578 25565 27230 28356 29148 29708 30818 31044 34016 34604 36806 36918 39178 43329 43562 45246 46307 47128 47906 48792 50615 51709 53911 55523 57423 59915 61293 62087 62956 64322 65094 65419 68076 70212 70845 71384 74615 76508 78688 79469 80067 81954 82255 84412 85227

Adám

Posted 2016-11-24T10:36:50.320

Reputation: 37 779

You need to point out that step 3 is nondeterministic, and needs to be backtracked do in the case of a failure to find a row in a later invocation of step 3. Without that, the algorithm won't reliably work. – None – 2016-11-24T11:05:03.227

If we could go through every single one of the algorithms from Knuth's books, one by one, that would make an awesome collection! – None – 2016-11-24T11:43:46.117

@LuisMendo Done. – Adám – 2016-11-24T13:10:43.323

Isn't your first test case wrong? Shouldn't it be {2, 4, 6} if returned as row indices? – Billywob – 2016-11-24T13:11:35.343

@Billywob Yes, typo. Thanks. – Adám – 2016-11-24T13:12:24.680

Can we assume that there are at least two rows of the matrix? – Billywob – 2016-11-24T13:33:15.180

@Billywob Yes, added. – Adám – 2016-11-24T13:41:56.927

4"Only solutions using this algorithm are eligible to win" but what exactly counts as "this algorithm"? How literally is it necessary to take "delete row" and "delete column"? And does the algorithm have to use the heuristic which is an key part of Knuth's presentation of the algorithm but which isn't mentioned at all in your description? – Peter Taylor – 2016-11-24T14:38:06.933

6It might be better to make a question which only asks for exact set cover but which has some hefty test cases which can't be handled by naïve brute force but can be handled by Knuth's algorithm. – Peter Taylor – 2016-11-24T14:40:26.130

1Relevant to Peter's comments. – Martin Ender – 2016-11-24T15:05:46.647

@PeterTaylor Very good point. If you edit in one or more such test cases, I will change the rules. – Adám – 2016-11-24T16:52:55.300

An actual Sudoku is too big to fit in the post: http://pastebin.com/raw/3GAup0fr

– Peter Taylor – 2016-11-24T23:21:49.180

@PeterTaylor Added. Thank you. – Adám – 2016-11-25T12:52:38.203

1All algorithms are now equally allowed. – Adám – 2016-11-25T13:05:57.390

1"must support very large matrices" is quite ambiguous, especially since Knuth's algorithm can't handle the large test case without the column selection heuristic. Maybe have this question as pure [tag:code-golf] and have another as [tag:fastest-code]? – Angs – 2016-11-25T15:36:23.930

Here's an even larger problem in case anyone's interested – pentomino cover presented as an exact cover problem. – Angs – 2016-11-26T08:07:11.347

@Angs Thanks. Added. Run less than 5 seconds on my old slow laptop. – Adám – 2016-11-27T07:26:06.840

Answers

Haskell, 100 93 92 87 83 80 bytes

Knuth's algorithm:

g&c=filter(g.any(`elem`c))
(u:v)%a=[c:s|c<-id&u$a,s<-(not&c)v%(not&c$a)]
x%_=[x]

Calculates all covers nondeterministically depth-first in the list monad. Use head to calculate only one since Haskell is lazy. Usage:

*Main> [[1],[2],[3],[4],[5],[6],[7]]%[[1, 4, 7], [1, 4], [4, 5, 7], [3, 5, 6], [2, 3, 6, 7], [2, 7]]
[[[1,4],[2,7],[3,5,6]]]

For more speed by using Knuth's suggestion of selecting column with least ones, use this: (115 bytes, flat list for universe). Finds first solution to the large pentomino cover problem in under a minute when compiled.

import Data.List
[]%_=[[]]
w%a=[c:s|c<-a,c\\(head.sortOn length.group.sort$w:a>>=id)/=c,s<-(w\\c)%[b|b<-a,c\\b==c]]

Thanks to @Zgarb for saving 1+3 bytes!

Thanks to @ChristianSievers for his sage advice and saving 5 bytes and some.

Ungolfed (with flat list universe):

knuthX [] _ = [[]]
knuthX (u:niverse) availableRows = --u chosen deterministically
  [ chosen:solution
  | let rows = filter (elem u) availableRows
  , chosen <- rows  --row chosen nondeterministically in list monad
  , solution <- knuthX
                  (filter (not.(`elem`chosen)) niverse)
                  (filter (not.any(`elem`chosen)) availableRows)
  ]

Angs

Posted 2016-11-24T10:36:50.320

Reputation: 4 825

Might be worth adding some explanation on how it works for people unfamiliar with Haskell. You're using the List monad in order to handle the nondeterminism, I think? – None – 2016-11-24T13:13:01.560

You can save 3 bytes by changing the filter into a list comprehension and defining an auxiliary function h!x=h(\elem`(x>>=id))`. – Zgarb – 2016-11-24T13:52:35.810

I think it is more Algorithm X (and more efficient) if you really do step 2: take (and remove) the first element of the universe; and then select rows that contain it. That also simplifies the base case to the form in the description. – Christian Sievers – 2016-11-24T15:49:48.767

Alg. X can select a column and realize there is no corresponding row containing it, so it can abort this branch of the backtracking tree. Your alg. will continue trying combinations of rows in this case. I don't think it amounts to the same work. - I think null rows can be combined with the general case. You probably shouldn't pattern match anyway, but get the candidate lines by filtering. And I think you can do without accumulator. – Christian Sievers – 2016-11-24T17:30:07.587

Great to see it didn't get longer! You can also do (!)=elem, and swap the two % lines and replace []%_ by _%_. Can't f be used in the computation of the partly collapsed universe? – Christian Sievers – 2016-11-24T21:48:09.720

1@ChristianSievers I'm running into the monomorphism restriction with (!)=elem, hence the a's. And yes, f can definitely be used there. Thanks! The different combinations of filter … elem beg to be unified… – Angs – 2016-11-24T22:02:39.473

I claim this code does what is demanded, but I tried the big example from the comments and it is slooowww... The running time changes from not finished yet to almost immediate when we don't use the first element of the universe, but the rarest. For universe univ in this strange list of singletons form and rows a my ungolfed proof-of-concept uses [snd.head.sort.map(\l->(length l,head l)).group.sort.concat$univ++a]. – Christian Sievers – 2016-11-25T12:13:26.680

Add some $s to get rid of parentheses: filter$g.any(\elem`c),id&u$aandnot&u$a. Also, the last line can bex%_=[x]`. – Zgarb – 2016-11-25T14:22:47.717

You find the column with the least positive number of ones, but miss the ones with no one. That's why my version appended the universe. - We can use nub to get from a list with identical entries to a singleton list with that entry and drop the outer []. – Christian Sievers – 2016-11-25T15:17:09.350

@ChristianSievers the outer [] is because I use <- instead of let (singleton universe), can't take it out. Otherwise good points. I'm not sure if the fast alternative should be optimized for golfing or speed as it's fast enough as it is (without u++). – Angs – 2016-11-25T15:28:21.213

Oh, those were not the same [] as in my version, the take 1 already produced a singleton. I somehow read it as head that also handles the case of empty lists, which could only be replaced by nub after the u++ change. I don't see why it works in this form (oh maybe laziness: when generating c, u is not needed if a is empty), but I agree it's fast enough. I was afraid the changes for the fast version would more than double the size, nice job to keep it small! – Christian Sievers – 2016-11-25T16:08:58.273

1We can go back to flat universe, use the version that should generally be faster but makes no difference in the big example, and save some bytes by using w%a=[c:s|(u:_)<-[sortOn length.group.sort$w++concat a],c<-id&u$a,s<-(w\\c)%(not&c$a)]. Note that now u is a list that may contain the chosen column repeatedly, but that doesn't matter for correctness. – Christian Sievers – 2016-11-26T03:11:00.237

And it should be possible to inline u (then we need head again, or !!0) – Christian Sievers – 2016-11-26T03:56:28.610

@ChristianSievers clever! Found the way to take only +2 bytes for "optimal" speed, so it'll stay. I managed to save a few bytes in the fast version by removing (&) altogether. – Angs – 2016-11-26T06:10:24.163

Wow, that looks like it might be inefficient, but the compiler gets it! There is no need for the inner d list, just do c<-a,c\\(...)/=c,s<-... – Christian Sievers – 2016-11-26T18:58:36.523

1@ChristianSievers hooray, less than +50% length from slow to fast! There was a slight regression in speed when u was inlined since it's calculated once per element in a , but we're aiming for golfed speed, not optimal speed. c\\b==c probably isn't that bad since it can stop lazily. Not inlining u and using Data.Map.Strict for finding the rarest element would be on a totally different level. – Angs – 2016-11-26T19:45:01.670

Python, 482 bytes

r=lambda X,Y:u({j:set(filter(lambda i:j in Y[i],Y))for j in X},Y)
def u(X,Y,S=[]):
 if X:
  for r in list(X[min(X,key=lambda c:len(X[c]))]):
   S.append(r);C=v(X,Y,r)
   for s in u(X,Y,S):yield s
   w(X,Y,r,C);S.pop()
 else:yield list(S)
def v(X,Y,r):
 C=[]
 for j in Y[r]:
  for i in X[j]:
   for k in Y[i]:
    if k!=j:X[k].remove(i)
  C.append(X.pop(j))
 return C
def w(X,Y,r,C):
 for j in reversed(Y[r]):
  X[j]=C.pop()
  for i in X[j]:
   for k in Y[i]:
    if k!=j:X[k].add(i)

r is the function to call with the Universe + Set collection.

Golfed a little from this page

Usage:

X = {1, 2, 3, 4, 5, 6, 7}
Y = {
    'A': [1, 4, 7],
    'B': [1, 4],
    'C': [4, 5, 7],
    'D': [3, 5, 6],
    'E': [2, 3, 6, 7],
    'F': [2, 7]}

for a in r(X,Y): print a

Karl Napf

Posted 2016-11-24T10:36:50.320

Reputation: 4 131

You should be able to remove the cast to list in the first for loop and the yield, and change reversed(...) to ...[::-1] – Blue – 2016-11-24T14:01:10.967

Every time you use S.append(x) you can do S+=x, (with the trailing comma): for example C+=X.pop(j),. – FlipTack – 2016-11-26T13:42:26.500

R, 124 117 115 113 bytes

Very inefficient, but not that long in code. Tries all possible subsets of rows and checks if all the rows sum to 1.

f=function(x,n=nrow(x))for(i in 1:n)for(j in 1:ncol(z<-combn(n,i)))if(all(colSums(x[k<-z[,j],,drop=F])==1))cat(k)

Takes a matrix as input. Returns the rownumbers of the first solution encountered. Returns nothing if there are no solutions, but might take a long while terminating if the input is big.

Ungolfed:

f=function(x, n=nrow(x)){


    for(i in 1:n){
        z=combn(n,i)

        for(j in 1:ncol(z)){
            k=x[z[,j],,drop=F]

            if(all(colSums(k)==1)){
                print(z[,j])
            }
        }
    }
}

7 bytes saved thanks to Billywob

Another 2 bytes saved thanks to Billywob

JAD

Posted 2016-11-24T10:36:50.320

Reputation: 2 898

Thanks, I didn't know you could assign variables inline like that. Also, if you drop the drop=F statement, it doesn't work for subsets of only one row. I never worked with scan before, and just assumed it would work like that, my bad. Changing it to a named function. – JAD – 2016-11-25T15:19:35.477

Also changed output to returning the indices of the solution, saving 2 more bytes. – JAD – 2016-11-25T15:24:50.680

In general you don't need to use named functions (unless they are nested). Also, if you assign the row counter as an implicit argument to the function, you can again skip the curly brackets: function(x,n=nrow(x))for(i in 1:n)for(j in 1:ncol(z<-combn(n,i)))if(all(colSums(x[k<-z[,j],,drop=F])==1))cat(k) – Billywob – 2016-11-25T15:28:17.263

Ah right, I thought about doing that with n but somehow it slipped my mind. Thanks again :) – JAD – 2016-11-25T15:29:24.933

APL (Dyalog), 81 bytes

{×≢⍉⍵:⍵∇{~1∊⍵:0⋄0≡s←⍺⍺c/⍺⌿⍨r←~∨/⍺/⍨~c←~,⍺⌿⍨f←<\⍵:⍺∇f<⍵⋄f∨r\s},⍵/⍨<\n=⌊/n←+⌿⍵⋄∧/⍵}

Try it online!

{ anonymous function:

: if…

⍵ If the argument

⍉ when transposed

≢ has a row-count

× which is positive

then

⍵∇{… then apply this function with the right argument as left argument (constraint matrix), but only after it has been modified by the following anonymous operator (higher-order function):
: if…
1∊⍵ there are ones in the right argument
~ NOT
then…
0 return zero (i.e. fail)
⋄ else:
: if…
<\⍵ the first true of the right argument (lit. cumulative less-than; first row)
f← assign that to f
⍺⌿⍨ use that to filter the rows of the left argument
, ravel (flatten)
~ negate
c← assign that to c
~ negate
⍺/⍨ use that to filter the columns of the left argument
∨/ which rows have a true? (OR reduction)
~ negate that
⍺⌿⍨ use that to filter the rows of the left argument
c/ use c to filter the columns of that
⍺⍺ apply the original outer function (the left operand; sub-matrix covers)
s← assign that to s
0≡ …is identical to zero (failure), then (try a different row):
f<⍵ right-argument AND NOT f
⍺∇ recurse on that (preserving the original left argument)
⋄ else:
r\s use zeros in r to insert zero-filled columns in s
f∨ return f OR that (success; row f included)
}… … on

+⌿⍵ the sum of the argument

n← assign that to n

⌊/ the minimum of that

n= Boolean where n is equal to that

<\ the first one of those (lit. cumulative less-than)

⍵/⍨ use that to filter the columns of the argument (gives first column with fewest ones)

, ravel that (flatten)

⋄ else

∧/⍵ rows which are all ones (none, so this gives a zero for each row)

} end of anonymous function

Adám

Posted 2016-11-24T10:36:50.320

Reputation: 37 779

Find the rows which make each column have one True (was: Knuth's Algorithm X)

Task

One possible algorithm (Knuth's Algorithm X)

Input

Output

More examples

Answers

Haskell, 100 93 92 87 83 80 bytes

Python, 482 bytes

R, 124 117 115 113 bytes

APL (Dyalog), 81 bytes