Greedily partition the list of combinations with repetition

10

First, a few definitions:

  • Given n and k, consider the sorted list of multisets, where for each multiset we choose k numbers from {0, 1, ..., n-1} with repetitions.

For example, for n=5 and k=3, we have:

[(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4), (0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 2), (0, 2, 3), (0, 2, 4), (0, 3, 3), (0, 3, 4), (0, 4, 4), (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 2, 2), (1, 2, 3), (1, 2, 4), (1, 3, 3), (1, 3, 4), (1, 4, 4), (2, 2, 2), (2, 2, 3), (2, 2, 4), (2, 3, 3), (2, 3, 4), (2, 4, 4), (3, 3, 3), (3, 3, 4), (3, 4, 4), (4, 4, 4)]

  • A part is a list of multisets with the property that the size of the intersection of all multisets in the part is at least k-1. That is we take all the multisets and intersect them (using multiset intersection) all at once. As examples, [(1, 2, 2), (1, 2, 3), (1, 2, 4)] is a part as its intersection is of size 2, but [(1, 1, 3),(1, 2, 3),(1, 2, 4)] is not, because its intersection is of size 1.

Task

Your code should take two arguments n and k. It should then greedily go through these multisets in sorted order and output the parts of the list. For the case n=5, k=3, the correct partitioning is:

(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4)
(0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4)
(0, 2, 2), (0, 2, 3), (0, 2, 4)
(0, 3, 3), (0, 3, 4)
(0, 4, 4)
(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4)
(1, 2, 2), (1, 2, 3), (1, 2, 4)
(1, 3, 3), (1, 3, 4)
(1, 4, 4)
(2, 2, 2), (2, 2, 3), (2, 2, 4)
(2, 3, 3), (2, 3, 4)
(2, 4, 4)
(3, 3, 3), (3, 3, 4)
(3, 4, 4), (4, 4, 4)

Here is another example for n = 4, k = 4.

(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 0, 3)
(0, 0, 1, 1), (0, 0, 1, 2), (0, 0, 1, 3)
(0, 0, 2, 2), (0, 0, 2, 3)
(0, 0, 3, 3)
(0, 1, 1, 1), (0, 1, 1, 2), (0, 1, 1, 3)
(0, 1, 2, 2), (0, 1, 2, 3)
(0, 1, 3, 3)
(0, 2, 2, 2), (0, 2, 2, 3)
(0, 2, 3, 3), (0, 3, 3, 3)
(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3)
(1, 1, 2, 2), (1, 1, 2, 3)
(1, 1, 3, 3)
(1, 2, 2, 2), (1, 2, 2, 3)
(1, 2, 3, 3), (1, 3, 3, 3)
(2, 2, 2, 2), (2, 2, 2, 3)
(2, 2, 3, 3), (2, 3, 3, 3)
(3, 3, 3, 3)

Clarification of what greedy means: For each multiset in turn we look to see if it can be added to the existing part. If it can we add it. If it can't we start a new part. We look at the multisets in sorted order as in the example given above.

Output

You can output the partitioning in any sensible format you like. However, multisets should be written horizontally on one line. That is an individual multiset should not be written vertically or spread over several lines. You can choose how you separate the representation of parts in the output.

Assumptions

We can assume that n >= k > 0.

user9206

Posted 2017-04-16T10:03:58.733

Reputation:

@LuisMendo I just made a mistake. I meant multisets should be written horizontally on one line. – None – 2017-04-16T17:10:56.757

In the first test case, why is (0, 4, 4) by itself? Given your description, I would think its "part" would be (0, 4, 4), (1, 4, 4), (2, 4, 4), (3, 4, 4), (4, 4, 4). Similarly for (0, 0, 3, 3) in the second test case. – Greg Martin – 2017-04-16T19:00:02.540

@GregMartin Because of the greediness of the method. You are right that it will in general be suboptimal. The minimal number of parts you can get by a non greedy method is an interesting if hard question, – None – 2017-04-16T19:05:10.990

Oh, you literally mean that once the very next term doesn't match the "active" part, then that part is closed forever. Ok. – Greg Martin – 2017-04-16T19:46:00.887

Answers

4

Jelly, 26 25 bytes

œ&µL‘<⁴ȧ⁹ȯ
œċµç\L€=⁴œṗµḊ’

Full program which prints a representation of a list of lists, each list being a part, e.g. for n=5, k=3:

[[[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 0, 3], [0, 0, 4]], [[0, 1, 1], [0, 1, 2], [0, 1, 3], [0, 1, 4]], [[0, 2, 2], [0, 2, 3], [0, 2, 4]], [[0, 3, 3], [0, 3, 4]], [0, 4, 4], [[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]], [[1, 2, 2], [1, 2, 3], [1, 2, 4]], [[1, 3, 3], [1, 3, 4]], [1, 4, 4], [[2, 2, 2], [2, 2, 3], [2, 2, 4]], [[2, 3, 3], [2, 3, 4]], [2, 4, 4], [[3, 3, 3], [3, 3, 4]], [[3, 4, 4], [4, 4, 4]]]

Note: the representation used removes the redundant [ and ] around lists of length 1.

Try it online! or see a pretty print version (cost 3 bytes)

How?

œ&µL‘<⁴ȧ⁹ȯ - Link 1, conditional multi-set intersection: list x, list y
œ&         - multi-set intersection(x, y)
  µ        - monadic chain separation (call that i)
   L       - length(i)
    ‘      - increment
     <     - less than?:
      ⁴    -     2nd program input, k
       ȧ   - logical and with:
        ⁹  -     link's right argument, y (y if i is too short, else 0)
         ȯ - logical or (y if i is too short, else i)

œċµç\L€=⁴œṗµḊ’ - Main link: n, k
œċ             - combinations with replacement(n, k) (sorted since n implies [1,n])
  µ            - monadic chain separation (call that w)
         œṗ    - partition w at truthy indexes of:
   ç\          -     reduce w with last link (1) as a dyad
     L€        -     length of €ach
        ⁴      -     2nd program input, k
       =       -     equal (vectorises)
           µ   - monadic chain separation
            Ḋ  - dequeue (since the result will always start with an empty list)
             ’ - decrement (vectorises) (since the Natural numbers were used by œċ)

Jonathan Allan

Posted 2017-04-16T10:03:58.733

Reputation: 67 804

This is great. Thank you. – None – 2017-04-17T06:04:06.030

3

MATLAB, 272 bytes

function g(n,k);l=unique(sort(nchoosek(repmat(0:n-1,1,k),k),2),'rows');p=zeros(0,k);for i=1:size(l,1)p=[p;l(i,:)];a=0;for j=1:size(p,1)for m=1:size(p,1)b=0;for h=1:k if(p(j,h)==p(m,h))b=b+1;end;end;if(b<k-1)a=1;end;end;end;if(a)fprintf('\n');p=l(i,:);end;disp(l(i,:));end;

Output:

>> g(5,3)
 0     0     0

 0     0     1

 0     0     2

 0     0     3

 0     0     4


 0     1     1

 0     1     2

 0     1     3

 0     1     4


 0     2     2

 0     2     3

 0     2     4


 0     3     3

 0     3     4


 0     4     4


 1     1     1

 1     1     2

 1     1     3

 1     1     4


 1     2     2

 1     2     3

 1     2     4


 1     3     3

 1     3     4


 1     4     4


 2     2     2

 2     2     3

 2     2     4


 2     3     3

 2     3     4


 2     4     4


 3     3     3

 3     3     4


 3     4     4

 4     4     4

>> g(4,4)
 0     0     0     0

 0     0     0     1

 0     0     0     2

 0     0     0     3


 0     0     1     1

 0     0     1     2

 0     0     1     3


 0     0     2     2

 0     0     2     3


 0     0     3     3


 0     1     1     1

 0     1     1     2

 0     1     1     3


 0     1     2     2

 0     1     2     3


 0     1     3     3


 0     2     2     2

 0     2     2     3


 0     2     3     3

 0     3     3     3


 1     1     1     1

 1     1     1     2

 1     1     1     3


 1     1     2     2

 1     1     2     3


 1     1     3     3


 1     2     2     2

 1     2     2     3


 1     2     3     3

 1     3     3     3


 2     2     2     2

 2     2     2     3


 2     2     3     3

 2     3     3     3


 3     3     3     3

Two empty lines between different parts.

Ungolfed:

function g(n,k);
l=unique(sort(nchoosek(repmat(0:n-1,1,k),k),2),'rows');
p=zeros(0,k);
for i=1:size(l,1)
    p=[p;l(i,:)];
    a=0;
    for j=1:size(p,1)
        for m=1:size(p,1)
            b=0;
            for h=1:k
                if(p(j,h)==p(m,h))
                    b=b+1;
                end;
            end;
                if(b<k-1)
                    a=1;
                end;
        end;
    end;
    if(a)
        fprintf('\n');
        p=l(i,:);
    end;
    disp(l(i,:));
end;

Explanation:

First we find all the multisets with brute force:

l=unique(sort(nchoosek(repmat(0:n-1,1,k),k),2),'rows');

repmat(0:n-1, 1, k) repeats the vector of values from 0 to n-1 k times.

nchoosek(x, k) returns a matrix containing all k-combinations of the repeated vector.

sort(x, 2) sorts all the k-combinations, and then unique(x, 'rows') removes all duplicates.

p=zeros(0,k); creates an empty matrix with k columns. We'll use it as a stack. On each iteration of the outernmost for loop, we first add the current multiset to said stack: p=[p;l(i,:)];.

Then we check if the intersection of all multisets in the stack is at least k-1 long with the following code (we can't use MATLAB's intersect command to check for the intersections, because it returns a set, but we need a multiset):

a=0;
for j=1:size(p,1)
    for m=1:size(p,1)
        b=0;
        for h=1:k 
            if(p(j,h)==p(m,h))
                b=b+1;
            end;
        end;
        if(b<k-1)
            a=1;
        end;
    end;
end;

Now, if the intersection is long enough, a == 0, otherwise a == 1.

If the intersection isn't long enough, we print a newline and empty the stack:

if(a)
    fprintf('\n');
    p=l(i,:); % Only the current multiset will be left in the stack.
end;

Then we just print the current multiset:

disp(l(i,:));

Steadybox

Posted 2017-04-16T10:03:58.733

Reputation: 15 798

Looks like you cracked it! Could you explain your method? – None – 2017-04-16T14:17:37.433

@Lembik I added an explanation. – Steadybox – 2017-04-16T15:10:11.143

3

MATL, 34 bytes

vi:qiZ^!S!Xu!"@!&vt1&dXasq?0&Y)0cb

Parts are separated by a line containing whitespace.

Try it online!

Explanation

Disclaimer: this method seems to work (and it does in the test cases), but I don't have a proof that it always does

Multisets are sorted, both internally (i.e. each multiset has non-decreasing entries) and externally (i.e. multiset M comes before multiset N if M precedes N lexicographically).

To compute the multiset intersection, the sorted multisets are arranged as rows of a matrix and consecutive differences are computed along each column. If all columns except at most one have all differences equal to zero, the multisets belong to the same part.

This test would give a false negative result for multisets like (1,2,3) and (2,3,4): even if 2, 3 are common entries, they would not be detected as such because they are in non-matching columns.

However, this doesn't seem to be a problem, at least in the test cases. It appears that a test between multisets like 1,2,3 and 2,3,4 never actually has to be done, because some intermediate multiset gave a negative result and so they already are in different parts. If this is indeed true, the reason no doubt has to do with the fact that the multisets are sorted.

I don't have a proof of this, though. It just seems to work.

v           % Concatenate stack vertically: gives an empty array. This will
            % grow into the first part
i:q         % Input n. Push [0 1 ... n-1]
i           % Input k
Z^          % Cartesian power. Each Cartesian tuple is on a row
!S!         % Sort each row
Xu          % Unique rows. This gives all multisets, sorted, each on a row
!           % Transpose
"           % For each column
  @!        %   Push current multiset as a row
  &v        %   Vertically concatenate with the part so far
  t         %   Duplicate
  1&d       %   Consecutive differences along each column
  Xas       %   Number of columns that contain at least one non-zero entry
  q?        %   If that number is not 1 (this means that the current 
            %   multiset should begin a new part)
    0&Y)    %     Push last row, then the array with the remaining rows.
            %     Said array is a part, which we now know is complete
    0c      %     Push character 0. This will be shown as a line containing 
            %     a space. This is used as a separator between parts.
    b       %     Bubble up. This moves the loose row to the top. This row 
            %     is the beginning of a new part
            %   Implicitly end if
            % Implicitly end for
            % Implicitly display

Luis Mendo

Posted 2017-04-16T10:03:58.733

Reputation: 87 464

It's very impressive. – None – 2017-04-16T17:40:30.610

I'm trying to understand if the method you describe will always work. I see in that in the n=k=4 case we have a new part start at (0, 0, 3, 3), the vectorised consecutive difference of that and the previous multi-set, (0, 0, 2, 3) only has one difference, so how does the "part so far" make this work? (or equivalently what was the prior step result that got used instead of (0, 0, 2, 3)?) – Jonathan Allan – 2017-04-17T02:33:33.310

Ah, I see you perform a consecutive difference. Yes this should always work! You are literally looking for the points at which more than one item changes, but rather than multi-set intersection, simply vectorised intersection - which will work since the mutli-sets are sorted to start with. – Jonathan Allan – 2017-04-17T02:36:39.150

@JonathanAllan Yes, it's consecutive difference rather than intersection. I still don't see it clear that it will always work, but if you say so... :-) – Luis Mendo – 2017-04-17T09:38:51.113

1

PHP, 245 Bytes

for(;$i<($n=$argv[1])**$m=$argv[2];$i++){for($a=[],$v=$i;$v|count($a)<$m;$v=$v/$n^0)array_unshift($a,$v%$n);sort($a);in_array($a,$r)?:$r[]=$a;}foreach($r as$k=>$v)$k&&count(array_diff_assoc($x[$c][0],$v))<2?$x[$c][]=$v:$x[++$c][]=$v;print_r($x);

Try it online!

Expanded

for(;$i<($n=$argv[1])**$m=$argv[2];$i++){ # loop till $argv[1]**$argv[2]
    for($a=[],$v=$i;$v|count($a)<$m;$v=$v/$n^0) 
    array_unshift($a,$v%$n); # create base n array
    sort($a); #sort array
    in_array($a,$r)?:$r[]=$a; # if sorted array is not in result add it
}    
foreach($r as$k=>$v)
    $k&& # > first item and
    count(array_diff_assoc($x[$c][0],$v))<2 # if difference is only 1 item between actual item and first item in last storage item
    ?$x[$c][]=$v # add item in last storage array
    :$x[++$c][]=$v; # make a new last storage array
print_r($x); # Output as array

Output as String

foreach($x as$y){$p=[];
foreach($y as$z){$p[]=$o="(".join(",",$z).")";}
    echo join(", ",$p)."\n";
}

n>15 for more precision

for($i=0;$i<bcpow($argv[1],$argv[2]);$i=bcadd($i,1)){
    for($a=[],$v=$i;$v|count($a)<$argv[2];$v=bcdiv($v,$argv[1]))
    array_unshift($a,bcmod($v,$argv[1]));
    sort($a);
    in_array($a,$r)?:$r[]=$a;
}

Jörg Hülsermann

Posted 2017-04-16T10:03:58.733

Reputation: 13 026

This seems to work! But what do you mean by more precision? – None – 2017-04-17T06:01:49.070

@Lembik the short version gives back 0 for (16**16-1)%16 and the long version work only with the precision that is needed for n>15 bcmod(bcsub(bcpow(16,16),1),16) is 15 http://php.net/manual/en/ref.bc.php

– Jörg Hülsermann – 2017-04-17T11:36:33.133